How I put an NFT collection of 20k animated GIFs on-chain and saved 9 Eth on gas

xtremetom
Coinmonks


Baby Pepes move on-chain

The first thing that comes to mind when reading the title is: why bother?

The honest answer is that it seemed like a fun challenge and it was. In the end, I came up with a unique data storage method that saved me more than 9 Eth in transaction costs by reducing the total data by ~90%.

Before we jump into the how, let's explore what Baby Pepes are and why it was a challenge to put the whole collection on-chain at a reasonable cost.

Also, fair warning, things are gonna get pretty geeky.

What are Baby Pepes?

Baby Pepes are a generative NFT collection of 20k animated pixel art pieces, inspired by the infamous Pepe created by Matt Furie. We wanted to build a derivative to celebrate the randomness of Pepe meme culture and inject a little fun back into NFTs. You can view the collection on Opensea:

🐸 https://opensea.io/collection/babypepes

Just chillin’

Why is it a challenge to put them on-chain?

Each token is constructed of four animated layers:

  • Background (yep, even the background)
  • Body
  • Hat
  • Face
Animated layers of a Baby Pepe

Each layer has a trait, and for each trait there are multiple possible GIFs that could be used: the background layer has 6 possible animated traits, the body layer has 93, the hat layer has 99, and the face layer has 67, making a total of 265 GIFs.

For Baby Pepes to be fully on-chain and work with zero centralized dependencies, I would need to store all those GIFs on-chain.

So, just do that, right?

Sadly, storing data on-chain is really expensive, like stupid expensive.

Every day, NFT flippers and collectors complain about gas prices, and they, for the most part, are initiating cheap transfers. We would be writing large blocks of data to the blockchain.

To give you some idea of just how much data, below is what the above orange animated background looks like in data form (base64 encoded):

And that is just one of 6 backgrounds. We would also need to write all the other animated layers to the blockchain.

In that form, all that data writing would easily cost more than 10 Eth and that’s if gas is 15–17 gwei.
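For a rough sanity check on that number: a fresh storage slot costs about 20,000 gas for 32 bytes, or roughly 625 gas per byte. If the base64-encoded layers weigh in at around a megabyte (my ballpark, not a measured figure), that is in the region of 625M gas, which at 16 gwei works out to about 10 Eth before you even count transaction overhead.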

The next problem we face is storing token trait information for all 20k pieces. Each Baby Pepe has four traits that are described with a trait id. Think of it as the genetics describing what each Baby Pepe looks like:

It looks simple enough: get the trait ids and string them together to form the longer number. And honestly, it is that simple. The problem arises when you have 20k of them and all that data needs to be stored on-chain.

Finally, all the data needs to be structured in a way to help reduce the size and cost.

To recap, the issues are:

  1. Writing data on-chain is expensive
  2. GIFs are large data structures
  3. The gene pool is a lot of data
  4. Finalizing data, optimizing and structuring for retrieval

Let's explore solving those issues

Problem 1: Writing data on-chain is expensive

There is nothing I can do to change the cost of writing data to the blockchain, short of joining the Ethereum team and somehow creating a cheaper method. And that ain't gonna happen cos I suck at Go, I'm not that good at Solidity, and Vitalik is an uber GOAT.

Sorry, distracting image :p

That leaves me with two options:

  1. Use a cheaper method of writing data on-chain
  2. Write as little as possible to save on costs

Finding a cheap way to write data on-chain

Writing directly to storage for the amount of data I need to write is not only prohibitively expensive, but it also adds an additional layer of complexity in terms of utilizing it. Luckily, SSTORE2 can be used to efficiently read and write data as bytecode contracts, which solves half my problem, as explained with the following nerdery:

Reading with SSTORE2

source: https://github.com/0xsequence/sstore2

Reading data is a lot cheaper compared to native SLOAD operations (native Solidity storage).

Writing with SSTORE2

source: https://github.com/0xsequence/sstore2

Writing data is also cheaper than native SSTORE operations (native Solidity storage), but the gains only become apparent at larger data sizes.
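To make that concrete, the SSTORE2 API boils down to a write that deploys your data as a contract's bytecode and a read that copies it back out. A minimal sketch (the import path depends on how you install the library):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Library from https://github.com/0xsequence/sstore2
// (import path depends on your setup)
import {SSTORE2} from "sstore2/contracts/SSTORE2.sol";

contract DataVault {
    address public pointer;

    // Deploys `data` as the bytecode of a tiny contract and
    // remembers where it lives.
    function save(bytes calldata data) external {
        pointer = SSTORE2.write(data);
    }

    // EXTCODECOPYs the data back out, far cheaper than SLOADing
    // the same amount of native storage.
    function load() external view returns (bytes memory) {
        return SSTORE2.read(pointer);
    }
}
```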

So SSTORE2 seems like a great fit. However, I would ideally like to be able to read my data bundle as a single block of data and not have to pick at byte contracts — cos I'm lazy.

How to read and use my data bundles

For large data sets like mine, I will have to break the data into 24KB chunks (the contract bytecode size limit) and write each chunk as a separate byte contract. When I want to use the data, I need to pull it from all the chunks and concatenate it back into a single data set.

That all sounds horribly complicated. However, whilst I was experimenting with on-chain storage for the creation of CryptoCoasters I was inspired by dhof’s rose:

It helped me to fully understand how to utilize SSTORE2 for writing and retrieving large amounts of data on the blockchain. After a lot of experimenting, I came up with code to concatenate data chunks back into their original form.

That code has since been used in EthFS (source: https://github.com/holic/ethfs/blob/main/packages/contracts/src/File.sol) and Scripty, of which I am a co-author. Both libraries aid in the writing and reading of large data sets.
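To give a flavor of the approach, here is a simplified sketch of that concatenation (names are mine, and the production versions in EthFS and Scripty are more thorough):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Sketch: stitch SSTORE2-style chunks back into one blob.
// Each chunk contract's code is [0x00 STOP byte][data...].
function concatChunks(address[] memory chunks) view returns (bytes memory data) {
    // First pass: total size, so we only allocate once
    uint256 total;
    for (uint256 i; i < chunks.length; i++) {
        total += chunks[i].code.length - 1;
    }
    data = new bytes(total);

    // Second pass: EXTCODECOPY each chunk straight into the buffer
    uint256 ptr;
    assembly { ptr := add(data, 0x20) } // skip the bytes length word
    for (uint256 i; i < chunks.length; i++) {
        address chunk = chunks[i];
        uint256 len = chunk.code.length - 1;
        assembly {
            extcodecopy(chunk, ptr, 1, len) // offset 1 skips the STOP byte
            ptr := add(ptr, len)
        }
    }
}
```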

For this project, I opted to use Scripty storage. Mainly because we just finished creating version 2 and I wanted to demonstrate just one of its many uses:

Ok, so all that geeky stuff solves my issues with how to write data and utilize it on-chain. That's great! Now to figure out how to reduce the amount of data I need to write.

Problem 2: Handling GIFs on-chain

The section header is a little misleading, but it does illustrate what I was thinking I would have to do. In reality, I decided that GIFs were just too cumbersome to put on-chain for this particular project. I needed to find a new way to build my animated tokens.

Bastardizing GIFs

HAHA, ok this title is more on the money.

ROCK ON!!!

I have an extensive background in building and optimizing websites for performance and SEO, as well as a background in game development. Both of which make use of sprite sheets.

In gaming, character sprite sheets are used to render animations by displaying each separate frame in sequence. The sprite sheet simply houses the images. GIFs hold the same information but also a lot of additional data like the duration of each frame and a bunch of other stuff we don’t really care about.

This means if I could find a way to use sprite sheets instead of GIFs I could reduce the amount of data I have to store — that’s a huge win. The only problem was figuring out a nice method.

At this point, my goals were:

  1. Find a way to display the final tokens that could be displayed via the “image” attribute in the metadata. This is crucial because I want the token to be animated in the Opensea previews as well as the larger views.
  2. Use sprite sheets
  3. Reduce data to be stored

Displaying as an image

After a lot of playing around, I converted a few traits into sprite sheets and threw them into SVGs. You can see an early example here (https://codepen.io/xtremetom/pen/MWzZjWj):

Looks blurry on some devices and mobile :(

That early example doesn't work on a lot of browsers or mobile. It took a lot of trial and error to perfect the structure for cross-browser and device displays, which resulted in this (https://codepen.io/xtremetom/pen/poQqEvy):

Clean on modern browsers and mobile

The way it works is pretty simple. The token is made of four sprite sheets all layered on top of each other like this on the z-axis:

example of a sprite sheet

The viewer can only see a small area (red) and everything outside of that viewing area is hidden:

(expanded, stacked view)
(actual view)

Each frame is presented for a fixed period of time and then the whole sprite sheet is shifted to the left so the next frame of each layer comes into view.

Once the cycle is complete, after 24 frames over 2.4 seconds, the sprite sheets return to their starting position and the cycle repeats. This creates the animation loop cycle.
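In markup terms, the effect boils down to an SVG that clips four oversized sprite sheets and steps them sideways with CSS. Here is a rough sketch of how a renderer might assemble it, assuming illustrative 64px frames (24 per sheet, so each sheet is 1,536px wide); the production renderer's markup differs in the details:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Sketch only: wraps four base64 sprite sheets in an animated SVG.
// Frame size (64px) and markup details are illustrative assumptions.
function buildSVG(
    string memory bg,
    string memory body_,
    string memory hat,
    string memory face
) pure returns (string memory) {
    return string(abi.encodePacked(
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 64 64">',
        // 24 frames over 2.4s: steps(24) jumps one 64px frame at a time,
        // then the loop snaps the sheets back to their start position
        '<style>g{animation:p 2.4s steps(24) infinite}',
        '@keyframes p{to{transform:translateX(-1536px)}}</style>',
        '<g><image href="data:image/png;base64,', bg, '"/></g>',
        '<g><image href="data:image/png;base64,', body_, '"/></g>',
        '<g><image href="data:image/png;base64,', hat, '"/></g>',
        '<g><image href="data:image/png;base64,', face, '"/></g>',
        '</svg>'
    ));
}
```

Layering falls out for free: SVG paints in document order, so the background goes first and the face lands on top.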

This was a great step in the right direction, but I felt the sprite sheets, although they were considerably smaller in data size, were still too big as you can see in this comparison between the GIF and the sprite sheet.

Body Layer: Police Trait as GIF
Details of GIF
Body Layer: Police Trait as Sprite Sheet
Details of the sprite sheet

After a little playing around I was able to find a nice way to optimize the sprite sheets using https://pngquant.org/ and this was the final result:

Body Layer: Police Trait as Sprite Sheet — optimized

For those that aren't keeping track, that's a 10000000% reduction in data size — almost, it is actually ~98%, which is insane :P

With that adjustment, I created a simple Python script to iterate over all the GIFs and convert them into optimized sprite sheets.

Problem 3: The gene pool is a lot of data

Sadly, there is not much I can do to reduce the size of this data structure. Each Baby Pepe has four traits and I simply need to store that information for 20k tokens.

If each genome is 4 bytes long, that means I can store four numbers up to 255. I know the highest trait Id I currently have is 98, so 4 bytes is perfect and leaves potential room for growth.

All I have to do is cram all the genomes together into one very long data set and boom, we have the final gene pool. Additionally, because we know each genome is 4 bytes long, it's very easy to extract data for a specific token Id. We simply slice the gene pool using (token Id - 1) x 4 as the offset.

Slicing the gene pool
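On-chain, that slice is only a few lines; something like this sketch (function and variable names are mine):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Sketch: pull one 4-byte genome out of the packed gene pool.
// `genePool` is the full blob read back from storage.
function getGenome(bytes memory genePool, uint256 tokenId)
    pure
    returns (uint8 background, uint8 body_, uint8 hat, uint8 face)
{
    uint256 offset = (tokenId - 1) * 4;
    background = uint8(genePool[offset]);
    body_ = uint8(genePool[offset + 1]);
    hat = uint8(genePool[offset + 2]);
    face = uint8(genePool[offset + 3]);
}
```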

Perfect. At this point, I solved my first three issues and had a way to display the animations and methods for reducing the size of the image data. Next, I had to figure out what other data was needed and how I should structure it and minimize it.

Problem 4: Finalizing data, optimizing and structuring for retrieval

Ok, time to get geeky again — time to introduce Professor Pepe:

Professor Pepe is here to help with his Gigabrain

Some of you may already see a big issue with my approach.

How exactly are you storing the sprite sheets on-chain in a way that allows you to retrieve a specific trait?

For anyone wondering why this is an issue, let's break down the problem.

Imagine I am trying to save two separate sprite sheets on-chain. We already know we aren't saving the actual image as you might when you drag and drop something with an upload system on a website. Instead, we have to manage all the data ourselves. For the sake of this explanation, we are going to base64 encode the sprite sheets. The result for the two images will look something like this:

It doesn't look so bad when separated like this, but I will be packing the data together and adding on a lot more data, and that ends up looking like this:

It starts to become very hard to distinguish image data and that is the problem. In order to use the image data I need to be able to extract data for a single image from a dataset of roughly 300 images.

Structure data for slicing

Slicing specific data from the huge data set is the obvious solution. However, I don't want to handle the massive data set for every trait for every layer. That would waste gas and I need to be mindful of gas consumption. The solution I landed on was to group trait data into layers. That way I could slice all the body data into one variable and handle that to extract the required body sprite sheet. Structurally it looks like this:

Ok, that all looks great, but I bet some of you are wondering how I actually pinpoint the data I need and extract it.

I know the length of data representing each sprite sheet. I can use that to calculate and store the index of the last byte of data for each. As long as I store that information in the same order as the trait Ids, I don't have to store anything else and I can use that to slice up a layer of data to extract any trait data I need.

Wow, that's a mouthful and certainly hard to visualize, so hopefully this example helps:

The same can also be done when packing the layer bundles into the final collection bundle:
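In code, the extraction side of that scheme might look like this sketch (the 2-byte offset width is an assumption for the example; wider offsets work the same way):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Sketch: slice one trait's bundle out of a layer bundle using a
// table of end offsets stored in trait-id order.
function sliceTrait(
    bytes memory layerBundle,
    bytes memory endOffsets,
    uint256 traitId
) pure returns (bytes memory traitBundle) {
    // Data for trait N starts where trait N-1 ended
    uint256 start = traitId == 0 ? 0 : readOffset(endOffsets, traitId - 1);
    uint256 end = readOffset(endOffsets, traitId);

    traitBundle = new bytes(end - start);
    for (uint256 i; i < traitBundle.length; i++) {
        traitBundle[i] = layerBundle[start + i];
    }
}

// Reads a 2-byte big-endian offset at position `index`
function readOffset(bytes memory table, uint256 index) pure returns (uint256) {
    uint256 p = index * 2;
    return (uint256(uint8(table[p])) << 8) | uint256(uint8(table[p + 1]));
}
```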

Now that I have the data extraction method sorted, I just need to make sure there is no more data I need to pack in. I probably should have thought of this first but I pretty much made all this up as I went.

Adding in Trait Names

Turns out, I also need to pack in the trait names, since the metadata for Baby Pepes includes them. I could store that data in its own section at the end of my big trait data set, but then I would also have to store more data just to handle that data.

That just feels too messy and inefficient.

Instead, I opted to create a Trait Bundle. Each Trait Bundle consists of the sprite sheet data, the name of the trait, and finally the length of the name. It looks like this:

Example of trait bundle structure and data

This structure allows me to store minimal information and still easily unpack the data with these steps (sketched in code after the list):

  1. Extract this Trait Bundle from the Layer Bundle as previously explained
  2. Extract the count
  3. Use the count to extract the name
  4. Extract the sprite sheet data
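Those steps translate into a handful of lines; a sketch using my own names and the layout described above:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Sketch: unpack a Trait Bundle laid out as
// [sprite sheet bytes][name bytes][1-byte name length]
function unpackTrait(bytes memory bundle)
    pure
    returns (bytes memory spriteSheet, string memory name)
{
    uint256 nameLen = uint8(bundle[bundle.length - 1]); // step 2: the count
    uint256 sheetLen = bundle.length - 1 - nameLen;

    bytes memory nameBytes = new bytes(nameLen); // step 3: the name
    for (uint256 i; i < nameLen; i++) {
        nameBytes[i] = bundle[sheetLen + i];
    }
    name = string(nameBytes);

    spriteSheet = new bytes(sheetLen); // step 4: the sprite sheet
    for (uint256 i; i < sheetLen; i++) {
        spriteSheet[i] = bundle[i];
    }
}
```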

Perfect, well nearly. There is still excess data being stored purely because of the data format, and we can trim it by roughly a quarter.

Optimizing Bundles

Anyone familiar with sending data over sockets and Redis probably already knows the key optimization I'm about to share.

By base64 encoding the sprite sheets I'm inflating the data by roughly 33%, so storing everything as raw bytes instead claws back about a quarter of what I would otherwise write. Paying gas for encoding padding? Not good!

So, we keep the images as bytes. In fact, we store everything as bytes.

The count becomes a single byte, easily capable of describing any trait name up to 255 bytes long. The image data is as long as it needs to be.

But wait, we can still do better, and we are about to get into a whole new level of geekiness.

PNG Structure Manipulation

PNG images, like the ones I'm converting the animated GIFs into, have a defined structure. They have to; otherwise, the internet would fall apart and all your fav memes would catch fire.

If you can be bothered to take a look, there is a nice Wikipedia page that describes the PNG structure among other things:

https://en.wikipedia.org/wiki/PNG

But for the lazy, let me explain how I used this structure to save on data storage.

Simple PNGs like the ones I'm handling have the same basic structure. They all share the same PNG signature and most of the data in the image header. This repetition of data means I can trim off the first 24 bytes from every image. I store those 24 bytes directly in the contract and every time I use a sprite sheet, I simply bolt it back on:

Bolting back on the removed 24 bytes of data during sprite sheet use
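For the curious, the first 16 of those 24 bytes are identical for every valid PNG (the 8-byte signature plus the IHDR chunk's length and type); the remaining 8 are the image width and height, which the sprite sheets share. A sketch of the bolt-on, with illustrative dimensions:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// The shared 24-byte PNG prefix, stored once in the contract.
// Width/height below (1536 x 64) are illustrative, not the real values.
bytes constant PNG_PREFIX =
    hex"89504e470d0a1a0a"   // PNG signature
    hex"0000000d49484452"   // IHDR chunk length + type
    hex"0000060000000040";  // width + height (1536 x 64 here)

// Bolt the prefix back onto a trimmed sprite sheet at render time
function withHeader(bytes memory trimmedPng) pure returns (bytes memory) {
    return abi.encodePacked(PNG_PREFIX, trimmedPng);
}
```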

That little trick saved me an additional 6,360 bytes — don't laugh, it all adds up :)

Next, I have to figure out how to handle the custom data of 1/1s.

Adding 1 of 1s to the data bundle

We can't move all the Baby Pepes on-chain and forget about our awesome 1/1s, like Pepechu and Sexy Pepe:

Pepechu (1/1)

However, these tokens pose a new problem. They aren't made up of multiple layers and traits. Each 1/1 is a single GIF and has no trait ids to build from.

Luckily, the solution was pretty straightforward: I created a data bundle especially for the 1/1s. Think of them as having a single trait; that's exactly how I treat them. All the 1/1 bundles form their own special layer bundle, and I simply use it like all the other bundles.

Quick and easy now that we have a data handling model all planned out.

Awesome, that was the final issue that needed solving. We now have a data model and plans for retrieving any data we need — it's Solidity time!

The Solidity Contracts

By now, I'm sure most of you who are interested in on-chain NFT contracts have studied enough to know how they generally work, so I'm not going to pad this already long article out any more. Plus, my code is heavily commented.

OnchainRenderer.sol:
https://www.contractreader.io/contract/mainnet/0xc4fca7eb2087568829e4b493adda247a11f0c966

BundleManager.sol:
https://www.contractreader.io/contract/mainnet/0x75d71b583de37f2be7b271a79954055656620e7a

However, we still need to get the data on-chain.

To achieve this, I created a ton of Python scripts to build the bundle structure as described above. To make life easier for me, I made sure to save all the final data in hex format. I wanted to retain some readability for testing and playing with the data.

The final outcome was two files:

  1. gene_pool.txt (80,000 bytes)
  2. collection_bundle.txt (201,423 bytes)

Next, I had to create a nice way to get all the data bundles on-chain. Luckily 0xthedude and I had just completed Scripty V2, which comes with a nice facility to store massive data bundles on-chain.

I jumped into Hardhat (I know, I know, Foundry FTW) and threw together this simple script to chop up my final files and store the data via the Scripty storage script:

Just like that Baby Pepes move on-chain — easy O.o
The final cost for all the transactions was ~1.075 Eth

You can find the collection on Opensea:
https://opensea.io/collection/babypepes

Join us on Twitter (X):
https://twitter.com/baby_pepes

Check out our website:
https://www.babypepes.com/


xtremetom
Coinmonks

I'm a founder of Cool Cats and a general Web 3 consultant with 20 years' experience as a builder, marketer, and company owner in Web 2.