Posted by Danny Tarlow

I'm working on a project where I need to load a lot of images into a C++ program, and the loading takes an annoyingly long time relative to the rest of the program (loading the data takes more time than running my algorithms), so I put in a couple of hours to see if I could optimize it.
The basic setup is that for each example and iteration, I need to load around 100 images and iterate over all their pixels. The images are each around 150x200 pixels. A typical full run of the algorithm does around twenty iterations over a couple hundred examples (say 200), and to produce the results I need, it will take 20 or 30 full runs. I can parallelize a lot, but I figured it was worth taking a pass at optimizing my input/output code first.
My initial implementation used the CImg library to load the images. It's a simple loop: iterate over filenames, load each image, then iterate over its pixels to construct the model. For a single example, it takes about 6.7 seconds: 2.2 seconds to load the 100 images, and 4.5 seconds to iterate over them and construct the model. For a full run, that amounts to roughly 20 * 200 * 6.7 = 26800 seconds, or 7.4 hours. In reality, I usually split the work over 4 cores, so I can get results in ~2 hours.
The new version I am playing around with uses Google Protocol Buffers instead. Rather than loading each image separately, I write all of the pixel values of the 100 images into a protocol buffer, then load that single file instead of the 100 separate ones. For a single example, this cuts the time down to about 2.3 seconds: 0.3 seconds to load the data, and 2.0 seconds to iterate over the values and construct the model. It's not earth-shattering, but it's still a nice speedup, cutting the input/output component of a full run down to about 9200 seconds, or 2.6 hours (2.6/7.4 = 35% of the original).
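The post doesn't show the message definition, so the schema below is only a guess at one way to pack an example's images into a single file; the message and field names are all hypothetical.

```proto
syntax = "proto2";

// One Example message holds the pixels of all ~100 images for that
// example, so a single file read and parse replaces 100 separate
// image opens and decodes.
message Image {
  optional int32 width = 1;
  optional int32 height = 2;
  // Pixel values in row-major order; 'packed' keeps the serialized
  // repeated field compact.
  repeated int32 pixels = 3 [packed = true];
}

message Example {
  repeated Image images = 1;
}
```

On the C++ side, the file would be written once with the generated class's `SerializeToOstream` and read back per example with `ParseFromIstream`, so the per-example I/O is one open and one parse.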