I’m more convinced than ever that using fixed-point math for color transformations is the way to go. That said, there is one alternative that is attractive for several reasons, and I discuss it below. I think we should consider doing one or the other in the 0.11 cycle. Let’s start with fixed-point.
We got a patch for this a while ago (see Shotwell ticket #2356) from an old friend of Adam’s who’s clearly a competent graphics hacker. Jim did some performance testing with the patch and found that in some cases the patched code was 50 to 100 times faster than the code we’re using now. At first this didn’t make sense to me: every x86 CPU since the Pentium has had an FPU on the die, so floating point today shouldn’t be the performance killer it was in the ’80s and early ’90s. Indeed, almost all 3D graphics and games exclusively use single-precision floating point internally for all calculations, and they’re pretty fast, right?
Most of our performance woes are the result of format conversion between integer and floating-point number representations. All of our pixbufs are managed by GDK, so their pixel representations are 32-bit integers, with 8 bits per color component. But to do our floating-point math internally, we have to convert every component of every pixel from an 8-bit integer to a single-precision float. What’s more, we actually do this twice: once to convert from integer to floating point and then, when we draw the transformed image, from floating point back to integer. Michael Herf has a great article on the speed of x86 format conversions (or lack thereof) here. Especially worth reading is this section where Herf actually suggests using custom algorithms to do format conversion instead of relying on the compiler. I was aware that x86 numeric conversions were bad, but I had no idea how bad. To quote Herf:
- To implement a "correct" conversion, compiler writers have had to switch the rounding mode, do the conversion with fistp, and switch back. Each switch requires a complete flush of the floating point state, and takes about 25 cycles.
This means that this function, even inlined, takes upwards of 80 (EIGHTY) cycles!
Now, we do already have code in ColorTransformation.vala that tries to avoid one of the two format conversions while the user is dragging a slider interactively. See
public void transform_from_fp (ref float fp_pixel_cache, Gdk.Pixbuf dest)
at line 817. But that only goes so far.
Now, about those games. I’ll bet you’re asking "but wait, games are all floating point and they’re really fast; what’s up?" Games are fast because they draw using !OpenGL or Direct3D, both of which natively support images whose pixels have floating-point values. For example, see this page in the !OpenGL reference manual. So if you’re drawing with Direct3D or !OpenGL, everything is floating point. Integer conversion happens in secret inside the GPU, using custom hardware, just before the pixels are actually drawn. It’s also massively parallel inside the GPU, since the conversion of one pixel doesn’t depend at all on the values of the pixels around it. Suffice it to say, this is way faster than doing it on the CPU, serially, one pixel after another.
So I propose that in the Shotwell 0.11 cycle, we do one of the following:
- Continue to use GDK for drawing and make all color transformations fixed point. This means no more format conversions.
- Switch to using !OpenGL for drawing and make everything floating point. !OpenGL will draw buffers of floating-point values directly, just as if they were pixels, using the full capabilities of the GPU.
Now, option 1 seems to be a no-brainer: we already have a patch for it. But taking the patch would require versioning the color transformations, since fixed-point results won’t be bit-for-bit identical to the floating-point results that users’ existing edited photos were rendered with.
What makes option 2 attractive are the following:
- If we do option 2, we don’t have to use !OpenGL to do programmable graphics — we just use !OpenGL to draw pixels. And that’s pretty low-risk.
- No need to version color transformations.
- Transitioning to a 16-bit-per-component pipeline for handling 16-bit-per-component RAW and TIFF images requires almost no work: since everything is 32-bit floating point internally, we don’t care what the image’s on-disk pixel format is.
Import Subsystem Refactoring
[lucas & eric]
The Shotwell import subsystem has grown to become unmanageable. Specifically, there is very little unification and code re-use among the three possible import avenues: cameras, files, and alien databases. We propose making the import subsystem the first major target for our incremental refactoring of Shotwell’s internals.
Use image-space coordinates everywhere in the app
Currently, when applying various bits of the pipeline to an image, we have to check whether a coordinate pair is expressed in raw image coordinates, rotated image coordinates, rotated-and-cropped-but-not-yet-scaled image coordinates, rotated-scaled-cropped coordinates, etc., and we constantly jump/convert between them; straightening has only made this more complex.
It would be best if Shotwell remembered everything in image-space coordinates and offered something similar to gluProject () to convert screen/mouse coordinates to image space and vice versa; this would enormously simplify the task of keeping positional tags, highlight regions, red-eye regions, crop box corners, and other coordinate-sensitive things in sync while rotating an image to an arbitrary angle (or, for that matter, applying any transformation that can be described with a matrix, like warping or shearing, if we decided we wanted that in the future).