I think it's simple.
When color A has turned 1/3 of a revolution and lands exactly on the corresponding pixel of color B, the two colors end up one over the other (assuming perfect timing and positioning; I don't remember whether spokepov uses an external oscillator or not).
The problem is that color A will be at full brightness, since it really is in that position now, while the pixel of color B will look much dimmer, since it was in that position a while earlier.
This is because the chemical response in the eye fades over time.
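Try spinning the wheel at mad speeds and it will work: the faster the wheel turns, the shorter the delay between the two spokes passing the same spot, so the older pixel has less time to fade.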
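A rough back-of-the-envelope sketch of why speed helps, assuming one spoke per color spaced 1/3 of a turn apart. The fade is modeled as a simple exponential with a hypothetical time constant `TAU`; that is only a crude stand-in chosen for illustration, not a real model of human vision, and the rpm values are arbitrary examples.

```python
import math

def lag_seconds(rpm: float, fraction_of_turn: float = 1 / 3) -> float:
    """Delay between one color's spoke passing a spot and the next
    (different color) spoke reaching that same spot."""
    period = 60.0 / rpm  # seconds per revolution
    return period * fraction_of_turn

def relative_brightness(lag: float, tau: float) -> float:
    """Brightness of the older pixel relative to the fresh one, under the
    ASSUMED exponential fade exp(-t / tau)."""
    return math.exp(-lag / tau)

TAU = 0.05  # hypothetical fade time constant in seconds (assumption)

if __name__ == "__main__":
    for rpm in (120, 300, 600, 1200):
        lag = lag_seconds(rpm)
        print(f"{rpm:5d} rpm: lag = {lag * 1000:6.1f} ms, "
              f"older pixel at {relative_brightness(lag, TAU):.0%} of fresh")
```

Whatever the real fade curve is, the lag is inversely proportional to rpm, so doubling the speed halves the time the older color has to dim.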
Multicolor images should be done with at least 3 spokes per color (just a guess) or, even better, with multicolor LEDs on the same spoke. That is probably not difficult; it's a matter of adding ICs.