This is really an odd SPI chip (unusual is probably a better word). Although I saw the code in the various libraries, I didn't believe that you actually address the chain in the reverse order of normal SPI operation. With most SPI-based, daisy-chained devices, if you have a string of X items, you send the data for item X first, then X-1, X-2, X-3, etc., until the last thing you send is the data for the first device on the chain.
I'm finding that these chips must have a "smart" mode where each one absorbs the first 3 color bytes it receives and does not echo them. The bytes that follow are passed along to the next unit, which does the same thing, and so on down the chain. The result is that the first set of bytes really does control the first LED.
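To make that concrete, here is a minimal Python simulation of the "absorb and forward" behavior. It is a model of what I observed, not driver code: the `Pixel` class and `send_absorb_forward` are hypothetical names, it assumes exactly 3 bytes per LED, and it ignores the latch and all real timing.

```python
from typing import List, Optional, Tuple

BYTES_PER_LED = 3  # assumption: one byte per color channel


class Pixel:
    """Hypothetical model of one chip: keep the first 3 bytes, forward the rest."""

    def __init__(self) -> None:
        self.buffer: List[int] = []  # bytes this chip has absorbed

    @property
    def color(self) -> Optional[Tuple[int, ...]]:
        # Only report a color once all 3 bytes have arrived.
        return tuple(self.buffer) if len(self.buffer) == BYTES_PER_LED else None

    def receive(self, byte: int) -> Optional[int]:
        if len(self.buffer) < BYTES_PER_LED:
            self.buffer.append(byte)  # absorbed, nothing is echoed
            return None
        return byte  # already full: pass the byte on to the next chip


def send_absorb_forward(chain: List[Pixel], data: List[int]) -> None:
    """Clock each byte into the first chip and let it ripple down the chain."""
    for byte in data:
        current: Optional[int] = byte
        for pixel in chain:
            if current is None:
                break  # some chip upstream kept this byte
            current = pixel.receive(current)


chain = [Pixel() for _ in range(3)]
# Sent in chain order: LED 0 first, then LED 1, then LED 2.
send_absorb_forward(chain, [255, 0, 0, 0, 255, 0, 0, 0, 255])
print([p.color for p in chain])
# [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
```

The point of the model is that the data goes out in the same order as the LEDs on the strip, with no reversing step on the host side.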
While this is already obvious to the fine folks who created the libraries out there, it is, well, different. In fact, as much as I didn't like the HL1606, it did act like a normal SPI device: the first data sent was "pushed" along to the end of the chain, so you sent the last LED's data first, then the second-to-last, and so on.
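For comparison, a plain shift-register chain (the "normal SPI" behavior described above) can be modeled as a fixed-length queue: every color clocked in pushes all stored colors one LED further from the host. This is a simplified sketch that shifts whole 3-byte colors instead of individual bits; `send_shift_register` is a hypothetical name, and this is not how any particular HL1606 library is written.

```python
from typing import Deque, List, Tuple
from collections import deque

Color = Tuple[int, int, int]


def send_shift_register(num_leds: int, colors: List[Color]) -> List[Color]:
    """Model a chain of shift registers, one 3-byte color per LED.

    Index 0 of the result is the LED nearest the host. Each color clocked
    in pushes every stored color one LED further down the line; whatever
    is pushed off the far end is lost.
    """
    chain: Deque[Color] = deque([(0, 0, 0)] * num_leds, maxlen=num_leds)
    for color in colors:
        chain.appendleft(color)  # new data enters at the host end
    return list(chain)


red, green, blue = (255, 0, 0), (0, 255, 0), (0, 0, 255)
# To end up with LED 0 = red, LED 1 = green, LED 2 = blue, the data for
# the LAST LED has to go out first.
print(send_shift_register(3, [blue, green, red]))
# [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
```

Sending `[red, green, blue]` to this model lights the strip blue, green, red, which is exactly the reversal you do not need with the absorb-and-forward chips.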
Also, while monitoring the data lines, I am seeing some sort of "automatic" activity downstream after the first chip receives the 3-zeros latch command. So again, this chip is smarter than most SPI devices: it takes a single command and replicates it all the way down the line without any help from the host.
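Here is one way to model that latch behavior, again as a hedged sketch rather than a description of the real silicon. Assumptions: a chip that sees a zero byte shows whatever it has buffered, resets for the next frame, and passes the zero on itself; and color bytes are never zero, since otherwise a zero could not be told apart from the latch (the real chips must handle this somehow, but I don't know how, so this model simply rejects zero color values). `Chip`, `clock_out`, and `build_frame` are hypothetical names.

```python
from typing import List, Optional, Tuple

LATCH_BYTES = 3  # the "3-zeros" latch


class Chip:
    """Hypothetical model: absorb 3 data bytes, forward the rest, and
    repeat a zero (latch) byte downstream on its own."""

    def __init__(self) -> None:
        self.pending: List[int] = []                # bytes for the next frame
        self.shown: Optional[Tuple[int, ...]] = None  # color currently lit

    def receive(self, byte: int) -> Optional[int]:
        if byte == 0:
            # Latch: show the buffered color, reset, and pass the zero on
            # (the "automatic" activity seen on the downstream data line).
            if len(self.pending) == 3:
                self.shown = tuple(self.pending)
            self.pending = []
            return 0
        if len(self.pending) < 3:
            self.pending.append(byte)  # absorbed
            return None
        return byte  # forwarded to the next chip


def clock_out(chain: List[Chip], data: bytes) -> None:
    """Feed each byte to the first chip and let it ripple down the chain."""
    for byte in data:
        current: Optional[int] = byte
        for chip in chain:
            if current is None:
                break
            current = chip.receive(current)


def build_frame(colors: List[Tuple[int, int, int]]) -> bytes:
    """Colors in chain order (LED 0 first), then the zero-byte latch."""
    frame = bytearray()
    for rgb in colors:
        if any(c == 0 for c in rgb):
            raise ValueError("a zero data byte would look like a latch in this model")
        frame += bytes(rgb)
    frame += bytes(LATCH_BYTES)  # LATCH_BYTES zero bytes
    return bytes(frame)


chain = [Chip() for _ in range(3)]
clock_out(chain, build_frame([(255, 1, 1), (1, 255, 1), (1, 1, 255)]))
print([chip.shown for chip in chain])
# [(255, 1, 1), (1, 255, 1), (1, 1, 255)]
```

Because each chip clears its buffer on the latch, the next frame again starts with the first chip on the strip, which matches the "first set of bytes controls the first LED" behavior.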
Hope this helps the next person building something different to control them.

Gerry