Re: FFT or FHT with a single neopixel strip?

by adafruit_support_mike on Tue Jun 15, 2021 6:58 am

Trying to use FFTs without knowing at least some of the theory is an exercise in pain.

The first core idea of Fourier transforms is that any repeating pattern can be broken down into the sum of multiple (many) sine waves with different amplitudes and frequencies. A Discrete Fourier Transform (DFT) uses multiples of the same base frequency, and its output is the amplitudes of sin(x), sin(2x), sin(3x), and so on. The term 'transform' means converting a function of one variable (amplitude at a specific time) to a function of another variable (amplitude at a specific frequency).

The trouble with DFTs is the gap between the frequencies x, 2x, 3x, etc. The fact that you can build any other signal doesn't mean it's easy or compact.

The solution is to use a smaller base frequency so the gaps are smaller. 999Hz is much easier to represent as the 999th multiple (aka: 'harmonic') of 1Hz than some combination of harmonics of 1kHz (or 1MHz).

The logical extension of that is to make the base frequency infinitely small so every frequency is some multiple of it. That's the Continuous Fourier transform (or just 'Fourier transform').

The second core idea of Fourier transforms deals with the arithmetic of sine waves: you always get some new combination of sine waves. More interestingly, the product of two sine waves at different frequencies always equals the sum of two sine waves at a different pair of frequencies: sin(a)sin(b)=cos(a-b)-cos(a+b)/2

We can play all sorts of interesting games with that, but Fourier transforms focus on a specific property of sine waves: if you watch them long enough, they spend the same amount of time positive as they spend being negative. The average value of any sine wave over time is zero, and the average of any sum of different sine waves is zero.

There's one exception though: multiplying any sine wave by itself. The square of any number is always positive, so sin(a)^2=(1-cos(2a))/2, and its average value is 1/2.

So if you multiply sin(A) by sin(x) and take the average over time, the value will be zero everywhere except x=A. That's how Fourier transforms extract the amplitude at a given frequency.

And since any repeating waveform can be broken down into the sum of multiple sine waves, the product of the waveform and sin(x) will only have nonzero values for frequencies that are part of the waveform. Better yet, the average value is proportional to the amplitude of n*sin(x)*sin(x), so not only do you get the frequency, you get the amplitude of that component.

That has enormous theoretical value, but actually calculating the continuous Fourier transform involves a huge amount of math.. in general, doing the calculations for N samples took about N^2 operations, which gets painful quickly.. a set of 1000 points would take roughly a million calculations. Carl Friedrich Gauss had found special cases where it could be done more easily as early as 1805, but there was no good general solution until 1965 when James Cooley and John Tukey found a trick that allowed them to reduce the number of calculations to N(logN). That put the problem into the realm of practical computing.. a 1000 point data set only needs about 6000 calculations instead of a million.

The Hartley transform is basically the Fouier transform seen from a different angle.

Back in the 1700s, Leonhard Euler decided to play with the concept of a square root of -1. The philosophical and physical interpretations of it don't matter much if you just think of it as a game where i times i equals -1. If you follow that reasoning, i^0=1, i^1=i, i^2=-1, i^3=-i, and i^4=1. Then the cycle repeats itself over and over.

Euler made the mental leap of comparing the powers of i to positions on a circle. If 0 is sin(0), i is sin(90), -1 is sin(180), and -i is sin(270), you get the same going-around-and-around behavior. If we apply that idea to a pair of values (a+bi), repeated multiplication by i gives us (a+bi), (-b+ai), (-a-bi), (b-ai), (a+bi).. which graph out as the point (a,b) rotated 90 degrees around a circle over and over. If we force a and b to live on a unit circle, the same point becomes (cos(T),isin(T)) for some angle T, and multiplying any other (c+id) point by (cos(T)+isin(T)) rotates the point T degrees around the origin.

That reduces all of trigonometry to simple algebra. Adding angles is equivalent to multiplying (a+ib) terms, so:

(cos(x) + isin(x)) x (cos(y) + isin(y)) = cos(x)cos(y) - sin(x)sin(y) + i(cos(x)sin(y) + sin(x)cos(y)) = cos(x+y) + isin(x+y)

the trig identities for adding angles drop out almost trivially. We can also do a sine wave multiplied by itself:

e^ix * e^ix = e^2ix = cos(2x) + isin(2x) = ( cos(x) + isin(x) )^2

cos(2x) + isin(2x) = cos(x)^2 - sin(x)^2 + i2cos(x)sin(x)

The identities above tell us 2sin(x)cos(x) = sin(2x), and basic trig tells us sin(x)^2 = 1-cos(x)^2, so:

cos(2x) + isin(2x) = cos(x)^2 - ( 1 - cos(x)^2 ) + isin(2x)

cos(2x) = 2cos(x)^2 - 1

cos(x)^2 = ( cos(2x) + 1 ) / 2

The exponential notation for all that is just: (e^ix)^2 = e^i2x, which is much more compact and easier to calculate.

Because of that, the overwhelming practical meaning of 'the square root of negative one' is 'I don't want to do trig'. The whole 'multiply by a specific frequency' part of the Fourier transform boils down to 'e^ist x H()', where 's' is a scaling constant that defines the frequency, 't' is time, and H() is the waveform to be processed. You get the average at a given value of s by integrating, for which I'll use the symbol 'S', so the Fourier transform is

F() = S e^ist x H()

for all values of s and t. F(x) shows the amplitude of any component of H() that exists at frequency e^ixt

The Hartley transform does the same math in terms of sines and cosines. The process and answers are basically the same, they just use different notation.

So with all that as background, an FFT starts by taking the average of all the samples in H(), then the average of all the samples 2 places apart (1,3,5,...) and (2,4,6,...), then the average of all samples 4 places apart, and so on. The number of sequences doubles every time, but the number of samples per sequence is cut in half. If you have N samples in H(), an FFT can recognize N/2 different frequencies (called 'bins'), and the frequency resolution of each bin is R/N where R is your sampling frequency.

So if you want 32 pixels worth of output, you'll need at least 64 samples per group H(), and if you sample at 44kHz, each bin will include frequencies within about 690Hz of each other (44k/64).

Find an FFT package and play with those values to get a feel for how they interact, and how much processing load they take. Once those parameters start to make sense, you'll be on a better foundation to build the system you want.