When you take the spectrogram() of a pure tone, you should see a sharp peak: a large value at the index corresponding to the tone's frequency, and low values everywhere else. However, I think the problem is that Mu isn't showing you the data in a way that makes it easy to interpret what is going on. What you need to see is all 256 numbers returned by spectrogram() in a single plot. What Mu is showing you is just the first 3 numbers returned by spectrogram() each time. Those numbers on their own don't mean a whole lot.
This is what a spectrogram with a pure tone will look like when correctly visualized. You'll notice that at the far left there's a second peak which is much higher, so we'll want to ignore that.
To make this plot, I printed the first 128 values using
print(list(spectrum[:128]))
and put them into a graphing program on my computer. For reasons that are obvious only to mathematicians, the first half of the values from spectrogram() go in ascending frequency order and the second half go in descending frequency order. We'll just ignore the second half, which is why we take the slice [:128], meaning the values at index 0, 1, 2, ..., 127.
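If you want to see that mirroring for yourself, here is a minimal sketch in plain Python that you can run on a desktop. It does not use ulab: the `dft_magnitudes` helper is a hypothetical stand-in I wrote for illustration (a slow, naive DFT), and the 64-sample test tone is made up. It is only meant to show the general property for real-valued input, not how spectrogram() is implemented.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive O(n^2) DFT; returns the magnitude of every bin (illustrative helper)."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, s in enumerate(samples)))
        for k in range(n)
    ]

N = 64
TONE_BIN = 5  # the test tone completes exactly 5 cycles in N samples
samples = [math.sin(2 * math.pi * TONE_BIN * t / N) for t in range(N)]
mags = dft_magnitudes(samples)

# For real-valued input, bin k and bin N - k hold the same magnitude,
# so the second half is a mirror image of the first half.
mirror_error = max(abs(mags[k] - mags[N - k]) for k in range(1, N // 2))

# Looking only at the first half is enough to find the tone.
first_half_peak = max(range(N // 2), key=lambda k: mags[k])
print(first_half_peak, mirror_error < 1e-6)
```

Because the second half carries no extra information for a real signal, slicing it off loses nothing.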
So now how do we find the peak in CircuitPython? ulab has a routine called numerical.argmax that will return the index of the biggest item. So my strategy has several steps:
- Zero out items 0 and 1 in the list, because we don't want them.
- Use argmax to find the index of the biggest item.
- Use a little math to find the frequency. I did not verify this, but I believe that to turn the index into the frequency, you multiply by the sample frequency and divide by the number of samples sent to spectrogram(). However, this may not be right.
# Main loop
while True:
    mic.record(samples_bit, len(samples_bit))
    samples = np.array(samples_bit[3:])
    spectrum = extras.spectrogram(samples)
    spectrum = spectrum[:128]
    spectrum[0] = 0
    spectrum[1] = 0
    peak_idx = numerical.argmax(spectrum)
    peak_freq = peak_idx * 16000 / 256
    print((peak_idx, peak_freq, spectrum[peak_idx]))
    time.sleep(0.1)
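As a rough desktop check of the index-to-frequency math, here is a sketch in plain Python that runs the same three steps on a synthetic tone. Everything in it is an assumption for illustration: `dft_magnitudes` is a hypothetical naive DFT standing in for spectrogram(), the 1187.5 Hz tone is chosen so it lands exactly on one index, and the constant offset of 1000 is my guess at what produces the big peak at the far left (a DC offset in the microphone samples).

```python
import cmath
import math

SAMPLE_RATE = 16000  # Hz, same value as in the main loop
N = 256              # samples per spectrum, same value as in the main loop
TONE_HZ = 1187.5     # made-up test tone: exactly 19 cycles in N samples

def dft_magnitudes(samples):
    """Naive O(n^2) DFT magnitudes; illustrative stand-in, not ulab."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, s in enumerate(samples)))
        for k in range(n)
    ]

# Tone plus a constant offset (assumed cause of the far-left peak).
samples = [1000 + 500 * math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE)
           for t in range(N)]

spectrum = dft_magnitudes(samples)[:N // 2]  # keep the first half only

# Without zeroing, the offset at index 0 wins.
raw_peak_idx = max(range(len(spectrum)), key=lambda k: spectrum[k])

# Step 1: zero out items 0 and 1.
spectrum[0] = 0
spectrum[1] = 0
# Step 2: argmax.
peak_idx = max(range(len(spectrum)), key=lambda k: spectrum[k])
# Step 3: index * sample rate / number of samples.
peak_freq = peak_idx * SAMPLE_RATE / N

print(raw_peak_idx, (peak_idx, peak_freq))
```

On this synthetic input the recovered frequency matches the tone that was generated, which is at least consistent with the multiply-then-divide formula.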
Because peak_idx is always a whole number, the frequency is always a multiple of (16000/256) = 62.5 Hz. Up to a point, you could get a better estimate by increasing the number of samples you record. There may be other techniques for refining the number, but I don't know them offhand.
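To put numbers on that trade-off, here is a small sketch of the arithmetic. The `bin_width_hz` helper is a hypothetical name of mine, and the sample counts are just examples. I picked powers of two because, as far as I know, ulab's FFT routines require them.

```python
SAMPLE_RATE = 16000  # Hz, same value as in the main loop

def bin_width_hz(sample_rate, n_samples):
    """Spacing between neighbouring spectrum indices, in Hz."""
    return sample_rate / n_samples

for n in (256, 512, 1024):
    width = bin_width_hz(SAMPLE_RATE, n)
    record_time = n / SAMPLE_RATE  # seconds of audio needed per spectrum
    print(n, width, record_time)
```

Doubling the number of samples halves the step size, but it also doubles how long each recording takes and how much memory the buffer needs, which is one reason this only helps up to a point.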
When I ran the main loop above on my device (a CLUE with a built-in mic) and whistled a clear note, I got a consistent peak_freq for each note of an ascending scale:
(19, 1187.5, 294417.0)
(19, 1187.5, 245807.0)
(21, 1312.5, 174337.0)
(21, 1312.5, 273723.0)
(23, 1437.5, 51420.2)
I hope that this information helps get you going with your project. I'll keep an eye on this thread for responses.