Posting a reply in case anyone else runs into this: there are two problems with the code when running on a Raspberry Pi. Both the image() function and the show() function contribute to the slow refresh rate.
The image() function:
As the code's comments mention, the nested 'for' loops are very slow when counting through all the pixels. If the Raspberry Pi struggles to generate frame data, taking ~6 seconds with a 1 GHz CPU, I shudder to think how slowly this would run on a small embedded CPU running Arduino or CircuitPython.
I wrote a modified image() function using numpy that cuts the time down significantly. I also combined this function with code that adds the required header values to the array to be sent over SPI, since this display requires a header before each line of data:
Code: Select all
[setup byte], [Address 1], [line 1 data (400/8 bytes)]
[0x00] , [Address 2], [Line 2 data]
...
[0x00] , [Address 240], [Line 240 data], [0x00], [0x00]
I wanted to combine all the data to send over SPI into one big blob and send it all at once (see my explanation of the issue with the show() function below for why). I tried this several ways:
- Compute the header values during the image() function
- Precompute the header values in python lists
- Precompute the header values in numpy arrays
I benchmarked these approaches using the timeit module. The results were:
- Original display.image(image) call: 6.1 sec
- Compute headers: 0.031 sec
- Precompute list headers: 0.018 sec
- Precompute numpy headers: 0.011 sec
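My benchmark script isn't shown here, but the measurements were along these lines (a minimal sketch: the random array stands in for the PIL image, and the 240x400 panel size is assumed):

```python
import timeit

import numpy as np

DisplayHeight, DisplayWidth = 240, 400  # assumed panel size

# stand-in for the 1-bit PIL image the real code converts
image = np.random.randint(0, 2, size=(DisplayHeight, DisplayWidth), dtype=np.uint8)

def convert():
    # pack 8 pixels per byte along each row: (240, 400) -> (240, 50)
    return np.packbits(image, axis=1)

# average seconds per call over 100 runs
print(timeit.timeit(convert, number=100) / 100)
```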
When I get a chance, I will put the code into a more usable format, but in case anyone is impatient and wants to sift through my messy code, here are the relevant bits.
Precomputing the headers/tails. The *_pc variables are the python lists; the *_pc_np variables are the numpy arrays.
Code: Select all
import numpy as np

# precompute header and tail arrays
# reverse_bit() is the bit-reversal helper from adafruit_sharpmemorydisplay
HeaderVals_pc = [[0, reverse_bit(1)]]
for i in range(2, DisplayHeight + 1):  # the last address should be 240
    HeaderVals_pc.append([0, reverse_bit(i)])
TailVals_pc = [0, 0]
HeaderVals_pc_np = np.asarray(HeaderVals_pc)
TailVals_pc_np = np.asarray(TailVals_pc)
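For anyone running this outside the library: as I understand it, reverse_bit() is the small helper from adafruit_sharpmemorydisplay that mirrors the bit order of a byte (the display wants its line addresses sent LSB-first). A minimal standalone equivalent:

```python
def reverse_bit(num: int) -> int:
    """Mirror an 8-bit value: bit 0 becomes bit 7, and so on."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (num & 1)
        num >>= 1
    return result

# line 1's address byte on the wire
print(reverse_bit(1))  # 128 (0b10000000)
```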
Functions to generate the full image frame. These functions take the image from PIL and convert it into the proper format, with the correct headers, to be sent in a single SPI transaction. FormatFrame() computes the header values inside the function; FormatFrame_pc() uses the precomputed python lists for the headers; FormatFrame_pc_np() uses the precomputed numpy arrays for the header and tail. The functions work on a global image object and put the resulting frame into a global DisplayBuffer.
Code: Select all
def FormatFrame():
    global image
    global DisplayBuffer
    # Import image and convert
    ImageArrayLocal = np.packbits(np.asarray(image), axis=1)
    # Generate the headers. This can be precomputed so that it does not need to be generated every time.
    # Is this worth it? The header is 240x2=480 bytes long. Trade memory usage for speed.
    HeaderVals = [[0, reverse_bit(1)]]
    for i in range(2, DisplayHeight + 1):  # the last address should be 240
        HeaderVals.append([0, reverse_bit(i)])
    # We need to add two bytes to the very end of the frame. Their value does not matter.
    TailVals = np.array([0, 0])
    DisplayBuffer = np.append(np.hstack((HeaderVals, ImageArrayLocal)), TailVals).tolist()

def FormatFrame_pc():
    global image
    global DisplayBuffer
    # Import image and convert
    ImageArrayLocal = np.packbits(np.asarray(image), axis=1)
    DisplayBuffer = np.append(np.hstack((HeaderVals_pc, ImageArrayLocal)), TailVals_pc).tolist()

def FormatFrame_pc_np():
    global image
    global DisplayBuffer
    # Import image and convert
    ImageArrayLocal = np.packbits(np.asarray(image), axis=1)
    DisplayBuffer = np.append(np.hstack((HeaderVals_pc_np, ImageArrayLocal)), TailVals_pc_np).tolist()
I am not sure whether this code can be folded back into the adafruit_sharpmemorydisplay library; I will need to think about whether that is possible.
The problem with the show() function:
The show() function is not nearly as slow as the image() function, but it still took ~1 sec to update the screen. Looking at the code, I did not see anything obviously amiss.
The relevant bit of code is:
Code: Select all
# toggle the VCOM bit
self._buf[0] = _SHARPMEM_BIT_WRITECMD
if self._vcom:
    self._buf[0] |= _SHARPMEM_BIT_VCOM
self._vcom = not self._vcom
self._spi.write(self._buf)
slice_from = 0
line_len = self.width // 8
for line in range(self.height):
    self._buf[0] = reverse_bit(line + 1)
    self._spi.write(self._buf)
    self._spi.write(memoryview(self.buffer[slice_from : slice_from + line_len]))
    slice_from += line_len
self._buf[0] = 0
self._spi.write(self._buf)
self._spi.write(self._buf)  # we send one last 0 byte
self._scs_pin.value = False
self._spi.unlock()
One thing to note here is that there are a bunch of short SPI writes instead of one big one. On a microcontroller, I expect this would be fine. On the Raspberry Pi, however, I suspect the Linux scheduler is limiting how fast we can hand data to the SPI hardware. Taking a look at the SPI bus, we can see that there is a ~1 ms gap between each SPI transaction.
(Labels 1, 2, and 3 correspond to the setup byte, 1st address byte, and the first line data)
This gap is roughly constant no matter how long the SPI transaction is. For the above code, that is roughly 240*3+2=722 SPI send calls; if each of those takes ~1 ms to complete, that adds up to ~0.75 sec, which lines up with what I was seeing. To fix this, I combined all the frame data into a single bytearray and sent it all at once in a single transaction. The code to combine the data and generate the bytearray is above; the send code is pretty simple. I modified the send function as below, using globals so that I don't have to mess with classes.
Code: Select all
def SendFrame():
    global spi
    global scs
    global vcom
    BaudRate = 2000000
    # CS pin is inverted so we have to do this all by hand
    while not spi.try_lock():
        pass
    spi.configure(baudrate=BaudRate)
    scs.value = True
    # toggle the VCOM bit
    CommandByte = _SHARPMEM_BIT_WRITECMD
    if vcom:
        CommandByte |= _SHARPMEM_BIT_VCOM
    vcom = not vcom
    DisplayBuffer[0] = CommandByte
    spi.write(DisplayBuffer)
    scs.value = False
    spi.unlock()
I did not benchmark this code, but the display update appears to happen instantly, which is good enough for what I need.
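An apparently instant update is consistent with the raw wire time: at the 2 MHz baud rate SendFrame() configures, the whole frame only takes about 50 ms to clock out. A quick back-of-the-envelope check:

```python
# back-of-the-envelope: one whole frame in a single SPI transaction
frame_bytes = 240 * (2 + 400 // 8) + 2  # headers + packed pixels + trailing bytes
baud = 2_000_000                        # baud rate used in SendFrame()
wire_time = frame_bytes * 8 / baud
print(frame_bytes, round(wire_time, 3))  # 12482 bytes, ~0.05 s
```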