Posting a reply in case anyone else runs into this: there are two problems with the code when running on a Raspberry Pi. Both the image() function and the show() function contribute to the slow refresh rate.
The image() function:
As the code's comments mention, the nested 'for' loops are very slow when counting through all the pixels. If the Raspberry Pi struggles to generate frame data, taking ~6 seconds with a 1 GHz CPU, I shudder to think how slowly this would run on a small embedded CPU running Arduino or CircuitPython.
I wrote a modified image() function using numpy that cuts the time down significantly. I also combined this function with code that adds the required header values to the array to be sent over SPI, since this display requires a header before each line of data:
Code: Select all
[setup byte], [Address 1], [line 1 data (400/8 bytes)]
[0x00] , [Address 2], [Line 2 data]
...
[0x00] , [Address 240], [Line 240 data], [0x00], [0x00]
I wanted to combine all the data to send over SPI into one big blob and send it all at once (see my explanation of the issue with the show() function below for why). I tried this several ways:
- Compute the header values during the image() function
- Precompute the header values in python lists
- Precompute the header values in numpy arrays
I benchmarked these approaches using the timeit module. The results were:
- Original display.image(image) call: 6.1 sec
- Compute headers: 0.031 sec
- Precompute list headers: 0.018 sec
- Precompute numpy headers: 0.011 sec
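My benchmark script isn't shown here, but the measurements were along these lines (a minimal sketch: the random array stands in for the PIL image, and the 240x400 panel size is assumed):

```python
import timeit

import numpy as np

DisplayHeight, DisplayWidth = 240, 400  # assumed panel size

# stand-in for the 1-bit PIL image the real code converts
image = np.random.randint(0, 2, size=(DisplayHeight, DisplayWidth), dtype=np.uint8)

def convert():
    # pack 8 pixels per byte along each row: (240, 400) -> (240, 50)
    return np.packbits(image, axis=1)

# average seconds per call over 100 runs
print(timeit.timeit(convert, number=100) / 100)
```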
When I get a chance, I will put the code into a more usable format, but in case anyone is impatient and wants to sift through my messy code, here are the relevant bits.
Precomputing the headers/tails. The *_pc variables are the python lists; the *_pc_np variables are the numpy arrays.
Code: Select all
import numpy as np

# precompute header and tail arrays
# reverse_bit() is the bit-reversal helper from adafruit_sharpmemorydisplay
HeaderVals_pc = [[0, reverse_bit(1)]]
for i in range(2, DisplayHeight + 1):  # the last address should be 240
    HeaderVals_pc.append([0, reverse_bit(i)])
TailVals_pc = [0, 0]
HeaderVals_pc_np = np.asarray(HeaderVals_pc)
TailVals_pc_np = np.asarray(TailVals_pc)
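For anyone running this outside the library: as I understand it, reverse_bit() is the small helper from adafruit_sharpmemorydisplay that mirrors the bit order of a byte (the display wants its line addresses sent LSB-first). A minimal standalone equivalent:

```python
def reverse_bit(num: int) -> int:
    """Mirror an 8-bit value: bit 0 becomes bit 7, and so on."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (num & 1)
        num >>= 1
    return result

# line 1's address byte on the wire
print(reverse_bit(1))  # 128 (0b10000000)
```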
Functions to generate the full image frame. These functions take the image from PIL and convert it into the proper format, with the correct headers, to be sent in a single SPI transaction. FormatFrame() computes the header values inside the function; FormatFrame_pc() uses the precomputed python lists for the headers; FormatFrame_pc_np() uses the precomputed numpy arrays for the header and tail. The functions work on a global image object and put the resulting frame into a global DisplayBuffer.
Code: Select all
def FormatFrame():
    global image
    global DisplayBuffer
    # Import image and convert
    ImageArrayLocal = np.packbits(np.asarray(image), axis=1)
    # Generate the headers. This can be precomputed so that it does not need to be generated every time.
    # Is this worth it? The header is 240x2=480 bytes long. Trade memory usage for speed.
    HeaderVals = [[0, reverse_bit(1)]]
    for i in range(2, DisplayHeight + 1):  # the last address should be 240
        HeaderVals.append([0, reverse_bit(i)])
    # We need to add two bytes to the very end of the frame. Their value does not matter.
    TailVals = np.array([0, 0])
    DisplayBuffer = np.append(np.hstack((HeaderVals, ImageArrayLocal)), TailVals).tolist()

def FormatFrame_pc():
    global image
    global DisplayBuffer
    # Import image and convert
    ImageArrayLocal = np.packbits(np.asarray(image), axis=1)
    DisplayBuffer = np.append(np.hstack((HeaderVals_pc, ImageArrayLocal)), TailVals_pc).tolist()

def FormatFrame_pc_np():
    global image
    global DisplayBuffer
    # Import image and convert
    ImageArrayLocal = np.packbits(np.asarray(image), axis=1)
    DisplayBuffer = np.append(np.hstack((HeaderVals_pc_np, ImageArrayLocal)), TailVals_pc_np).tolist()
I am not sure whether this code can be folded back into the adafruit_sharpmemorydisplay library; I will need to think about whether that is possible.
The problem with the show() function:
The show() function is not nearly as slow as the image() function, but it still took ~1 sec to update the screen. Looking at the code, I did not see anything obviously amiss.
The relevant bit of code is:
Code: Select all
# toggle the VCOM bit
self._buf[0] = _SHARPMEM_BIT_WRITECMD
if self._vcom:
    self._buf[0] |= _SHARPMEM_BIT_VCOM
self._vcom = not self._vcom
self._spi.write(self._buf)
slice_from = 0
line_len = self.width // 8
for line in range(self.height):
    self._buf[0] = reverse_bit(line + 1)
    self._spi.write(self._buf)
    self._spi.write(memoryview(self.buffer[slice_from : slice_from + line_len]))
    slice_from += line_len
self._buf[0] = 0
self._spi.write(self._buf)
self._spi.write(self._buf)  # we send one last 0 byte
self._scs_pin.value = False
self._spi.unlock()
One thing to note here is that there are a bunch of short SPI writes instead of one big one. On a microcontroller, I expect this would be fine. On the Raspberry Pi, however, I suspect the Linux scheduler is limiting how fast we can hand data to the SPI hardware. Taking a look at the SPI bus, we can see that there is a ~1 ms gap between each SPI transaction.
(Labels 1, 2, and 3 correspond to the setup byte, 1st address byte, and the first line data)
This gap is roughly constant no matter how long the SPI transaction is. For the above code, that is roughly 240*3+2=722 SPI send calls; if each of those takes ~1 ms to complete, that adds up to ~0.75 sec, which lines up with what I was seeing. To fix this, I combined all the frame data into a single bytearray and sent it all at once in a single transaction. The code to combine the data and generate the bytearray is above; the send code is pretty simple. I modified the send function as below, using globals so that I don't have to mess with classes.
Code: Select all
def SendFrame():
    global spi
    global scs
    global vcom
    BaudRate = 2000000
    # CS pin is inverted so we have to do this all by hand
    while not spi.try_lock():
        pass
    spi.configure(baudrate=BaudRate)
    scs.value = True
    # toggle the VCOM bit
    CommandByte = _SHARPMEM_BIT_WRITECMD
    if vcom:
        CommandByte |= _SHARPMEM_BIT_VCOM
    vcom = not vcom
    DisplayBuffer[0] = CommandByte
    spi.write(DisplayBuffer)
    scs.value = False
    spi.unlock()
I did not benchmark this code, but the display update appears to happen instantly, which is good enough for what I need.
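An apparently instant update is consistent with the raw wire time: at the 2 MHz baud rate SendFrame() configures, the whole frame only takes about 50 ms to clock out. A quick back-of-the-envelope check:

```python
# back-of-the-envelope: one whole frame in a single SPI transaction
frame_bytes = 240 * (2 + 400 // 8) + 2  # headers + packed pixels + trailing bytes
baud = 2_000_000                        # baud rate used in SendFrame()
wire_time = frame_bytes * 8 / baud
print(frame_bytes, round(wire_time, 3))  # 12482 bytes, ~0.05 s
```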