Efficient YUV422 to RGB565 on L16

Vanillacoke
New User
Posts: 3
Joined: Fri Dec 06, 2013 3:38 pm


Post by Vanillacoke

Hello

I need to convert YUV422 image data coming from a 640x480 image sensor to RGB565. The sensor is already connected and its interface is implemented.

My first attempt uses three threads with shared memory as the framebuffer:
One thread reads in the image data according to the VSYNC/HSYNC signals.
The other two threads are converter threads.
One converts the odd rows of a frame (rows 1, 3, 5, ...).
The other converts the even rows of a frame (rows 0, 2, 4, ...).

The fourth thread of the "camera core" will be used to transmit the RGB data.

The goal is a frame rate of 10 FPS, but I am not sure whether that is achievable on an XMOS CPU.

640 x 480 pixels x 10 FPS means that 3,072,000 pixels have to be converted every second.
In addition, each thread has to keep track of the buffer status.
With two converter threads at 125 MHz each, every pixel has to be converted within about 80 cycles.
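Spelled out, the per-pixel budget from the numbers above, assuming each converter thread really gets the full 125 MHz and before subtracting the buffer bookkeeping:

```latex
\begin{aligned}
640 \times 480 \times 10\ \text{frames/s} &= 3\,072\,000\ \text{pixels/s} \\
\frac{3\,072\,000\ \text{pixels/s}}{2\ \text{threads}} &= 1\,536\,000\ \text{pixels/s per thread} \\
\frac{125 \times 10^{6}\ \text{cycles/s}}{1\,536\,000\ \text{pixels/s}} &\approx 81.4\ \text{cycles per pixel}
\end{aligned}
```

Because YUV422 shares one U/V pair between two horizontally adjacent pixels, this is equivalently about 163 cycles per two-pixel macropixel.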

My current implementation is written in C, using pointers and the common conversion macros:

```c
/* Clamp to 0..255. X is now fully parenthesized; note that it is still
   evaluated up to three times per use. */
#define CLIP(X) ( (X) > 255 ? 255 : (X) < 0 ? 0 : (X) )

// YUV -> RGB (BT.601 studio-range coefficients, scaled by 256)
#define C(Y) ( (Y) - 16  )
#define D(U) ( (U) - 128 )
#define E(V) ( (V) - 128 )

#define YUV2R(Y, U, V) CLIP(( 298 * C(Y)              + 409 * E(V) + 128) >> 8)
#define YUV2G(Y, U, V) CLIP(( 298 * C(Y) - 100 * D(U) - 208 * E(V) + 128) >> 8)
#define YUV2B(Y, U, V) CLIP(( 298 * C(Y) + 516 * D(U)              + 128) >> 8)
```
The conversion works correctly, but it is much too slow.
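As a baseline, here is a sketch of how these macros might drive the odd/even row split described above, including the RGB565 packing that the macros themselves do not perform. It assumes the sensor delivers YUYV byte order (Y0 U Y1 V), an even frame width, and a contiguous frame with 2 bytes per pixel; `convert_row_yuyv` and `convert_rows_with_parity` are hypothetical names, and starting the two threads (e.g. with XC `par`) is left out.

```c
#include <stdint.h>
#include <stddef.h>

/* Conversion macros from the post (CLIP argument fully parenthesized).
   Like the original, they rely on arithmetic right shift of negative sums. */
#define CLIP(X) ( (X) > 255 ? 255 : (X) < 0 ? 0 : (X) )
#define C(Y) ( (Y) - 16  )
#define D(U) ( (U) - 128 )
#define E(V) ( (V) - 128 )
#define YUV2R(Y, U, V) CLIP(( 298 * C(Y)              + 409 * E(V) + 128) >> 8)
#define YUV2G(Y, U, V) CLIP(( 298 * C(Y) - 100 * D(U) - 208 * E(V) + 128) >> 8)
#define YUV2B(Y, U, V) CLIP(( 298 * C(Y) + 516 * D(U)              + 128) >> 8)

/* Pack 8-bit R, G, B into RGB565: 5 bits red, 6 bits green, 5 bits blue. */
static uint16_t pack_rgb565(int r, int g, int b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Convert one row. ASSUMPTION: YUYV order (Y0 U Y1 V); adjust the indices
   if the sensor actually sends UYVY or another order. Width must be even. */
void convert_row_yuyv(const uint8_t *src, uint16_t *dst, int width)
{
    for (int x = 0; x < width; x += 2, src += 4) {
        int y0 = src[0], u = src[1], y1 = src[2], v = src[3];
        dst[x]     = pack_rgb565(YUV2R(y0, u, v), YUV2G(y0, u, v), YUV2B(y0, u, v));
        dst[x + 1] = pack_rgb565(YUV2R(y1, u, v), YUV2G(y1, u, v), YUV2B(y1, u, v));
    }
}

/* Work of one converter thread: every second row starting at first_row
   (0 = rows 0, 2, 4, ...; 1 = rows 1, 3, 5, ...), so two threads never
   write the same row of the shared framebuffer. */
void convert_rows_with_parity(const uint8_t *yuv_frame, uint16_t *rgb_frame,
                              int width, int height, int first_row)
{
    for (int row = first_row; row < height; row += 2)
        convert_row_yuyv(yuv_frame + (size_t)row * width * 2,  /* 2 bytes/pixel */
                         rgb_frame + (size_t)row * width,      /* 1 word/pixel */
                         width);
}
```

The even-row thread would call `convert_rows_with_parity(frame, rgb, 640, 480, 0)` and the odd-row thread the same with `first_row = 1`.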

Does anyone have an idea how to meet these requirements on a SliceKit (L16)?
Or is it impossible?

One idea is an assembly implementation, but I am not sure I could beat the -O3 optimization.

Greetings, Vanilla


mmar
Experienced Member
Posts: 123
Joined: Fri Jul 05, 2013 5:55 pm

Post by mmar

As you know, YUV422 uses 16 bits per pixel, so if you use external SDRAM you can build a conversion table and convert without slow arithmetic. If you have enough RAM you can build a full table mapping every 24-bit YUV value to a 16-bit RGB565 value; with less RAM you can create 256-entry arrays for each part of the formulas, e.g. one array holding 298*(Y-16) for every Y.
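A sketch of the smaller variant, with one 256-entry table per term of the formulas (5 tables x 256 x 4 bytes = 5 KB). For comparison, a full table indexed by 24-bit YUV has 2^24 entries of 2 bytes each, i.e. 32 MiB, far beyond on-chip memory, hence the external SDRAM. The table and function names are hypothetical, and whether table loads actually beat the multiplies depends on the target's instruction timings, so it is worth measuring both.

```c
#include <stdint.h>

/* One table per term of the BT.601 formulas used in the first post. */
static int32_t lut_y[256];   /* 298 * (Y - 16) + 128 (rounding term folded in) */
static int32_t lut_rv[256];  /* 409 * (V - 128) */
static int32_t lut_gu[256];  /* -100 * (U - 128) */
static int32_t lut_gv[256];  /* -208 * (V - 128) */
static int32_t lut_bu[256];  /* 516 * (U - 128) */

/* Fill the tables once at startup. */
void init_yuv_luts(void)
{
    for (int i = 0; i < 256; i++) {
        lut_y[i]  = 298 * (i - 16) + 128;
        lut_rv[i] = 409 * (i - 128);
        lut_gu[i] = -100 * (i - 128);
        lut_gv[i] = -208 * (i - 128);
        lut_bu[i] = 516 * (i - 128);
    }
}

/* Clamp a value still scaled by 256 to 0..255. Testing the sign before
   shifting avoids relying on arithmetic right shift of negative values. */
static int clamp_shift8(int32_t x)
{
    if (x < 0) return 0;
    x >>= 8;
    return x > 255 ? 255 : (int)x;
}

/* Same result as YUV2R/G/B, using only table loads, additions and clamps,
   then packed to RGB565. */
uint16_t yuv_to_rgb565_lut(uint8_t y, uint8_t u, uint8_t v)
{
    int32_t ty = lut_y[y];
    int r = clamp_shift8(ty + lut_rv[v]);
    int g = clamp_shift8(ty + lut_gu[u] + lut_gv[v]);
    int b = clamp_shift8(ty + lut_bu[u]);
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```

`init_yuv_luts()` must run before the first call to `yuv_to_rgb565_lut()`.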
Lele
Active Member
Posts: 52
Joined: Mon Oct 31, 2011 4:08 pm

Post by Lele

I don't know whether -O3 is smart enough to recognize that you are evaluating 298 * C(Y) three times and D(U) and E(V) twice each. Storing those partial results in variables might save some cycles...
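A sketch of this suggestion that also exploits the fact that both pixels of a YUV422 pair share U and V: the chroma terms are computed once per pair and 298*(Y-16) once per pixel. It uses the same BT.601 coefficients as the macros in the first post; `yuv422_pair_to_rgb565` is a hypothetical name.

```c
#include <stdint.h>

/* Clamp a value still scaled by 256 to 0..255. Testing the sign before
   shifting avoids relying on arithmetic right shift of negative values. */
static inline int clamp_shift8(int32_t x)
{
    if (x < 0) return 0;
    x >>= 8;
    return x > 255 ? 255 : (int)x;
}

/* Pack 8-bit R, G, B into RGB565. */
static inline uint16_t pack_rgb565(int r, int g, int b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Convert one YUV422 macropixel (two pixels sharing U and V) into two
   RGB565 pixels, computing each common subexpression only once. */
void yuv422_pair_to_rgb565(int y0, int u, int y1, int v, uint16_t out[2])
{
    int32_t d = u - 128, e = v - 128;

    /* Chroma contributions, shared by both pixels; +128 is the rounding term. */
    int32_t r_off = 409 * e + 128;
    int32_t g_off = -100 * d - 208 * e + 128;
    int32_t b_off = 516 * d + 128;

    /* Luma term, once per pixel. */
    int32_t c0 = 298 * (y0 - 16);
    int32_t c1 = 298 * (y1 - 16);

    out[0] = pack_rgb565(clamp_shift8(c0 + r_off), clamp_shift8(c0 + g_off),
                         clamp_shift8(c0 + b_off));
    out[1] = pack_rgb565(clamp_shift8(c1 + r_off), clamp_shift8(c1 + g_off),
                         clamp_shift8(c1 + b_off));
}
```

The three macros spell out 7 multiplies per pixel, 14 per pair (more once CLIP repeats its argument); this version needs 6 per pair. The compiler may already remove some of the duplicates, so inspecting the generated code or timing both versions is the way to tell.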