Hi speed port output.

lilltroll
XCore Expert
Posts: 956
Joined: Fri Dec 11, 2009 3:53 am
Location: Sweden, Eskilstuna

Post by lilltroll »

A small question about the port logic.

As an example, let's suppose we want to generate SVGA output at a 60 Hz frame rate (LCDs like 60 Hz).

SVGA Signal 800 x 600 @ 60 Hz timing

General timing
Screen refresh rate | 60 Hz
Vertical refresh (line rate) | 37.878787878788 kHz
Pixel freq. | 40.0 MHz

Horizontal timing (line)
Polarity of horizontal sync pulse is positive.

Scanline part | Pixels | Time [µs]
Visible area | 800 | 20
Front porch | 40 | 1
Sync pulse | 128 | 3.2
Back porch | 88 | 2.2
Whole line | 1056 | 26.4

Vertical timing (frame)
Polarity of vertical sync pulse is positive.

Frame part | Lines | Time [ms]
Visible area | 600 | 15.84
Front porch | 1 | 0.0264
Sync pulse | 4 | 0.1056
Back porch | 23 | 0.6072
Whole frame | 628 | 16.5792
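
The totals in the timing table can be cross-checked with a few lines of C. This is only a sketch: the struct and function names are made up for illustration, and the numbers are the ones from the table above.

```c
/* One axis of a VGA timing: visible area plus the three blanking parts. */
struct vga_axis {
    unsigned visible;
    unsigned front_porch;
    unsigned sync;
    unsigned back_porch;
};

/* Total length of the axis (pixels per line, or lines per frame). */
unsigned axis_total(struct vga_axis a)
{
    return a.visible + a.front_porch + a.sync + a.back_porch;
}

/* Duration of `pixels` pixel clocks in nanoseconds, pixel clock given in MHz. */
double pixels_to_ns(unsigned pixels, double pixel_mhz)
{
    return pixels * 1000.0 / pixel_mhz;
}

/* SVGA 800 x 600 @ 60 Hz values copied from the table. */
const struct vga_axis svga_h = { 800, 40, 128, 88 };
const struct vga_axis svga_v = { 600, 1, 4, 23 };
```

With these, axis_total(svga_h) is 1056 and axis_total(svga_v) is 628, and pixels_to_ns(1056, 40.0) gives 26400 ns, i.e. the 26.4 µs line time. 628 such lines give the 16.5792 ms frame time.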

So in this example I would like to output a new port value every 25 ns during the line, and this without the use of extra clocks.
40 MHz is not one of the frequencies available by dividing the RefClock.

Is the only way to change the PLL setting in the XCORE so it runs at 80 MHz?

I guess I have two alternatives:
I) Use buffered ports with 8 or 16 bit, so I have time for the for loop.

Code: Select all

unsigned LineBuffer[2][800];  // 2, 4, 6 or 8 pages

#pragma unsafe arrays
for (int x = 0; x < 800; ) {  // x is advanced only inside the k loop
#pragma loop unroll
  for (int k = 0; k < 4; k++) {
    p <: LineBuffer[yPage][x];
    x++;
  }
}
(The k loop saves most of the instructions that would otherwise check whether x<800.)

II) Use several 32-bit buffered 1-bit ports and use asm("setpt .... ") to synchronise the timing of all ports.
That would mean that I have 800 ns between the needed updates of each port.
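
The 800 ns figure follows from how many pixels fit in one buffered word. A rough sketch of the arithmetic, assuming a 32-bit transfer register in both alternatives (the function name is hypothetical):

```c
/* Time between two writes to a buffered port, in nanoseconds.
 * buffer_bits: width of the port's transfer register (assumed 32 here)
 * port_width:  number of pins on the port (1, 8 or 16 in this thread)
 * pixel_mhz:   pixel clock in MHz
 */
double refill_interval_ns(unsigned buffer_bits, unsigned port_width, double pixel_mhz)
{
    unsigned pixels_per_word = buffer_bits / port_width;
    return pixels_per_word * 1000.0 / pixel_mhz;
}
```

At 40 MHz this gives 800 ns for a 1-bit port, 100 ns for an 8-bit port and 50 ns for a 16-bit port, so alternative II buys a lot of time per port at the cost of needing one port per colour bit.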

Does anyone have some tips for me?

(Made a http://en.wikipedia.org/wiki/Mesh_analysis to study the error in a non-ideal R-2R ladder in MATLAB yesterday)
Probably not the most confused programmer anymore on the XCORE forum.
segher
XCore Expert
Posts: 844
Joined: Sun Jul 11, 2010 1:31 am

Post by segher »

You didn't say what frequency your reference clock and core clock are running
at; I'll assume you have refclk at 100MHz and core clock 400MHz.

The only clock divisors you can use are 1 and 2n, so you cannot make 40MHz.
You'll have to use synchronised port mode, which means you'll have to output
a pixel every 10 cpu cycles (400MHz / 40MHz). This requires tight control over all
threads running on that core: if one of them sleeps, it's game over. Other than that,
you simply keep 5 threads constantly running, so each thread issues one insn every
5 cpu cycles, and one of them outputs a pixel every 2 of its insns (the others can do
other stuff, compute the next line buffers perhaps).
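
To make the divisor rule concrete, here is a small C sketch that checks whether a target port clock can be derived from a given reference clock. The function name is made up, and the range 1..255 for n is an assumption (an 8-bit divider field), not something stated above.

```c
#include <stdbool.h>

/* True if target_khz equals ref_khz (divisor 1) or ref_khz / (2*n)
 * for some integer n. The upper limit of 255 for n is an assumption. */
bool clock_reachable(unsigned ref_khz, unsigned target_khz)
{
    if (target_khz == ref_khz)
        return true;
    for (unsigned n = 1; n <= 255; n++) {
        unsigned div = 2 * n;
        if (ref_khz % div == 0 && ref_khz / div == target_khz)
            return true;
    }
    return false;
}
```

clock_reachable(100000, 40000) is false, which is the problem described above, while clock_reachable(80000, 40000) and clock_reachable(100000, 50000) are both true.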

So you have only two insns for each pixel, and one of them has to be the OUT
insn; the other insn then will have to get the data. You have no room to increment
a loop counter or anything, you'll need to unroll the whole loop (for the visible
part of the line, and where you output the synch pulse; you can do bookkeeping
during the "dead time").
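
The "two insns per pixel" budget can be worked out like this. It is a sketch only: the function name is hypothetical, and the rule that one thread issues at most once every 4 cycles is my assumption about the pipeline.

```c
/* Instructions one thread can issue per pixel period.
 * Threads are assumed to issue round-robin; with fewer than 4 active
 * threads a single thread is assumed to still issue only every 4 cycles. */
unsigned insn_slots_per_pixel(unsigned core_mhz, unsigned active_threads,
                              unsigned pixel_mhz)
{
    unsigned issue_period = active_threads < 4 ? 4 : active_threads;
    unsigned cycles_per_pixel = core_mhz / pixel_mhz;  /* truncates */
    return cycles_per_pixel / issue_period;
}
```

insn_slots_per_pixel(400, 5, 40) returns 2, matching the budget above. Note that the integer division truncates, so odd ratios need checking by hand.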

Now, you cannot use the normal load insns (LDW or LDWI), you don't have enough
registers to provide enough base registers for them, and the immediate field in LDWI
is only 0..11. So you'll have to use either CP, DP or SP as the base reg. That's okay,
you're going to write this in asm anyway.

The load insns with those base regs only come in 32-bit forms, so if you want to do
16-bit or 8-bit output, you either waste memory (always an option if you have enough!),
or you can do e.g. (for 16-bit, port p, this is pixel pair n):

Code: Select all

ldw r0,cp[n] ; out p,r0 ; shr r0,r0,16 ; out p,r0
or optimised a bit:

Code: Select all

ldw r0,cp[n] ; outshr p,r0 ; some_other_insn ; out p,r0
which means you won't have to use CP/DP/SP even:

Code: Select all

ldw r0,rb[0] ; outshr p,r0 ; add rb,rb,4 ; out p,r0
For 8-bit, you get two spare insns. Maybe you can do something useful with those, dunno.
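
For anyone who wants to check the pixel order on a PC first, the OUTSHR step can be modelled in plain C. This is a rough model only (the function name is made up); it assumes OUTSHR puts the low port-width bits on the pins and shifts the register right by the port width.

```c
#include <stdint.h>

/* Model of OUTSHR on a port that is `width` bits wide:
 * returns the value driven on the pins and shifts *reg right by width. */
uint32_t outshr_model(uint32_t *reg, unsigned width)
{
    uint32_t mask = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    uint32_t pins = *reg & mask;
    *reg = (width >= 32) ? 0u : (*reg >> width);
    return pins;
}
```

For a 16-bit port and a word 0xBBBBAAAA, the first call returns 0xAAAA and leaves 0xBBBB in the register, which the following plain OUT then sends. So the first pixel of each pair lives in the low half of the word.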

There are never two memory insns in sequence so the insn buffer always stays filled.

Did I mention I haven't tested this at all? :-)

It might be easier to use an 80MHz refclk.

Good luck and have fun,


Segher

Post by segher »

Oh, I just thought of something that makes this whole exercise more feasible: if you
put all five threads in fast mode, they will always be scheduled, even when waiting
for something. But I haven't tested that either :-)

Post by lilltroll »

Thanks for the many ideas, but the point of this exercise is to stay very close to the HW, so keep 'em coming.

I wrote a small prog. for the XK-1 and tested it in the simulator.

To get started I took the timing for 72 frames/s, so the pixel clock becomes 50.0 MHz.

Using an 8-bit wide output, I don't need any optimization with unrolled loops or other things, since I only need to do the port write every 8th instruction. So this is a clean XC way with a double line buffer.
Using a 16-bit wide port, that's worse!

Code: Select all

#include <xs1.h>
#include <platform.h>

#define SIZE 200

out port Hsync= XS1_PORT_1G;
out port Vsync= XS1_PORT_1H;
out buffered port:32 p_rgb=XS1_PORT_8A;
clock clk=XS1_CLKBLK_1;


unsigned color[2][SIZE];

// Timer ticks, assuming a 100 MHz reference clock: 2 ticks per 50 MHz pixel
#define HFrontPorch (2*56)
#define HSyncPulse (2*120)
#define HBackPorch (2*64)
#define Htot 	(2*1040)

const unsigned VFrontPorch=37*Htot;
const unsigned VSyncPulse=6*Htot;
const unsigned VBackPorch=23*Htot;

int main(){
	timer t;
	int time;
	unsigned page=0;
	for(int j=0; j<2 ;j++)
	 for(int i=0; i<SIZE ;i++)
		color[j][i]=0xFF00FF00;

	set_clock_div(clk,1);
	set_port_clock(Vsync, clk);
	set_port_clock(Hsync, clk);
	set_port_clock(p_rgb, clk);
	start_clock(clk);

	t:>time;
	page = !page;  // toggle line-buffer page
	t when timerafter(time+VFrontPorch):>time;
	Vsync<:0;
	t when timerafter(time+VSyncPulse):>time;
	Vsync<:1;
	t when timerafter(time+VBackPorch):>time;

for(int y=0;y<6;y++){
	t:>time;
	page = !page;  // toggle line-buffer page
	t when timerafter(time+HFrontPorch):>time;
	Hsync<:0;
	t when timerafter(time+HSyncPulse):>time;
	Hsync<:1;
	t when timerafter(time+HBackPorch):>time;
	for(int x=0;x<SIZE;x++)
		p_rgb<:color[page][x];
	}
}

I only do 6 rows here, since I do not want to wait for the simulator to run all day.
I'm not sure I have understood the use of the front/back porch yet. Something needed in the 1960s??
Does an LCD monitor assume that it's there exactly as in the standard, so that it can predict the correct sampling speed/phase? That would mean I have to adjust all porch times for the time taken by the instructions that run in between.

(The thing is that no clock is sent to the monitor, and an LCD screen has to sample the value from an "old" analogue signal, thus the LCD has to calculate the correct sampling frequency and the correct phase of everything, otherwise the pixel line will be distorted in some way.
DVI uses a special 8-to-10-bit conversion of the digital colour data, and I guess that needs a $6-8 ASIC to compute at the actual speed.)

:oops: it's colour not color in eng.
Berni
Respected Member
Posts: 363
Joined: Thu Dec 10, 2009 10:17 pm

Post by Berni »

Yeah, these porches come from analog, as CRTs need time to move the beam to the other side of the screen to start a new line or frame. Those didn't need a clock either, as all they did was PLL the sawtooth sweeping voltage to the sync signals.

It's actually surprising that the cheap LCDs have VGA at all, since it is harder to do in digital with all this analog-era stuff glued to it.

Post by lilltroll »

Berni wrote:Yeah, these porches come from analog, as CRTs need time to move the beam to the other side of the screen to start a new line or frame. Those didn't need a clock either, as all they did was PLL the sawtooth sweeping voltage to the sync signals.

It's actually surprising that the cheap LCDs have VGA at all, since it is harder to do in digital with all this analog-era stuff glued to it.

The LCD manufacturers hope that they can start to sell LCDs without VGA in 2015 (across the whole product line), but it will probably stay alive until at least 2020 before everyone has quit using it.

But what about the timing? I'm building a little adapter card for my XC-1A (which is easy since everything is spaced at 100 mil now) and have cut up an old VGA cable.
Do I have to get the timing of the sync and porch correct down to 10 ns, or does the LCD have a magical brain that just adapts correctly? (The AUTO button does not always work 100%!)

I'm soooo new to graphics, but I have been able to do cool things with the XDK based on a mix of a quad-line buffer @ 262000 colours (3.8 kB) + a 2-bit XOR framebuffer (19.2 kB), thus everything fits in the memory of one core together with a large program.

Post by Berni »

Well, I guess the only way to find out is to test it, but I guess the size of the porch can be compensated over quite a range with the settings. If people managed to generate good VGA timings with PICs and AVRs, surely it has to work no problem on an xcore.

The biggest problem perhaps is that there is not enough memory for a frame buffer, so everything has to be generated on the fly. That means a somewhat more complicated program, as it has to spit out the data line by line and it can't miss its deadline even once or the whole image goes kaput.

You can cheat the problem by using an external LCD controller chip and setting up the RGB bus timings to match VGA. Those provide a frame buffer and often some graphics features (usually useless unless it's a fast JPEG decode or something similar).

Post by lilltroll »

I run the G4 at 320 MHz, instead of 400 MHz.

Without any timing compensation my DSO says that my Vertical refresh is 37.84 kHz (Should be 37.878787878788 kHz)
and
frame refresh is 60.26 Hz (Should be 60 Hz).
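
As a sanity check on those scope readings, the expected rates and the deviation can be computed from the 40 MHz pixel clock and the 1056 x 628 totals in the timing table. A sketch with made-up function names:

```c
/* Line rate in Hz from the pixel clock (Hz) and total pixels per line. */
double line_rate_hz(double pixel_hz, unsigned pixels_per_line)
{
    return pixel_hz / pixels_per_line;
}

/* Frame rate in Hz from the line rate (Hz) and total lines per frame. */
double frame_rate_hz(double line_hz, unsigned lines_per_frame)
{
    return line_hz / lines_per_frame;
}

/* Relative deviation of a measurement from its nominal value, in ppm. */
double deviation_ppm(double measured, double nominal)
{
    return (measured - nominal) / nominal * 1e6;
}
```

The nominal line rate comes out at about 37878.79 Hz and the nominal frame rate at about 60.32 Hz rather than exactly 60 Hz. The measured 37.84 kHz is roughly 1024 ppm (about 0.1 %) slow, and 37840 / 628 is about 60.25 Hz, which agrees with the 60.26 Hz reading. So the two measurements are at least consistent with each other.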

My LCD tries to sync all the time but doesn't lock.

My 3 Display Data Channel pins are still floating.
I'm testing with a LG FLATRON L2010T with max 1600x1200 (Should not totally alias at 800x600)

Enough in the lab for today. Have to read a little on DDC. I was hoping for the monitor to lock without DDC data.

Should the sync pulses be Hi or Low, i.e. should state be 0 or 1??? I hope the TTL inputs are 3.3V compatible, but it's year 2010. For the moment I'm feeding the 5V pin with 3.3V - that is maybe stupid.
The (unbuffered) analog RGB signal from the R-2R ladder looks like sh*t, but at least it's between 0-0.7 V PtP; that's the next headache, after I get a sync.

Code: Select all

#include <xs1.h>
#include <platform.h>

#define SIZE 200

on stdcore[0]:out port Hsync = XS1_PORT_1N;
on stdcore[0]:out port Vsync = XS1_PORT_1M;
on stdcore[0]:out buffered port:32 p_rgb = XS1_PORT_8C;
on stdcore[0]:clock clk = XS1_CLKBLK_1;

unsigned color[SIZE];

#define HFrontPorch (2*40)
#define HSyncPulse (2*128)
#define HBackPorch (2*88)
#define Htot 	(2*1056)

const unsigned VFrontPorch = 1 * Htot;
const unsigned VSyncPulse = 4 * Htot;
const unsigned VBackPorch = 23 * Htot;
const unsigned state = 0;

int main() {

	par {
		on stdcore[0]: {
			timer t;
			int time;
			unsigned page = 0;
			for (int j = 0; j < 1; j++)
				for (int i = 0; i < SIZE; i++)
					color[i] = 0xFFFF0000;

			set_clock_div(clk,1);
			set_port_clock(Vsync, clk);
			set_port_clock(Hsync, clk);
			set_port_clock(p_rgb, clk);
			start_clock(clk);

			while (1) {
				t:>time;
				page = !page;  // toggle page (unused in this test)
				time+=VFrontPorch;
				t when timerafter(time):>time;
				Vsync<:state;
				time+=VSyncPulse;
				t when timerafter(time):>time;
				Vsync<:!state;
				time+=VBackPorch;
				t when timerafter(time):>time;
				for(int y=0;y<600;y++) {
					t:>time;
					page = !page;  // toggle page (unused in this test)
					time+=HFrontPorch;
					t when timerafter(time):>time;
					Hsync<:state;
					time+=HSyncPulse;
					t when timerafter(time):>time;
					Hsync<:!state;
					time+=HBackPorch;
					t when timerafter(time):>time;
#pragma unsafe arrays
					for(int x=0;x<SIZE;x++)
					p_rgb<:0xFF00FF00;
				}
			}
		}
	}
}
ale500
Respected Member
Posts: 259
Joined: Thu Sep 16, 2010 9:15 am

Post by ale500 »

Both sync pulses should be positive. Be sure to have some pixels on (test pattern) or the monitor may not sync.

Post by lilltroll »

ale500 wrote:Both sync pulses should be positive. Be sure to have some pixels on (test pattern) or the monitor may not sync.

Code: Select all

p_rgb<:0xFF00FF00;
(8-bit parallel port) It should show a chessboard. (Double size, e.g. 0xFFFF0000, didn't work either.)
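
To see what a given word actually puts on the pins, here is a small C sketch that splits a 32-bit word into the four values an 8-bit buffered port would drive, in order. It assumes the port sends the least significant byte first; the function name is made up.

```c
#include <stdint.h>

/* Split one 32-bit word into four 8-bit pin values, in output order.
 * Assumes the buffered port shifts data out least significant byte first. */
void word_to_pixels(uint32_t word, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(word >> (8 * i));
}
```

Under that assumption 0xFF00FF00 goes out as 0x00, 0xFF, 0x00, 0xFF and 0xFFFF0000 as 0x00, 0x00, 0xFF, 0xFF. Since the same word is sent on every line, I would expect vertical stripes rather than a true chessboard once the monitor locks.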