Can anyone pull my head out of the clouds

leon_heller
XCore Expert
Posts: 546
Joined: Thu Dec 10, 2009 10:41 pm
Location: St. Leonards-on-Sea, E. Sussex, UK.

Post by leon_heller »

Designing a new board specifically for this project might be a better solution.

Leon


seulater
Member++
Posts: 28
Joined: Sat Jan 09, 2010 11:04 pm

Post by seulater »

oh i totally agree. To get all the sram pins on one core is ideal.

I just want to understand how to do it so if this situation does crop up i am not like a deer in the headlights. Plus it will better help me to understand the core.
paul
XCore Addict
Posts: 169
Joined: Fri Jan 08, 2010 12:13 am

Post by paul »

Ok let me clear a few things up - this is a long post, but hold on in there and I hope it makes sense!!

1) The internal RAM issues - the XS1 architecture is novel, so I guess this can be confusing to people as they already have an impression of how multicore stuff currently works. In a 'normal' multicore architecture communication is done on a bus or through shared memory - this is not the case with XMOS' arch.

Each core is essentially a self-contained processor, with its own memory and ports. It has links that connect it to a network of devices and allow high-speed communication between them. The XS1-G4 device has four integrated cores with a switch in the centre that allows communication between them. This means that all inter-core communication is done over this network and no processor core has access to another core's memory. Threads within a core can share memory...

The main lesson here is that each core within a processor is individual - hence you can't put 200K of stuff into the device unless you divide it over the cores and get them to communicate it around.

2) Port Naming - The XS1 architecture has its ports arranged in banks that contain 1,4,8,16 bit ports (as seen in the port map diagrams such as https://www.xmos.com/published/xdk-portmap) and each core has a 32 bit port muxed over banks 1&2 of the ports.

Within the port maps the pins are given alphabetical labels, because a reference to each pin number is meaningless, and less than useful, for a 4/8/16/32 bit port: these contain multiple pins. A package reference isn't all that helpful either, as port resource numbers are replicated across each core for its own IO.

As you may have already worked out, a pin label such as "P4A1" means "bit [1] of a 4 pin port with label A", referred to collectively as 4A.
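
To make the naming scheme in point 2 concrete, here is a minimal sketch in plain C (not XC) that splits a label such as "P4A1" into its port width, port letter and bit index. The names `struct pin_label` and `parse_pin_label` are made up for this illustration and are not part of any XMOS header; the accepted widths are the 1, 4, 8, 16 and 32 bit ports mentioned above.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper type: the pieces of a label such as "P4A1". */
struct pin_label {
    int width;   /* port width in bits: 1, 4, 8, 16 or 32 */
    char port;   /* port letter, e.g. 'A' */
    int bit;     /* bit index within the port */
};

/* Parse "P<width><letter><bit>"; returns 1 on success, 0 on malformed input. */
int parse_pin_label(const char *s, struct pin_label *out)
{
    char *end;
    char letter;
    long width, bit;

    if (s == NULL || out == NULL || s[0] != 'P' || !isdigit((unsigned char)s[1]))
        return 0;
    width = strtol(s + 1, &end, 10);
    if (width != 1 && width != 4 && width != 8 && width != 16 && width != 32)
        return 0;
    if (!isupper((unsigned char)*end))
        return 0;
    letter = *end++;
    if (!isdigit((unsigned char)*end))
        return 0;
    bit = strtol(end, &end, 10);
    if (*end != '\0' || bit >= width)
        return 0;   /* trailing junk, or bit index too large for the port */
    out->width = (int)width;
    out->port = letter;
    out->bit = (int)bit;
    return 1;
}
```

For "P4A1" this yields width 4, port 'A' and bit 1, i.e. bit 1 of port 4A. A label such as "P4A7" is rejected because a 4 bit port has no bit 7.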

3) XN Files - These files are optional, and I agree that not having one in an initial tutorial could be confusing! The XN file's purpose is to describe the board that you are working on and to allow you to give the pins meaningful names instead of the default ones (e.g. XS1_PORT_4A). The compiler automagically generates a platform.h file that contains this information for your project so be sure to include it!

The XN files also help describe the configuration of boards with multiple devices which have XMOS link connections (XMOS Link (aka XLink) = external communication protocol of a core). This allows you to program and configure these situations with ease.

The XN also describes the location of an SPI flash device for programming - an SPI device can be placed on any 1bit ports if a bootloader is used (see notes on OTP). By default an SPI flash device for automatic booting is on ports 1A, 1B, 1C and 1D.

4) OTP being only 8K - Yes the OTP is small, but then it is only really intended for small applications (boot loaders or simple applications) and device information (serial numbers, MAC addresses, encryption keys etc). If you were to want to boot from non-standard SPI pins (see above) then you can implement a bootloader that does this and have the device boot from SPI. This could be as flexible as you wanted as long as it fitted in the 8K - so you could boot from UART or I2C if you implemented it.

5) Connecting external RAM - as the XS1 architecture does not have a memory management unit you have to implement it (such as the SDRAM code example referenced earlier in this thread). This allows you to read and write data using the channel communication. This can be inter- or intra-core. I would advise that you connect the SRAM or SDRAM to a single core - this may well require a custom board. Spreading IO for a single external device would be feasible, but nastily complicated and I would avoid it at all costs.

6) Large amounts of static data - you mentioned things such as JPGs and fonts. If these are as big as you say they are then store them in the external flash. This is done in the XC-3 LED reference design (see https://www.xmos.com/xc3) that XMOS have for the LED tile gamma tables. This data can then be loaded from the flash into your external RAM either at boot or just when needed.

7) Release and debug mode for compiling - within the IDE you are given the option of 'Release' or 'Debug' modes. The main difference is that Release uses the -O3 optimisation setting, whereas Debug uses -O0 and instructs the compiler to insert debug information into the XE binary file using the -g flag.

You raise a concern about delays in this. If you implement the delays using naive loops then you would see timing differences between the two. However, if you use XMOS' hardware timing then this is completely unaffected by code optimisation (assuming you meet the code timing constraints between the timer set/fire).
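
The arithmetic behind a timer-based delay can be sketched in plain C. This is not XC and does not touch the hardware; it only shows how a delay becomes a tick count and how a "has the deadline passed" test can survive counter wrap-around. It assumes a 100 MHz reference timer and a free-running 32 bit counter; `ticks_from_us` and `time_after` are hypothetical names for this example.

```c
#include <stdint.h>

/* Assumption: the reference timer ticks at 100 MHz (10 ns per tick). */
#define REF_TICKS_PER_US 100u

/* Convert a delay in microseconds to timer ticks. */
uint32_t ticks_from_us(uint32_t us)
{
    return us * REF_TICKS_PER_US;
}

/* Wrap-safe test: has the 32 bit counter value `now` passed `deadline`? */
int time_after(uint32_t now, uint32_t deadline)
{
    return (int32_t)(now - deadline) > 0;
}
```

Because the deadline is an absolute counter value, the length of the wait does not depend on how many instructions the compiler emitted, which is why -O0 and -O3 builds behave the same as long as the code reaches the wait before the deadline.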

8) Use of printf() in the XCore - if you use printf() within the XCore the output is sent to a console via JTAG (i.e. through debug channels). This will pause the processor, hence you need to think about where you put the printf() as it could affect any realtime activities.

For your method of debugging I would suggest you implement a serial or other interface you can print debug information to.

:D Well done!!! - you have reached the end. I hope I have covered most of the questions in this post. If you have any more questions then feel free to list them and I will try and answer them in a helpful manner!

Thanks!
Paul

On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
seulater
Member++
Posts: 28
Joined: Sat Jan 09, 2010 11:04 pm

Post by seulater »

Paul, I cannot thank you enough for taking your time to make that greatly helpful explanatory post!!!
The clouds in my head are starting to thin out.

I understand connecting the ram to a single core would be best. I wanted to use the kit to try it out, so now that i know more i see it will be more of a hassle to use the kit to try this out as i would have to use 2 cores to try it out on the kit. OK, that issue solved.

I have a few questions about some of your points.

3) XN Files, you cleared most of that up for me. I am now curious if i have an .xn file and i include the platform.h file will they buck heads if a pin is defined in there and also in the .xn file? Or is it just common sense to look into the platform.h file and make sure nothing defined in there will conflict with something i did in the .xn file?

6) Large amounts of static data, when i have the font table for the LCD those will be stored as a const, which the compiler will include in my code space. Which then i would need to know if i compile as release and the compiler downloads my code to the SPI flash part, what are my size restrictions. the XC-1A kit comes with a 4Mbits SPI FLASH memory. Does this mean that i can use all of that area for my code space ? What if one were to outgrow that space area, could one just simply add a 8Mbits SPI FLASH memory part in its place ?


7) Release and debug mode for compiling, I had a strange experience with this. I had my project set for release. I also chose the release folder for the download. when i compiled it ran on the board just fine, but when i removed power and re-connected it, it then ran the original demo and not my code. So where did i go wrong with that?

8) Use of printf() in the XCore - if you use printf() within the XCore the output is sent to a console via JTAG. Yes i saw that, what i was trying to do later was to change the .xn file to tell it the serial TX and RX pins are to be on another port pins. so it would not direct the printf to the USB.
when i started that i got compile errors and even when i deleted the .xn file and had the platform.h included it still would not compile. Which is then about the time i packed it away for sale.


The core sharing thing is still not clear with me. I.E. if i want to share an int vs a string is there a difference in what i do? Is there a limit to the data size i share? Is there a limit to how many things i am sharing core to core?


I just need to get over a couple hurdles to get me excited enough to try it out again, if you or anyone is willing to do so via an example.

There are only two things that i think would get me past my humps.

#1) serial comms set up on different pins on the core0.
#2) The core sharing thing is still not clear with me. I think an example to show me the following would unlock my mind cramp on this.

Lets say i have a 4x20 lcd connected to core1. Core1 duty is to initialize the lcd and then just wait for data to be ready. when data is ready, it will take it and put it on the lcd. This data will consist of, line#(1-4) char position(0-19) and lastly the actual string of text.
There needs to be a shared flag in core1, to let any core that will be sending this data to core1 that the lcd is ready and not still working on writing chars to the lcd. (i know this does not need to be so, but it will help me understand more about the whole concept)

Now, Core0 is in charge of monitoring say 4 buttons. when button 1 is pressed i want it to let core1 know the string i want to send. I.E.
core1(1,0,"Button 1 Pressed");
core1(2,0,"Button 2 Pressed");
core1(3,0,"Button 3 Pressed");
core1(4,0,"Button 4 Pressed");

I would like to start out with serial comms first.

I am using the XC-1A kit, I would like to start out with simple serial comms.

the schematic shows that RX is pin X0D23 and the TX pin is X0D24. This gets routed to the FTDI chip for the USB. This project will be a release project, so i will connect the device, flash it then close the compiler and open my Com port monitor, reconnect the device and open that com port to see the serial data.
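
A software UART on a 1 bit port boils down to holding each bit on the TX pin for one bit period. The plain C sketch below (not XC, no port access) shows only the two calculations involved: the bit period in timer ticks and the order of the bits in an 8N1 frame. The 115200 baud rate and the 100 MHz reference timer are assumptions, not figures from this thread, and `uart_bit_ticks` and `uart_frame_8n1` are hypothetical names.

```c
#include <stdint.h>

#define REF_HZ    100000000u  /* assumption: 100 MHz reference timer */
#define BAUD_RATE 115200u     /* assumption: not stated in the thread */

/* Timer ticks that each UART bit must be held on the TX pin. */
uint32_t uart_bit_ticks(void)
{
    return REF_HZ / BAUD_RATE;
}

/* Fill bits[0..9] with the 8N1 line levels for one byte:
   start bit (0), eight data bits LSB first, stop bit (1). */
void uart_frame_8n1(uint8_t byte, uint8_t bits[10])
{
    int i;

    bits[0] = 0;
    for (i = 0; i < 8; i++)
        bits[1 + i] = (uint8_t)((byte >> i) & 1u);
    bits[9] = 1;
}
```

In XC the transmitter would output each of these ten levels on the TX port in turn, waiting one bit period on a hardware timer between outputs.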

If you guys dont have time for this, i understand..


Though i do thank you all for your time thus far!

Jim
Heater
Respected Member
Posts: 296
Joined: Thu Dec 10, 2009 10:33 pm

Post by Heater »

Let me have a go at a few of these points.

3) The platform.h file is not written by you. It is generated by the build system from the content of the .xn file. In fact platform.h is a temporary file that is deleted when the build is complete. So there is no chance of any head bucking.

6) Large amounts of static data. We have to face the fact that a core cannot hold more than 64K of data plus code. So if you define a lot of static constant data, like fonts, in your code you are going to eat into that space rapidly. However one could have a lot more space in the SPI FLASH, or one might have an SD card attached. In that case there is the possibility to read such data from the FLASH or SD or whatever on the fly as the program requires it. Perhaps there is a FAT file system on the SD to make that easier.
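
The sizes in this exchange mix kilobytes and megabits, so a quick conversion helps. A minimal plain C sketch, assuming 1 Mbit means 2^20 bits and that 64K means 65536 bytes; the function names are hypothetical.

```c
#include <stdint.h>

#define CORE_RAM_BYTES (64u * 1024u)  /* per-core RAM quoted in the thread */

/* Convert a flash size in megabits to bytes (1 Mbit taken as 2^20 bits). */
uint32_t flash_bytes_from_mbit(uint32_t mbit)
{
    return mbit * (1024u * 1024u / 8u);
}

/* How many times larger the flash is than one core's RAM. */
uint32_t flash_to_core_ratio(uint32_t mbit)
{
    return flash_bytes_from_mbit(mbit) / CORE_RAM_BYTES;
}
```

Under these assumptions the 4 Mbit part on the XC-1A holds 524288 bytes, eight times one core's 64K, so it has room for data such as font tables that would not fit in RAM alongside the code.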

7) I don't think there is much difference between release and debug builds. For debug no compiler optimizations are switched on. That's because heavy optimizations can remove code sequences and move code around in ways that no longer match up with the lines of code in the original source. This makes stepping through lines of code with a debugger somewhat confusing. You might also have some defines that are different between debug and release builds. For example things that enable or disable logging output or your printfs.

Release or debug both download code into RAM and execute it. So a power cycle blows it away. If you want the code there to boot on power up you have to blow it into the SPI FLASH. Have a look at the build tools documents to see how to do that.

For the "core sharing thing" I can only suggest you look at the code I posted above. That in a very general way allows one core to access the memory of another core via a channel. OK it's very simple and deals with single bytes on each read write but I'm sure you could see ways to extend it for blocks of bytes, integers, strings or whatever other data structures you want.
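
One way to picture the extension to blocks is as a small request protocol: the core that owns the memory receives a request (operation, address, length, data) and services it against its own array. The plain C sketch below models only that logic; it is not the code referred to above, the channel transfer itself is left out, and every name and size in it is hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define MEM_SIZE  1024u   /* hypothetical size of the owned buffer */
#define MAX_BLOCK 32u     /* hypothetical maximum bytes per request */

enum mem_op { MEM_READ, MEM_WRITE };

/* One request as it might be streamed over a channel, field by field. */
struct mem_request {
    enum mem_op op;
    uint32_t addr;
    uint32_t len;
    uint8_t data[MAX_BLOCK];  /* payload for writes, result for reads */
};

static uint8_t owner_memory[MEM_SIZE];  /* lives on the owning core only */

/* Run by the owning core for each request.
   Returns 0 on success, -1 if the block is too long or out of range. */
int mem_server_handle(struct mem_request *req)
{
    if (req->len > MAX_BLOCK || req->addr > MEM_SIZE - req->len)
        return -1;
    if (req->op == MEM_WRITE)
        memcpy(&owner_memory[req->addr], req->data, req->len);
    else
        memcpy(req->data, &owner_memory[req->addr], req->len);
    return 0;
}
```

An int or a string is then just a block of a different length, which is why the same mechanism covers both.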

As for your LCD example. If the core driving the LCD is busy waiting on the LCD then it just does not read data from its input channel. At that point any thread sending data for the LCD on that input channel will hang up because the LCD core is not emptying the channel.

If you don't want the user of the LCD to hang then you will have to implement some "busy" feedback through the channel to let it know the LCD input channel is full.
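
The LCD message described in the question (line, character position, text) could be laid out as below before being sent down a channel. This is a plain C sketch of the data layout and range checks only; `struct lcd_msg` and `lcd_msg_make` are hypothetical, and truncating text that runs past the end of the line is a design choice made for this example.

```c
#include <stdint.h>
#include <string.h>

#define LCD_LINES 4
#define LCD_COLS  20

/* Hypothetical message: line (1-4), column (0-19), then the text. */
struct lcd_msg {
    uint8_t line;
    uint8_t col;
    uint8_t len;
    char text[LCD_COLS];
};

/* Build a message, truncating text that would run past the end of the line.
   Returns 0 on success, -1 if line or column is out of range. */
int lcd_msg_make(struct lcd_msg *m, int line, int col, const char *text)
{
    size_t room, n;

    if (line < 1 || line > LCD_LINES || col < 0 || col >= LCD_COLS)
        return -1;
    room = (size_t)(LCD_COLS - col);
    n = strlen(text);
    if (n > room)
        n = room;
    m->line = (uint8_t)line;
    m->col = (uint8_t)col;
    m->len = (uint8_t)n;
    memcpy(m->text, text, n);
    return 0;
}
```

The sending core would output line, col and len followed by len text bytes; the LCD core reads them in the same order. Since a channel output blocks until the receiver takes the data, a sender that is happy to wait needs no separate ready flag.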

Someone else may have to chime in regarding the serial port issues. I have no hardware and so I'm loath to start advising on how to drive it.

Good luck.
seulater
Member++
Posts: 28
Joined: Sat Jan 09, 2010 11:04 pm

Post by seulater »

Thanks heater for chiming in, all comments are welcome.
Heater wrote: "3) The platform.h file is not written by you. It is generated by the build system from the content of the .xn file. In fact platform.h is a temporary file that is deleted when the build is complete. So there is no chance of any head bucking."
i am still confused on this as the demo app for the board does not have an .xn file.
Heater wrote: "We have to face the fact that a core cannot hold more than 64K of data plus code."
Whoa! wait a second. Lets say i have a XS1-L1 single core device. I am reading what you said as if i have a static data table of say 20k, and the code size is 47k, its a no go ?
I was under the assumption (which may sound stupid) that my program goes into the SPI flash part and the core upon boot gets it from there as it needs. after typing that i think i see how it couldn't. it takes the program from flash, loads it into ram and runs the code from there, right ?

DAM! then i have to look at these examples more clearly as i dont know how in the heck these guys are pulling off some of this TFT LCD stuff. the ram buffer alone for the 480x272 lcd is going to be (480*272*3) = 391680 bytes, How in the world did they pull that off?????
Heater
Respected Member
Posts: 296
Joined: Thu Dec 10, 2009 10:33 pm

Post by Heater »

OK now I'm confused as well. Are you saying you have a demo app that you can compile and run that has a platform.h file but no .xn file. If so tell me which one so I can take a look maybe.

It's just a guess but it looks to me that if you have a .xn file it will get processed by the build system to create a platform.h which is then used when compiling the rest of the c and xc source. When the build is complete the platform.h is deleted. BUT if you have a platform.h and no .xn file then it just uses that as is.

Well a static data table is pretty much equivalent to code as far as the loader is concerned. It's a bunch of bytes that have to be loaded into RAM some where. If the total does not fit then it does not fit. The only way around it is not to define your large tables as static data in code but instead keep them as some kind of "file" in some other part of the FLASH. From there they can be read by the program as it needs.

I reckoned your LCD was needing about 40K of data to represent its image. Does it really need that in a frame buffer in core? Anyway a core will hold a 40K frame buffer and then you have 20K of code space for the code to drive it. Sounds doable to me.
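
The two cases discussed here, 47K of code with a 20K table and 20K of code with a 40K frame buffer, can be checked with a one-line budget test. A minimal plain C sketch, assuming "K" means 1024 bytes and ignoring stack and runtime overhead; `fits_in_core` is a hypothetical helper.

```c
#include <stdint.h>

#define CORE_RAM_BYTES (64u * 1024u)  /* per-core limit quoted above */

/* Return 1 if code plus data fit in one core's RAM, else 0.
   Stack and runtime overhead are ignored (a simplifying assumption). */
int fits_in_core(uint32_t code_bytes, uint32_t data_bytes)
{
    return code_bytes <= CORE_RAM_BYTES
        && data_bytes <= CORE_RAM_BYTES - code_bytes;
}
```

With these assumptions, 47K plus 20K does not fit, while 20K plus 40K does with 4K to spare.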

Don't forget there was a time when full up computers for word processing and spread sheeting etc only had 64K of RAM.
seulater
Member++
Posts: 28
Joined: Sat Jan 09, 2010 11:04 pm

Post by seulater »

Heater wrote: "OK now I'm confused as well. Are you saying you have a demo app that you can compile and run that has a platform.h file but no .xn file. If so tell me which one so I can take a look maybe."
yup. the demo for the kit, its here:
https://www.xmos.com/published/xc1afirmware

Heater wrote: "I reckoned your LCD was needing about 40K of data to represent its image. Does it really need that in a frame buffer in core? Anyway a core will hold a 40K frame buffer and then you have 20K of code space for the code to drive it. Sounds doable to me."
my lcd is 480x272, so i would need 480 * 272 * 3(R,G,B) for a total of 391680 bytes for the buffer.
Yet, in the book he has an example of driving this same size LCD, but he did not include the part of the code that reads from the ram and actually puts the color data there, which was the most important part i was looking for. I must be not thinking this out as XMOS has a nice size color screen in their XS1-G kit.
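
The frame buffer figures in this thread are easy to check. A minimal plain C sketch; `framebuffer_bytes` is a hypothetical helper, and 3 bytes per pixel (one each for R, G and B) follows the post's own assumption.

```c
#include <stdint.h>

/* Bytes needed to hold one full frame. */
uint32_t framebuffer_bytes(uint32_t width, uint32_t height,
                           uint32_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}
```

framebuffer_bytes(480, 272, 3) gives 391680, roughly six times the 64K quoted for one core, which is why a full frame at this resolution cannot live in a single core's RAM.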
leon_heller
XCore Expert
Posts: 546
Joined: Thu Dec 10, 2009 10:41 pm
Location: St. Leonards-on-Sea, E. Sussex, UK.

Post by leon_heller »

Have a look at the XDK demos. You should be able to extract the code you need from one of them.

Leon
seulater
Member++
Posts: 28
Joined: Sat Jan 09, 2010 11:04 pm

Post by seulater »

Thats what i am doing now.

What got me started on this venture was my excitement that i could drive an LCD panel without an LCD controller. It is fast enough to do that. Yet from what i am hearing i dont see how its possible to do that. The way i am looking into this is i would need to create a ram buffer to hold my image for the screen. even if i were to use their size of 320x240 = 76800, that in and of itself is larger than will fit into the part.

so i decided to just read and read documents until im blue in the face.
i am reading this document portXS1.pdf and in there it touches on my question. Here is the code they provided.

```xc
#include <xs1.h>

/* Not shown in the document: supplies the next 32-bit word of pixel data. */
unsigned nextRGBSample(void);

out buffered port:4  HSYNC_port = XS1_PORT_4F;
out buffered port:1  DTMG_port  = XS1_PORT_1B;
out port             DCLK_port  = XS1_PORT_1A;
out buffered port:32 RGB_port   = XS1_PORT_32A;
clock clk = XS1_CLKBLK_1;

void lcd_init() {
    unsigned x, time = 0;

    /* Clock all LCD ports from one divided clock block. */
    set_clock_div(clk, 20);
    configure_out_port_no_ready(HSYNC_port, clk, 0);
    configure_out_port_no_ready(DTMG_port, clk, 0);
    configure_out_port_no_ready(RGB_port, clk, 0);
    configure_port_clock_output(DCLK_port, clk);
    start_clock(clk);

    x = nextRGBSample();
    while (1) {
        time += 500;
        for (int lines = 0; lines < 320; lines++) {
            /* Timed outputs: each one is released at the given port time. */
            time += 8;  HSYNC_port @ time <: 1;
            time += 31; DTMG_port @ time <: 1;
            RGB_port @ time <: x;
            x = nextRGBSample();
            /* The remaining samples follow back to back. */
            for (int rows = 1; rows < 240; rows++) {
                RGB_port <: x;
                x = nextRGBSample();
            }
            time += 240; DTMG_port @ time <: 0;
            time += 13;  HSYNC_port @ time <: 0;
        }
    }
}
```
The part they are leaving out is the part i need to know. I want to know how are they getting the color data. like i mentioned before this 320x240 lcd is going to need 76800 for its buffer.
It looks to me in the code above, "x = nextRGBSample();" is the call that is used to get this data.
how can they get 76800 if they only have 64k to play with ?
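
One possible reading, offered only as a guess because the document does not show nextRGBSample(): the function just has to hand over the next word in time, so it does not have to read from a full frame buffer in core RAM. The plain C sketch below is a made-up stand-in that computes colour bars on the fly. The 240 samples per line match the inner loop of the quoted code, but the palette values and the pixel word format are assumptions and would depend on how the LCD is wired.

```c
#include <stdint.h>

#define SAMPLES_PER_LINE 240u  /* samples output per HSYNC in the quoted code */
#define BAR_WIDTH        30u   /* 240 samples / 8 colour bars */

/* Hypothetical palette; the real pixel word format depends on the LCD wiring. */
static const uint32_t bar_colour[8] = {
    0xFFFFFFu, 0xFFFF00u, 0x00FFFFu, 0x00FF00u,
    0xFF00FFu, 0xFF0000u, 0x0000FFu, 0x000000u
};

static uint32_t sample_index = 0;  /* position within the current line */

/* Return the next pixel word, computed on the fly: no frame buffer needed. */
uint32_t nextRGBSample(void)
{
    uint32_t colour = bar_colour[sample_index / BAR_WIDTH];

    sample_index = (sample_index + 1u) % SAMPLES_PER_LINE;
    return colour;
}
```

The same interface could equally be backed by data fetched from external RAM or flash, as suggested earlier in the thread; the driver loop does not care where the words come from.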