Generating Sine Waves

Post by rp181 »

I'm trying to generate 8 sine waves at a frequency of 20 kHz, with at least 256 points per period. However, I'm having trouble getting it to run fast enough to have 256 points, and I can only get 100 points in. Is there any way I can make this run faster? The 8-channel DAC is on an 8 bit port and a 4 bit port (it's a 12 bit DAC).

	int i;
	char channel;

	dac.LDAC <: 0;
	dac.CS <: 1;
	dac.WR <: 0;

	while (1) {
		for (i = 0; i < SINE_TABLE_LENGTH; i += INCREMENT) { //INCREMENT = 155, length = 16383
			#pragma loop unroll
			for (channel = 8; channel < 15; channel++) {
				dac.chanSelect <: (channel);
				dac.data8 <: sin8[i];
				dac.data4 <: sin4[i];

				dac.CS <: 0;
				dac.CS <: 1;
			}
		}
	}

The array sin[] is a table of sine values from 0 to 2*PI, scaled to 0-4095, of the type unsigned char.
I tried generating 2 different tables of unsigned char to pre-compute the bit mask/bit shift, but for some reason, the DAC would just stay at a single value.
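
A minimal sketch of the split into two byte tables described above. An unsigned char holds only 0 to 255, so a sample scaled to 0-4095 has to be kept in a wider type before it is split. The wiring is an assumption (8-bit port carries the low eight bits, 4-bit port the top four), and `split_sample` with its parameter names is a made-up helper, not something from the posted code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: split one 12-bit sample (0..4095) into the byte for
 * the 8-bit port and the nibble for the 4-bit port.
 * Assumption: the 8-bit port carries bits 0..7, the 4-bit port bits 8..11. */
static void split_sample(uint16_t sample, uint8_t *low8, uint8_t *high4)
{
    assert(sample <= 4095);              /* must fit in 12 bits */
    *low8  = (uint8_t)(sample & 0xFF);   /* bits 0..7  */
    *high4 = (uint8_t)(sample >> 8);     /* bits 8..11 */
}
```

Running this once per table entry at start-up would fill sin8[] and sin4[] without any masking or shifting inside the output loop.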

Any help would be appreciated!

EDIT: Realized I don't need the clock stuff since I'm just using it at 100 MHz anyway.

EDIT: Updated code, still only getting 105 points per period.


Post by segher »

rp181 wrote: "I'm trying to generate 8 sine waves"

Your code does 7 actually.

rp181 wrote: "at a frequency of 20 kHz, with at least 256 points per period. However, I'm having trouble getting it to run fast enough to have 256 points, and I can only get 100 points in. Is there any way I can make this run faster? The 8-channel DAC is on an 8 bit port and a 4 bit port (it's a 12 bit DAC)."

In the inner loop, you do 5 outputs per sample per channel. That
means that per second you want to do 20000*256*8*5 outputs;
that is 204.8M outputs per second. That won't fly; you cannot
do more than about 100M per second on a single thread (fewer
if you have more threads).
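
The arithmetic above can be checked with a small sketch in plain C. The function names are made up for illustration, and the 100M outputs per second budget is only the rough figure quoted above, not a measured limit.

```c
#include <assert.h>
#include <stdint.h>

/* Port outputs per second needed by a loop that does outputs_per_write
 * port writes for every channel at every point of the waveform. */
static uint64_t outputs_per_second(uint32_t wave_hz, uint32_t points_per_period,
                                   uint32_t channels, uint32_t outputs_per_write)
{
    assert(outputs_per_write > 0);
    return (uint64_t)wave_hz * points_per_period * channels * outputs_per_write;
}

/* Points per period that fit in a given output budget, rounded down. */
static uint32_t max_points_per_period(uint64_t budget, uint32_t wave_hz,
                                      uint32_t channels, uint32_t outputs_per_write)
{
    uint64_t per_point = (uint64_t)wave_hz * channels * outputs_per_write;
    assert(per_point > 0);
    return (uint32_t)(budget / per_point);
}
```

`outputs_per_second(20000, 256, 8, 5)` gives 204800000, the 204.8M figure above. With a budget of 100000000, `max_points_per_period` gives 125 for eight channels, the same ballpark as the 100 or so points observed once loop overhead is counted.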

I see three easy ways to make it go faster:
1) Use buffered transfers. Your widest port is 8b, so you can do
four outputs at once. You should make the ports clocked, but you
should anyway, to make the output go smoothly;
2) Don't assert the CS signal manually, make one of the data ports
a master strobed port;
3) Use more than one thread to output your data, e.g. handle four
channels each on two threads.

Post by rp181 »

Thanks for the reply! I actually didn't notice I was only doing 7, thanks for catching that!

1) I briefly tried that, but failed in getting it to work, and wasn't sure if it would help so I gave up on that. I shall try it again.

2) Will do!

3) I can't split up the 8 channels because all 8 channels are on 1 DAC chip, so only 1 thread has access to the ports. Or am I misunderstanding what you are saying?

At this point I'm just making it run as fast as possible. I don't think I actually need as many as 256 points, as the waves will be driving inductors, which should provide some smoothing to the signal.

Post by segher »

3) No, brain fart here :-)

You really should try to do 1); you really want to put the ports
on their own clock anyway, to get proper regular timing on it.
If you do that first, making buffered transfers work won't be so
hard. 1) gives you the best speedup, close to 4x!

Another thing... You are outputting the same data to all eight
DAC channels; if you really want that, you don't need to write
to the data ports for all eight, instead just once. But that
probably is just because you simplified the program a bit too
much? :-)

Post by Folknology »

If you can arrange the ports as a 16 bit port it might be cleaner, and with the buffering and strobing arrangement segher suggests you might just get there (32 into 16 doubles the rate). However, I am just trying to understand what you need this for in the first place. If you need 8 sine waves, why not just create a single one and buffer/mux it however many times required? I assume, therefore, that you want to change the phase of each waveform in some way, using table index offsetting or some such. If that is the case you won't be outputting the same data, so you won't easily be able to short-cut those loops and will need to stick to your current code structure.

Either way I'm interested in what you are actually trying to do; any chance of enlightenment?

regards
Al

Post by rp181 »

Unfortunately, IO is very tight, and this is the only way I could get it all to fit. And yes, eventually they will each have different frequencies (phase does not really matter), as defined by either different increments or different sine tables (I will see which is faster). The frequencies are going to be controlled by a PC frontend.
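
A minimal sketch of the "different increments" option, in plain C. Each channel keeps its own table index and steps it by its own increment, so the output frequency is roughly update rate × increment / table length. Both function names are hypothetical, and wrapping the index (rather than restarting at 0 as the posted loop does) is an assumption about how the per-channel version would be written.

```c
#include <assert.h>
#include <stdint.h>

/* Advance one channel's table index by its own increment, wrapping at the
 * end of the table so the waveform repeats without a jump back to 0. */
static uint32_t next_index(uint32_t index, uint32_t increment, uint32_t table_length)
{
    assert(table_length > 0 && index < table_length && increment < table_length);
    index += increment;
    if (index >= table_length)
        index -= table_length;
    return index;
}

/* Increment that gives approximately wave_hz when each channel is updated
 * updates_per_second times per second (rounded to the nearest integer). */
static uint32_t increment_for(uint32_t wave_hz, uint32_t updates_per_second,
                              uint32_t table_length)
{
    assert(updates_per_second > 0);
    return (uint32_t)(((uint64_t)wave_hz * table_length + updates_per_second / 2)
                      / updates_per_second);
}
```

Because the increment is an integer, the frequencies that can be hit exactly depend on the table length and the update rate; a longer table gives finer frequency steps.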

EDIT: I am not too familiar with port buffering. How would it work since some are 4 bit, and some are 8 bit? Would the values on the 4 bit port be bitshifted 4 or 8? In other words, if a 16 bit long value was outputted, would it write 4 times, or 2 times (port width bits at a time or 8 bits at a time)?

Post by Folknology »

Right, the different frequencies rather than phase will likely complicate things. For the simple case (same frequency) you would want to attach the same clock to both the 8 bit and 4 bit ports, so that they are synchronised on output along with a strobe (cs), and then use buffered serialisation: 16 bit shifting in nibbles for the 4 bit port and 32 bit shifting in bytes for the 8 bit port. Something like this (not tested, just conjured):

out buffered port:32 data8 = XS1_PORT_8A;
out buffered port:16 data4 = XS1_PORT_4D;
out port cs = XS1_PORT_1A;
clock clk  = XS1_CLKBLK_1;

...

configure_clock_src(clk, exClk); // to use external clk, or: configure_clock_rate(clk, FREQ, DIVIDER) for internal generation;
configure_out_port_strobed_master(data8, cs, clk, 0);
configure_out_port(data4, clk, 0);
start_clock(clk);
...
data8 <: 0;
sync(data8); // get it in sync
...
// use it in your loops, making sure x (lower bits, 4 bytes) and Y (higher bits, 4 nibbles) are the correct values and width; remember each x and Y will actually be 4 different block values in sequence (bytes/nibbles)
data8 <: x;
data4 <: Y;
..


Somebody (@segher) check my code/logic here I don't have anything I can test this on handy and just pulled this from my backside.

P.S. your sine tables will have to be compressed into the shifted byte/nibble sequences for optimum performance I would guess, unless you do the shifting at the output statement perhaps.
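
The byte/nibble packing mentioned above could look roughly like this in plain C. It assumes that a buffered port shifts out its least significant port-width bits first, so a 32-bit word on the 8-bit port goes out as four bytes and a 16-bit word on the 4-bit port as four nibbles; it also assumes the 8-bit port carries the low eight bits of each sample. `pack4` is a made-up name, and none of this has been checked on hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four 12-bit samples into one word per buffered port.
 * Assumptions (not confirmed in the thread): sample[0] goes out first, the
 * port shifts out its least significant bits first, the 8-bit port carries
 * bits 0..7 of each sample and the 4-bit port bits 8..11. */
static void pack4(const uint16_t sample[4], uint32_t *word8, uint16_t *word4)
{
    uint32_t w8 = 0;
    uint16_t w4 = 0;
    for (int k = 0; k < 4; k++) {
        assert(sample[k] <= 4095);                        /* 12-bit samples only */
        w8 |= (uint32_t)(sample[k] & 0xFF) << (8 * k);    /* byte k   */
        w4 |= (uint16_t)((sample[k] >> 8) << (4 * k));    /* nibble k */
    }
    *word8 = w8;
    *word4 = w4;
}
```

The two results would then be the values written with `data8 <: x;` and `data4 <: Y;`, one write each per group of four DAC updates.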

regards
Al

Post by rp181 »

Thanks for the comprehensive reply! I'll try it out later today and report back. For variable frequencies, I think I might just have to do the bit shifting at the output; I don't think it will be too costly in terms of speed.

I may be able to restrict the frequencies to have an "easy" least common denominator, so then it'll just be a matter of checking the modulo for each channel.

Post by segher »

Folknology wrote: "Somebody (@segher) check my code/logic here I don't have anything I can test this on handy and just pulled this from my backside."

I'm not testing it on hardware either, heh... cs probably needs to
be inverted, and the sync sequence looks off; other than that it
seems okay to me. Oh, and you forgot chanSelect?

Folknology wrote: "P.S. your sine tables will have to be compressed into the shifted byte/nibble sequences for optimum performance I would guess, unless you do the shifting at the output statement perhaps."

Hrm, you need to output the data for four DAC channels at once,
so you cannot do it in one load instruction. You'll end up with
four loads + three maccus + one out; not cheaper than four loads
+ four outs :-(

Post by Folknology »

segher wrote:
Folknology wrote: "Somebody (@segher) check my code/logic here I don't have anything I can test this on handy and just pulled this from my backside."
"I'm not testing it on hardware either, heh... cs probably needs to be inverted, and the sync sequence looks off; other than that it seems okay to me. Oh, and you forgot chanSelect?"

Well I don't know what width port chanSelect is, it could work with say:

out buffered port:16 chanSel = XS1_PORT_4E;
or part of:

out buffered port:32 chanSel = XS1_PORT_8D;
Attach it to the clock:

configure_out_port(chanSel, clk, 0);
And then in the sine loop, something like this:

int lwr4Chans = 8 | (9 << 4) | (10 << 8) | (11 << 12);
int upr4Chans = 12  | (13 << 4) | (14 << 8) | (15 << 12);
...
for (l = 0, u = INCREMENT; l < SINE_TABLE_LENGTH; l += 2*INCREMENT, u += 2*INCREMENT) {
  chanSel <: lwr4Chans;
  data8 <: x[l];
  data4 <: Y[l];
  chanSel <: upr4Chans;
  data8 <: x[u];
  data4 <: Y[u];
}
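
The two channel-select constants above can be generated rather than typed by hand. A small sketch in plain C, assuming a 4-bit channel-select port that shifts out the lowest nibble first; `pack_channels` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four consecutive channel numbers, one nibble each, with the first
 * channel in the lowest nibble (assumed to be shifted out first). */
static uint16_t pack_channels(unsigned first_channel)
{
    assert(first_channel + 3 <= 15);   /* every channel must fit in 4 bits */
    uint16_t word = 0;
    for (unsigned k = 0; k < 4; k++)
        word |= (uint16_t)((first_channel + k) << (4 * k));
    return word;
}
```

`pack_channels(8)` gives 0xBA98 and `pack_channels(12)` gives 0xFEDC, the same values as lwr4Chans and upr4Chans.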

segher wrote:
Folknology wrote: "P.S. your sine tables will have to be compressed into the shifted byte/nibble sequences for optimum performance I would guess, unless you do the shifting at the output statement perhaps."
"Hrm, you need to output the data for four DAC channels at once, so you cannot do it in one load instruction. You'll end up with four loads + three maccus + one out; not cheaper than four loads + four outs :-("

Yup, what was I thinking ;-)

regards
Al