Which Program Flow?

Post by **Folknology** » Tue Mar 29, 2011 9:48 pm

Good news, so what are you getting frequency wise now with the replicator and 4 concurrent threads?

regards
Al

Post by **rp181** » Tue Mar 29, 2011 10:43 pm

Getting the same (> 900 kHz) with and without a replicator. Without it, I just manually wrote out all of the lines, which I would think does the same thing.

Some other questions I thought of:
1) Does optimizing with the compiler have a downside? If not, why isn't the highest level always on?
2) What is a streaming channel's buffer size?

Post by **Folknology** » Tue Mar 29, 2011 11:35 pm

Sorry, no, my bad. I thought you were using the older sequential code to call ProcessorThread, as I didn't see a new main. The replicator will give the same results as manual par entries; it's just neater and more scalable, as you can make the number of ProcessorThreads variable very simply.

I will leave the optimisation question for Xmos, but I would imagine a compile takes longer with higher optimisations.
Streaming channels are limited resources. Underneath is a credit system for parts of those resources, so I'm not sure that has an easy answer; it may depend on the number of streaming channels in operation, but Xmos will probably answer that better.

Glad you're getting good results.

By the way, a macro would just replace the function call, or you could do the calculation directly to lose the function overhead:

```c
for (counter1 = 0; counter1 < 14; counter1 += 2) {
    ret = (1000000 * (currentData[counter1] - currentData[counter1+1]) / 1000000 * (currentData[counter1] + currentData[counter1+1]));
}
```

regards
Al

Post by **rp181** » Tue Mar 29, 2011 11:55 pm

I didn't realize function calls were so bad! I made it into a macro, and removed extraneous timer statements, and it is now 3 MHz!

```c
void ProcessorThread(streaming chanend usbOut, int quadrant) {
	short counter1;
	short ret;
	short a;
	short b;

	timer t;
	unsigned long time;
	unsigned long time5;

	short cycles;
	t :> time;	// timestamp before the measured loops
	for (cycles = 0; cycles < 10; cycles++) {
		for (counter1 = 0; counter1 < 14; counter1+=2) {
			// average four readings of each channel in the pair
			a = ((readADC(counter1)+readADC(counter1)+readADC(counter1)+readADC(counter1))/4);
			b = ((readADC(counter1+1)+readADC(counter1+1)+readADC(counter1+1)+readADC(counter1+1))/4);
			//ret = processPair(readNormalizedADC(counter1),readNormalizedADC(counter1+1));
			ret = (1000000 * (a - b) / 1000000 * (a + b));
		}
	}
	t :> time5;	// timestamp after the measured loops

	printf("Time Quadrant %i: %ld\n",(int)quadrant,((time5-time)));
}
```

This makes me think I am doing something wrong... :shock:

If this is correct, I may, depending on how fast the channel is to the USB thread, have to slow it down to give the ADC time to refresh...

EDIT: With a streaming channel to a USB thread, it is 1.6 MHz.

Post by **Folknology** » Wed Mar 30, 2011 1:24 am

Also, as you get nearer to the real thing you will lose the readNormalizedADC() overheads, as you will run the ProcessorThreads in response to inputs from a select, which would be a case on the ADC input port. This can all effectively be inlined by converting it to a select function, which will likely whistle through the data faster than the ADC can supply it. You need not worry about adding delays etc., as it will become event driven and thus the threads will pause whilst waiting for data (this is a good thing). Actually, this could leave some thread capacity for your actuation thread/s.

regards
Al

Post by **segher** » Wed Mar 30, 2011 5:20 am

return (1000000 * (a - b) / 1000000 * (a + b));

That's not doing what you want, you multiply by a+b instead of dividing by it.
The division by a constant will be optimised to a multiply by the compiler (and
completely optimised away in this case). You shouldn't divide by the scale
factor anyway. The code you want is:

return 1000000 * (a-b) / (a+b);

(it might help a little if you used a power of two instead of the 1000000, fwiw).
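The difference between the two expressions can be checked with a minimal sketch in plain C (not XC). The function names are hypothetical, and it assumes a 32-bit `int` and inputs small enough that the scaled product does not overflow:

```c
#include <stdint.h>

#define SCALE 1000000  /* scale factor used in the thread */

/* As first posted: parses as ((SCALE * (a - b)) / SCALE) * (a + b),
   so the scale cancels and the result is (a - b) * (a + b). */
int32_t ratio_as_posted(int32_t a, int32_t b) {
    return SCALE * (a - b) / SCALE * (a + b);
}

/* Corrected: scale the difference first, then divide by the sum.
   The result is (a - b) / (a + b) expressed in millionths. */
int32_t ratio_scaled(int32_t a, int32_t b) {
    return SCALE * (a - b) / (a + b);
}

/* Same idea with a power-of-two scale (2^20), which lets the compiler
   implement the multiply as a shift. The result is in units of 1/2^20. */
int32_t ratio_q20(int32_t a, int32_t b) {
    return (1 << 20) * (a - b) / (a + b);
}
```

With a = 300 and b = 100 the true ratio is 0.5: `ratio_as_posted` gives 80000 (that is, 200 * 400), `ratio_scaled` gives 500000, and `ratio_q20` gives 524288 (half of 2^20).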

Post by **rp181** » Wed Mar 30, 2011 5:35 am

So will the intermediate value still retain the decimal places? I thought I had to multiply both first, as the processor isn't floating point. I wouldn't have realized that using a power of two would help, but I will use 1048576 (2^20).
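One thing worth checking with a 2^20 scale is the range of the intermediate product. The thread does not state the ADC width, so the following is only a sketch under assumptions: a 32-bit `int`, non-negative readings, and a non-zero sum. Under those assumptions a 12-bit difference such as 4094 already overflows, because 4094 * 2^20 is larger than 2^31 - 1. The hypothetical helper below widens the intermediate to 64 bits:

```c
#include <stdint.h>

/* Hypothetical helper: ratio (a - b) / (a + b) in units of 1/2^20.
   Assumes a and b are non-negative ADC readings with a + b != 0,
   so the magnitude of the result never exceeds 2^20. */
int32_t ratio_q20_wide(int16_t a, int16_t b) {
    /* Widen before multiplying so the product cannot overflow 32 bits. */
    int64_t numerator = (int64_t)(a - b) * (1 << 20);
    return (int32_t)(numerator / (a + b));
}
```

The result itself can reach plus or minus 2^20, so it needs a 32-bit variable rather than a `short`. A 64-bit divide is likely to cost more than a 32-bit one on a 32-bit core, so a smaller scale factor may be the better trade if speed matters most.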

Post by **Folknology** » Wed Mar 30, 2011 8:22 am

Intent would be clearer with:

ret = 0x100000u * (a-b) / (a+b);

regards
Al

Post by **Interactive_Matter** » Wed Mar 30, 2011 9:00 am

rp181 wrote: I didn't realize function calls were so bad! I made it into a macro, and removed extraneous timer statements

Completely unrelated question, but still on topic:

Is a function call that bad?
Even if I say it is 'static inline'?
Or is a macro the only real guarantee for inlining?

Thanks

Marcus
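For comparison, here are the two forms side by side in plain C, as a sketch with hypothetical names. In standard C, `inline` is only a hint, so whether a call remains depends on the compiler and optimisation level. A macro is expanded by the preprocessor, so no call exists to begin with, at the cost of type checking and of evaluating its arguments more than once:

```c
/* Macro form: pure text substitution. Each argument appears twice in the
   expansion, so an argument with side effects (such as a function that
   reads hardware) would be evaluated twice. */
#define PROCESS_PAIR(a, b) (1000000 * ((a) - (b)) / ((a) + (b)))

/* Function form: type checked, arguments evaluated exactly once.
   "static inline" asks the compiler to inline it, but does not force it. */
static inline int process_pair_inline(int a, int b) {
    return 1000000 * (a - b) / (a + b);
}
```

Both forms compute the same value for plain integer arguments, for example 500000 for the pair (300, 100). Inspecting the generated assembly is the reliable way to see whether a given build actually inlined the function.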

Post by **segher** » Wed Mar 30, 2011 9:32 am

Folknology wrote: Intent would be clearer with:

ret = 0x100000u * (a-b) / (a+b);

That doesn't work. The division has to be signed; making the constant unsigned like this
makes the multiplication unsigned (which is fine), and then the division unsigned as well
(which is not fine).

Don't use the U.
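The effect of the `u` suffix shows up as soon as a is smaller than b. This is a minimal sketch in plain C with hypothetical function names, assuming a 32-bit `int`:

```c
#include <stdint.h>

/* Signed constant: the multiply and the divide are both signed. */
int32_t ratio_signed(int32_t a, int32_t b) {
    return 0x100000 * (a - b) / (a + b);
}

/* Unsigned constant: (a - b) is converted to unsigned before the multiply,
   and (a + b) is converted to unsigned before the divide. A negative
   difference wraps to a large positive value, so the quotient is wrong. */
int32_t ratio_unsigned_constant(int32_t a, int32_t b) {
    return (int32_t)(0x100000u * (a - b) / (a + b));
}
```

For a = 100 and b = 300 the signed version returns -524288 (that is, -0.5 scaled by 2^20), while the unsigned version returns 10213130. When a is at least b, both versions agree, which makes the bug easy to miss in a quick test.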
