Optimization level issues

Technical questions regarding the XTC tools and programming with XMOS.
Gothmag
XCore Addict
Posts: 129
Joined: Wed May 11, 2016 3:50 pm

Optimization level issues

Post by Gothmag »

I'm using an explorerkit, xTIMEcomposer 14.2.1 build 15182 and an ARM MCU(a PJRC Teensy 3.2 specifically) to do datalogging. Using the standard 3.1.6 I2C library and some simple code within the callbacks the ARM MCU fails to communicate with ANY optimization level other than 0. Is the XCC Command-Line Manual the only place to find specific options for the compiler switches or am I missing something? There only seem to be a couple of options listed as specifically linked to certain -O levels, like funroll-loops and finline-functions. Is there any way for me to find out which optimizations are breaking this? I would try removing things to find out where the issue is but at some point if a function call or some necessary piece of code is breaking it I'm not sure I could make it work anyway since all the threads need data from each other. The code is also fairly simple, most of the callbacks for the I2C library run into a quick switch and break. If there is anything I can try to get some more code optimization from the compiler I'd be happy to try it. If the code for the callbacks is needed I'd be happy to include it, but as I said it's a fresh install of the I2C library.


User avatar
infiniteimprobability
XCore Legend
Posts: 1126
Joined: Thu May 27, 2010 10:08 am

Post by infiniteimprobability »

HI - You're using recent versions of tools and libraries so it should work as advertised. Going down the route of separating out compiler optimisations versus level may be interesting but it feels like there may be an application issue which needs identifying and addressing.

The optimisation level shouldn't affect the functionality but it will definitely affect timing. We actually regress the i2c library (and most others) at optimisation level -O2 so, if anything, we can be more sure of it's operation with optimisation turned up.. My concern is that switching off optimisation is just taking you the other side of marginal issue.

What is the nature of the failure? Have you had a look on the wire to see what's happening?
Gothmag
XCore Addict
Posts: 129
Joined: Wed May 11, 2016 3:50 pm

Post by Gothmag »

I do remember reading that libraries were tested at -O2, it just seemed odd that the same program layout functioned properly on all -O levels on the startkit based version of the project, but I am using older tools, and a different version xmos chip there. Of course there are other differences but not in this part of the program. The clock line just stays low after some time, unfortunately I don't have a logic analyzer to check for sure where. I'll modify the code on the other board to see if I can pinpoint where it fails.
Gothmag
XCore Addict
Posts: 129
Joined: Wed May 11, 2016 3:50 pm

Post by Gothmag »

After some more testing it seems there is about a 500 second delay between sending data over I2C to call an interface function and receiving the ack/nack from the I2C. I tried adding some print statements in the xmos program but unfortunately just like with -O0 there was very little delay, and everything seemed to be working alright from the point of view of the I2C bus. I had tried attaching to the process and I found it in the master_sent_data() call back but that's not really all that helpful since it's just about the only place it could get stuck, and tells me nothing about why. Tried setting fast mode and high priority for the I2C thread, the callback thread, and the thread on the other end of the interface which should get the data and none of it changed anything. I'm letting it run so I can post the output from the I2C master side of the connection(the xmos is slave) will edit when I have something. If you've got any ideas at all I'd appreciate it.


Some output from -O3 xmos program, when 0 is returned from I2C functions it means it didn't detect any issues, which mostly means nacks since there is no timeout:

Code: Select all

Reading PID values...
PID values read.
Writing steering P start: 0
Writing steering P data[0]: 0
Writing steering P data[1]: 0
Writing steering P data[2]: 0
Writing steering P data[3]: 0
Writing steering I start: 0
Writing steering I data[0]: 0
Writing steering I data[1]: 0
Writing steering I data[2]: 0
Writing steering I data[3]: 0
Writing steering D start: 0
Writing steering D data[0]: 0
Writing steering D data[1]: 0
Writing steering D data[2]: 0
Writing steering D data[3]: 0
Writing Set_steering: 0
Took 1200606 milliseconds.
Writing motor P start: 0
Writing motor P data[0]: 0
Writing motor P data[1]: 0
Writing motor P data[2]: 0
Writing motor P data[3]: 0
Writing motor I start: 0
Writing motor I data[0]: 0
Writing motor I data[1]: 0
Writing motor I data[2]: 0
Writing motor I data[3]: 0
Writing motor D start: 0
Writing motor D data[0]: 0
Writing motor D data[1]: 0
Writing motor D data[2]: 0
Writing motor D data[3]: 0
Writing Set_motor: 0
Wrote PID values
Starting loop.
Performed 556 loops in last 2003 milliseconds.
OK: 556, Not OK: 0
Performed 566 loops in last 2003 milliseconds.
OK: 566, Not OK: 0
Performed 566 loops in last 2002 milliseconds.
OK: 566, Not OK: 0
Performed 566 loops in last 2003 milliseconds.
OK: 566, Not OK: 0
Performed 566 loops in last 2002 milliseconds.
OK: 566, Not OK: 0
Performed 566 loops in last 2002 milliseconds.
OK: 566, Not OK: 0
Performed 566 loops in last 2002 milliseconds.
OK: 566, Not OK: 0
Performed 566 loops in last 2002 milliseconds.
OK: 566, Not OK: 0
Performed 566 loops in last 2002 milliseconds.
OK: 566, Not OK: 0
Performed 566 loops in last 2003 milliseconds.
OK: 566, Not OK: 0
Performed 566 loops in last 2002 milliseconds.
OK: 566, Not OK: 0
Performed 566 loops in last 2003 milliseconds.
OK: 566, Not OK: 0
Performed 566 loops in last 2002 milliseconds.
OK: 566, Not OK: 0
Performed 566 loops in last 2002 milliseconds.
OK: 566, Not OK: 0
Performed 566 loops in last 2003 milliseconds.
OK: 566, Not OK: 0
Performed 566 loops in last 2002 milliseconds.
OK: 566, Not OK: 0
Performed 566 loops in last 2002 milliseconds.
OK: 566, Not OK: 0
Performed 566 loops in last 2003 milliseconds.
OK: 566, Not OK: 0
Performed 566 loops in last 2002 milliseconds.
OK: 566, Not OK: 0
Performed 566 loops in last 2003 milliseconds.
OK: 566, Not OK: 0
Performed 566 loops in last 2002 milliseconds.
OK: 566, Not OK: 0
Performed 566 loops in last 2002 milliseconds.
OK: 566, Not OK: 0
Performed 566 loops in last 2002 milliseconds.
OK: 566, Not OK: 0
Gothmag
XCore Addict
Posts: 129
Joined: Wed May 11, 2016 3:50 pm

Post by Gothmag »

I don't know if this should be a new thread... I've found the time to communicate from the callback thread to the thread with the data is just excessive in the hundreds of seconds. I moved the PID data transfer to another thread which is just to serve data between the 2 tiles/threads. Now it can only manage an update once a second. This is also seems to be regardless of the -O level. Should I create another interface so I can at least manage to move data the other way around or is there any sort of known fix? Here is where the main threads are. Datalogger(tile 0, core 1) is the I2C callback thread and path_follower(tile 1, core 0) is where all the data is located and needs to be pulled from.

Code: Select all

par {
        //ESC PWM
        on tile[0].core[0]: read_pwm(motor_input, motor, path_following[0]);
        on tile[0].core[0]: write_pwm(motor_output, motor);

        //Steering servo PWM
        on tile[0].core[0]: read_pwm(steering_input, steering, path_following[1]);
        on tile[0].core[0]: write_pwm(steering_output, steering);

        on tile[0].core[0]: read_toggle(red_toggle, path_following[2]);

        //Datalogging threads
        on tile[0].core[1]: Datalogger(rpi_slave, datalogger, datalogger_pathdata,
                imu[3]);
        on tile[0]: i2c_slave(rpi_slave, rpi_scl, rpi_sda, 0x11);

        //IMU server and I2C master
        on tile[0].core[3]: IMU(imu_i2c[0], imu);
        on tile[0]: i2c_master(imu_i2c, 1, imu_scl, imu_sda, 400);

        //Button handler and led handler
        on tile[0].core[4]: Buttons_handler(buttons_port, buttons/*, rpi_shutdown, rpi_restart*/);
        on tile[0].core[4]: LED_handler(leds_port, leds);
        //on tile[0].core[5]: PI_data_lines(pi_update_line, pi_ready_line, pi_data_interface);

        //Path Following threads
        on tile[1].core[4]: pathing_data(path_data, datalogger_pathdata);
        on tile[1].core[0]: path_follower(path_following, imu[2], wheels, datalogger, path_data,
                leds, buttons);

        //Wheel speed threads and higher level handler
        on tile[1].core[1]: wheel_speed(left_wheel_a, left_wheel_b, leftwheel, imu[0]);
        on tile[1].core[2]: wheel_speed(right_wheel_a, right_wheel_b, rightwheel, imu[1]);
        on tile[1].core[3]: wheels_handler(leftwheel, rightwheel, wheels);

    }
This is the output from the ARM mcu after using the pathing_data thread to store PID data for retrieval by path_follower thread later.

Code: Select all

SD ok
Text path NOT converted
Reading PID values...
PID values read.
Writing steering P start: 0
Writing steering P data: 0
Writing steering I start: 0
Writing steering I data: 0
Writing steering D start: 0
Writing steering D data: 0
Writing Set_steering: 0
Took 68 microseconds
Writing motor P start: 0
Writing motor P data: 0
Writing motor I start: 0
Writing motor I data: 0
Writing motor D start: 0
Writing motor D data: 0
Writing Set_motor: 0
Took 66 microseconds
Sent PID data.
Starting loop.
Performed 1 loops in last 164489111 microseconds.
OK: 1, Not OK: 0
Performed 1 loops in last 466391581 microseconds.
OK: 1, Not OK: 0
Performed 2 loops in last 326485875 microseconds.
OK: 2, Not OK: 0
User avatar
infiniteimprobability
XCore Legend
Posts: 1126
Joined: Thu May 27, 2010 10:08 am

Post by infiniteimprobability »

I've found the time to communicate from the callback thread to the thread with the data is just excessive in the hundreds of seconds.
Hmm - there is something seriously wrong if something takes hundreds of seconds. Normally, the only thing that can delay serious time (2^31/100e6 = 21.47 seconds) is a timerafter() construct...or external I/O..

I notice that the I2C callbacks are handled by the Datalogger() task. This means interface call(back)s will be made on I2C events. The way the interfaces work is synchronous - that means the callback made in i2c_slave will block i2c_slave until the callback case has completed all of the way to the break. This means any interface calls or anything blocking in the callback in Datalogger select handler will also cause 'backpressure' i.e. block ic_slave until they are all complete.

My guess is that something in the Datalogger task is blocking, so the end of the select case: is not reached for a long time.

I'm not sure how familiar you are with the mechanism behind the whole tile[n].core[m] syntax? The compiler effectively takes all select cases from tasks on tile[n].core[m] and puts them in one big select in a single core. Great for using less cores, but it means that you may have to wait for a lot of other cases to complete before a particular case is serviced (no preemption). Removing the .core[m] bit will instruct the compiler to use a whole core for that task - maybe experiment to see which one is the blockage..

To debug this, you could use xscope printing, keep the print short like

Code: Select all

printcharln('a');
with timing to see where you hit states at what time. In config.xscope you need:

Code: Select all

<xSCOPEconfig ioMode="timed" enabled="true">
Hopefully you can track down the offending code!!
Gothmag
XCore Addict
Posts: 129
Joined: Wed May 11, 2016 3:50 pm

Post by Gothmag »

I do believe I understand the tile and core syntax. the program is being built as I.go.and the placements so far reflect that, with most threads getting their own core, but reduced to use 5 per tile. I'm also lazy so I decided to run all the datalogging data through another thread and all the I2C now works at full speed. Now the problem is that the data never changes.

Ive tried 2 ways to get the data, passing the data thread the memory address from work thread and passing a copy of the data from work thread everytime data is updated and neither works. If i put prints anywhere the data moves by value, reference or pointer it works. is it possible the data changes are being optimized out when its not specifically being used? EDIT: I tried adding a CRC and a check on that but nothing changed so the answer is likely no.

Here is another question, what exactly is different in the actual run between using debug and flashing? Use the same build the debug run works normally and what looks to be about the same speed, as seen from the other MCUs output, but the flashed run never records data changing. Is it simply the difference of booting from flash and booting from jtag? My debug options are run on hardware, xtag-3 target and JTAG I/O server, but I've currently removed all prints from the program. Nothing else is enabled or selected. I'd really love to figure this out as it's the only issue I currently seem to have but it completely breaks the project.
Gothmag
XCore Addict
Posts: 129
Joined: Wed May 11, 2016 3:50 pm

Post by Gothmag »

I thought I'd add another update. Since timing was a potential issue, but I've had plenty of other problems with this project I thought I'd kill 2 birds with one stone. I added a fast streaming uart tx interface and I'm moving it around between threads to compare against I2C data I receive on the ARM mcu. First thing, they(the uart data and i2c data) always match accounting for time. The first, and only so far, thing I decided to check is the heading from the IMU. It's critical that I have that data so I started there. When I flash the program I never see the heading change, I've tried moving things around cores, changing the master I2C rate from 5kbit/s to 400kbit/s and I only ever read 0*. When I run as debug I see the heading change appropriately, and when pointing north it reads within a couple of degrees of 0.00.

After that I figured I'd go higher up the chain, so I modified i2c master read_reg function so that it could send the individual bytes it read over uart. Same thing, nothing happens when flashing, but debugging is ok. Up again, modified the i2c_master read(...) select to take a streaming chanend. Immediately following the loop that read the byte I sent the byte out before it moved it over to the array. Still nothing, never changed when flashing, worked ok in debug. I guess next I'll modify high_pulse_sample the same way to see the individual bits it reads, or I'll hook up another microcontroller on the bus and start reading the bits that way and try to verify, or not, the data. I'm also going to try a different version of the tools.

Unless I figure it out tomorrow I will probably swap out the explorerkit for another ARM processor but I'll keep trying to figure this out either way.
henk
Respected Member
Posts: 347
Joined: Wed Jan 27, 2016 5:21 pm

Post by henk »

Hi Gothmag,
I'm also lazy so I decided to run all the datalogging data through another thread and all the I2C now works at full speed. Now the problem is that the data never changes.
Quick thought about the data never changing; one way this could happen is if the producer and consumer run on different tiles and use shared memory? (as tiles do not share memory).

Cheers,
Henk
Gothmag
XCore Addict
Posts: 129
Joined: Wed May 11, 2016 3:50 pm

Post by Gothmag »

I only use shared memory as a lazy last ditch effort. Typically I seem to have plenty of time to pass by value, or for functions for single threads pass by reference, but I usually declare them inline. The data thread does run on the same tile as the producer, but it also assigns it to a local copy when it's received. When the consumer asks for it the data threads local copy is passed to it, across threads, by value. I'll be trying a couple different things today but It's probably the last day I have I can spend trying to debug so If I don't feel like I'm making any headway I make give up early to start modifying the hardware and writing the software. Only have 3 weeks and compared to most of the guys I'd probably be seen as months behind.