Is XC a realistic choice for real world applications?

Folknology
XCore Legend
Posts: 1274
Joined: Thu Dec 10, 2009 10:20 pm

Post by Folknology »

Hi Nick

I believe that LwIP offers a BSD socket interface as an option; I don't think this is the case with uIP. As to how much overhead this adds I am not entirely sure - we haven't even worked out if we can fit LwIP on an L1 yet, hence my questions.

regards
Al


daveg
Member++
Posts: 28
Joined: Thu Dec 10, 2009 7:25 pm

Post by daveg »

I believe that earlier experiments involved lwIP, but it was too large to be of much use in the limited memory on a single core. This was the main reason that uIP was used instead (IIRC).
Interactive_Matter
XCore Addict
Posts: 216
Joined: Wed Feb 10, 2010 10:26 am

Post by Interactive_Matter »

Just yesterday I read a very interesting blog post I want to share with you:

http://bartoszmilewski.wordpress.com/20 ... ogramming/

Some of the quotations reminded me a lot of XC:
Threads? Who Needs Threads?


In the past, to explore parallelism, you had to create your own thread pools, assign tasks to them, balance the loads, etc., by hand. Having this capacity built into the language (or a library) takes your mind off the gory details. You gain productivity not by manipulating threads but by identifying the potential for parallelism and letting the compiler and the runtime take care of the details.

In task-driven parallelism, the programmer is free to pick arbitrary granularity for potential parallelizability, rather than being forced into the large-grain of system threads. As always, one more degree of indirection solves the problem. The runtime may choose to multiplex tasks between threads and implement work stealing queues, like it’s done in Haskell, TPL, or TBB. Programming with tasks rather than threads is also less obtrusive, especially if it has direct support in the language.


Shared Memory or Message Passing?

We think linearly and we write (and read) programs linearly–line by line. The more non-linear the program gets, the harder it is to design, code, and maintain it (gotos are notorious for delinearizing programs). We can pretty easily deal with some of the simpler MP protocols–like the synchronous one. You send a message and wait for the response. In fact object oriented programming is based on such a protocol–in the Smalltalk parlance you don’t “call” a method, you “send a message” to an object. Things get a little harder when you send an asynchronous message, then do some other work, and finally “force” the answer (that’s how futures work); although that’s still manageable. But if you send a message to one thread and set up a handler for the result in another, or have one big receive or select statement at the top to process heterogeneous messages from different sources, you are heading towards the land of spaghetti. If you’re not careful, your program turns into a collection of handlers that keep firing at random times. You are no longer controlling the flow of execution; it’s the flow of messages that’s controlling you. Again, programmer productivity suffers. (Some research shows that the total effort to write an MPI application is significantly higher than that required to write a shared-memory version of it.)
Most of the complaints (that manually controlling threads is not what you want to do, and that if you pass enough messages between modules you will never understand the code again) reminded me of this discussion.

And I think that perhaps XC is just a language for the low-level implementation of very specific details of a bigger library - and thus you build abstractions to keep the parallelism and message passing down there, so that the code is still understandable?!
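
To make that concrete, here is a minimal sketch (my own illustration, names made up) of what the quotation's "parallelism built into the language" looks like in XC - two threads and a channel, with no thread-pool or scheduler code in sight:

#include <print.h>

// Producer thread: sends ten values over the channel.
void producer(chanend c) {
    for (int i = 0; i < 10; i++)
        c <: i;                // channel output
}

// Consumer thread: receives and prints them.
void consumer(chanend c) {
    int x;
    for (int i = 0; i < 10; i++) {
        c :> x;                // channel input (synchronises with the send)
        printintln(x);
    }
}

int main() {
    chan c;
    par {                      // compiler and hardware handle the threads
        producer(c);
        consumer(c);
    }
    return 0;
}
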
jonathan
Respected Member
Posts: 377
Joined: Thu Dec 10, 2009 6:07 pm

Post by jonathan »

Those arguments are extremely vague. They seem to apply largely to some stereotypical generalisation of what "Threads" and "Message-Passing" actually are. (Anyone who thinks MPI is a good example of message-passing, ouch).

The idea that events control the flow of execution through your program is... intuitive. As long as you have good memory protection and understand how to program using this style.

The statement:
We think linearly and we write (and read) programs linearly–line by line.
Is nonsense, in my opinion. This is probably why the author retains his somewhat religious viewpoint on the issue. I suspect (I have no evidence, so it's just my opinion) most programmers actually jump around their code whilst writing and reading, sketching in the outlines of methods before coming back and adding the method bodies.

Moreover, humans do not think linearly. Not even the most boring humans think linearly. If we did think linearly, we'd all have been eaten by tigers (or whatever...) millions of years ago. We constantly accept input from a wide variety of sources, process it and prioritise it and act upon it.
Jamie
Experienced Member
Posts: 99
Joined: Mon Dec 14, 2009 1:01 pm

Post by Jamie »

jonathan wrote:Moreover, humans do not think linearly. Not even the most boring humans think linearly. If we did think linearly, we'd all have been eaten by tigers (or whatever...) millions of years ago. We constantly accept input from a wide variety of sources, process it and prioritise it and act upon it.
Hear hear!
Folknology
XCore Legend
Posts: 1274
Joined: Thu Dec 10, 2009 10:20 pm

Post by Folknology »

I agree that the author has perhaps assigned himself to linear thinking and has thus leaned to the STM side of the concurrency divide. But as others have said here, we don't think linearly; it's our tools and education that often train us to do so. Such education and tools are damaging in the long term, as they hide better ways of tackling problems. It's also worth noting that most electronic engineers do not just think linearly - imagine having to design electronics in linear handcuffs!

There are also practical problems with an STM approach in something like an embedded system, as the memory hardware would need to be faster than the processing architecture in order to keep up. This is feasible in large-scale distributed programming, where the performance cost of a software transactional layer (as opposed to a hardware implementation) is bearable.

Also, at the embedded (I hate that word) level, resources are severely constrained - particularly memory - and this can have a major effect on your design approach. What the author would not have considered (because he was writing about distributed processing) is that function calls themselves can sometimes carry too great an overhead for the timing-critical parts of problems in this space. This was brought home to me recently in a conversation with segher about how to deal with the missing "select" abstraction in C/C++ that one finds in XC. My initial idea was event dispatch via event registration using function-pointer passing; although this is a nice pattern for the user, it's bad for the hardware. Adding stack loading and unloading to hardware events defeats the benefits of the XS1's super-efficient hardware event handling. The XC select abstraction not only allows for event handling, it also produces remarkably fast event dispatch and handling - fast enough, for example, to handle things like MII hardware interfacing ;-)
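
To illustrate what I mean (a rough sketch, names made up): one select handling several heterogeneous event sources in a single place. This compiles straight onto the XS1's hardware event mechanism - no function-pointer tables, no stack churn per event:

#include <xs1.h>

void event_loop(chanend ctrl, chanend data, in port p) {
    int v;
    while (1) {
        select {
        case ctrl :> v:
            // handle a control message
            break;
        case data :> v:
            // handle a data word
            break;
        case p when pinseq(1) :> v:
            // handle the pin going high
            break;
        }
    }
}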

Which brings me nicely back to this thread's point. Although XC clearly has advantages in its approach and in abstractions like select, the other constraints of the XS1 implementations (Lx/Gx), like memory, force us engineers and designers into falling back on the uglier linear hacks of the past: shared memory, with concurrency-overhead code in the form of semaphores, locks and manual synchronisation. Worse still is the fact that it once again opens that dark pit of dangerous hidden concurrency errors. That's really my point with this thread: in an ideal world we wouldn't have these issues and could use XC all the way, without having to drop down into C or ASM. That is, to use a horrible phrase, the elephant in the room.
jonathan
Respected Member
Posts: 377
Joined: Thu Dec 10, 2009 6:07 pm

Post by jonathan »

Al - your post highlights why, yet again, we need to carefully separate issues.

This thread begins with a discussion of XC, and yet your primary issue is with (it appears) the memory limitations of the specific implementation of the L1 architecture. This is not (necessarily) an issue with XC as a language; more an issue with the specific realisation of the combination of tools and hardware.

There are language-level issues in XC (for example, protocols or typed channels of sorts) that would enable the programmer to catch quite a number of difficult bugs at compile-time. Moreover, providing (in some form) a more "typed" interface on channels enables far simpler compositional behaviour of threads. These are both language-design level issues.
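
To make the typed-channel idea concrete, here is a purely hypothetical sketch - none of this syntax exists in XC today, and it is loosely in the spirit of occam's PROTOCOL - of a channel whose message structure the compiler could check at both ends:

// HYPOTHETICAL syntax, not valid XC.
// Declare the messages a channel may carry...
protocol reading {
    int timestamp;
    unsigned value;
};

// ...so that a sender and receiver that disagree about the
// sequence or types of messages fail to compile.
void sensor(chanend<reading> c);
void logger(chanend<reading> c);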

I would want to see specific examples of memory-intensive computing that would be better solved (ie without a substantial performance degradation) by the addition of more memory to the XS1 architecture. There is no doubt that it is relatively memory-constrained, but often a form of double-buffering (or pipelined-buffering) with a number of threads operating on different portions of the processed data can resolve "single-large-array" or "sharing" patterns, without breaking the disjointness rules of XC.
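
As a rough sketch of that pipelined-buffering pattern (function and port names are illustrative): each stage owns its own buffer, so XC's disjointness rules are respected, and blocks flow through the pipeline over channels - stage one can be gathering block n+1 while stage two is still working on block n:

#define BLOCK 64

void reader(chanend c_out, in port rx) {
    int buf[BLOCK];
    while (1) {
        for (int i = 0; i < BLOCK; i++)
            rx :> buf[i];          // gather one block from the port
        for (int i = 0; i < BLOCK; i++)
            c_out <: buf[i];       // stream it to the next stage
    }
}

void worker(chanend c_in, chanend c_out) {
    int buf[BLOCK];
    while (1) {
        for (int i = 0; i < BLOCK; i++)
            c_in :> buf[i];
        // ... process this block while the reader gathers the next ...
        for (int i = 0; i < BLOCK; i++)
            c_out <: buf[i];
    }
}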

I suspect there are some other language features missing from XC; I'm not sure. Some of the complaints I have heard relate to packet processing and stripping which people "typically" do with pointers; there may be interesting new language features that could be added in this case, I guess...

Any ideas for language changes? Architecture changes? And specific examples, beyond the ideas in this post?
segher
XCore Expert
Posts: 844
Joined: Sun Jul 11, 2010 1:31 am

Post by segher »

jonathan wrote:I would want to see specific examples of memory-intensive computing that would be better solved (ie without a substantial performance degradation) by the addition of more memory to the XS1 architecture. There is no doubt that it is relatively memory-constrained,
64kB is not constrained. 4kB is constrained.

More memory is of course welcome, but it only starts to become really useful to have more when you have a few MB. Which we of course cannot have as fast SRAM.
jonathan wrote:but often a form of double-buffering (or pipelined-buffering) with a number of threads operating on different portions of the processed data can resolve "single-large-array" or "sharing" patterns, without breaking the disjointness rules of XC.
One thing that XC really needs is a way to take array slices. The ABI can handle it just fine, it just needs some language syntax.
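
For instance (imaginary syntax, just to show the idea):

// HYPOTHETICAL - XC has no way to say this today.
void process_header(int hdr[n], unsigned n);

void handle_packet(int pkt[len], unsigned len) {
    process_header(pkt[4 : 20], 16);   // pass elements 4..19 by reference
}
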
jonathan wrote:Any ideas for language changes? Architecture changes? And specific examples, beyond the ideas in this post?
I want forms of lss, lsu, ld8u etc. that take an immediate operand :-) A bit off-topic for this thread, I suppose.


Segher
Interactive_Matter
XCore Addict
Posts: 216
Joined: Wed Feb 10, 2010 10:26 am

Post by Interactive_Matter »

Wow, that was a fast and unexpected reaction. Thanks for that.

I personally think that thinking in parallel structures is much harder than thinking in linear program code. My personal experience is that the more the code is spread over different modules, the harder it is to understand. On the other hand, having only long spaghetti-code routines with heaps of duplicated code is much more complicated.
jonathan wrote: The statement:
We think linearly and we write (and read) programs linearly–line by line.
Is nonsense, in my opinion. This is probably why the author retains his somewhat religious viewpoint on the issue. I suspect (I have no evidence, so it's just my opinion) most programmers actually jump around their code whilst writing and reading, sketching in the outlines of methods before coming back and adding the method bodies.
You are right. But from my point of view it is valid as long as the methods are logical abstractions for complex tasks (instead of controlling the bytes sent over a network channel, I simply perform an HTTP request).
I have seen code which was modularized for the sake of modularization itself, and from my point of view that was not nice. If there is too much indirection and abstraction, it gets hard to understand.

So finding the right abstraction helps a lot. Too much or too little gets cumbersome.
jonathan wrote: The idea that events control the flow of execution through your program is... intuitive. As long as you have good memory protection and understand how to program using this style.
My experience with UI programming - which is heavily event-driven - is that if there are too many magic reactions to events, it gets quite cumbersome to understand.

I personally think that the more abstraction you have from the details, e.g. of parallelism and message passing, the easier it is to program. On the other hand, you lose more control this way, and there are use cases where you need that control.
Folknology
XCore Legend
Posts: 1274
Joined: Thu Dec 10, 2009 10:20 pm

Post by Folknology »

jonathan wrote:I would want to see specific examples of memory-intensive computing that would be better solved (ie without a substantial performance degradation) by the addition of more memory to the XS1 architecture. There is no doubt that it is relatively memory-constrained, but often a form of double-buffering (or pipelined-buffering) with a number of threads operating on different portions of the processed data can resolve "single-large-array" or "sharing" patterns, without breaking the disjointness rules of XC.

I suspect there are some other language features missing from XC; I'm not sure. Some of the complaints I have heard relate to packet processing and stripping which people "typically" do with pointers; there may be interesting new language features that could be added in this case, I guess...
Actually the network stack is an excellent example of the type of problem that needs solving. Maybe 'slices' support in XC could go some way to fixing this.
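
For example (a rough sketch, names made up): with no pointers and no slices, each layer of a stack ends up taking the whole packet buffer plus an offset, because there is no way to hand the next layer just its own bytes:

void handle_ip(unsigned char pkt[n], unsigned n, unsigned offset);

void handle_ethernet(unsigned char pkt[n], unsigned n) {
    // The Ethernet header is 14 bytes, but the IP layer still gets
    // the whole buffer and an offset rather than a stripped slice.
    handle_ip(pkt, n, 14);
}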