I got in on the kickstarter as well. We'll see how it turns out.
Since we're talking about other chips (seriously, is that even allowed? [grin]), another thing that's caught my eye recently is the Blackfin 60x series (http://www.analog.com/en/processors-dsp ... oduct.html). These are medium-speed (500MHz) fixed-point DSP chips with dual cores, with each core being able to retire 2 16-bit MACs per clock, assuming you set them up correctly (DMA into level-1 RAM from SDRAM, DMA out of level-1 RAM to SDRAM, work using level-1 RAM). The C/C++ compiler offered by Analog Devices allows you to set things up like this. There's also a gcc port, but I'm not sure what facilities that offers.
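To make the "DMA in, work in level-1 RAM" pattern a bit more concrete, here is a rough sketch in plain C. It is only an illustration, not anything from the Analog Devices toolchain: the names (mac_block, dot_pingpong, BLOCK) are made up, memcpy stands in for the DMA engine, and the copy runs sequentially here whereas real DMA would overlap with the compute.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 64  /* hypothetical block size in samples, not a chip parameter */

/* Two 16-bit multiply-accumulates per loop pass, mirroring the
   "2 MACs per clock" idea. Plain C; whether a compiler actually maps
   this onto both MAC units is an assumption. */
static int64_t mac_block(const int16_t *x, const int16_t *h, size_t n)
{
    int64_t acc0 = 0, acc1 = 0;
    size_t i;
    for (i = 0; i + 1 < n; i += 2) {
        acc0 += (int32_t)x[i] * h[i];
        acc1 += (int32_t)x[i + 1] * h[i + 1];
    }
    if (i < n)                      /* odd-length tail */
        acc0 += (int32_t)x[i] * h[i];
    return acc0 + acc1;
}

/* Ping-pong (double-buffered) dot product over nblocks blocks of BLOCK
   samples. The two static arrays play the role of level-1 RAM; memcpy
   plays the role of the SDRAM-to-L1 DMA transfer. */
int64_t dot_pingpong(const int16_t *x_sdram, const int16_t *h_sdram,
                     size_t nblocks)
{
    static int16_t l1_x[2][BLOCK], l1_h[2][BLOCK];
    int64_t total = 0;
    size_t b;

    if (nblocks == 0)
        return 0;

    /* Prime the first buffer before the loop starts. */
    memcpy(l1_x[0], x_sdram, sizeof l1_x[0]);
    memcpy(l1_h[0], h_sdram, sizeof l1_h[0]);

    for (b = 0; b < nblocks; b++) {
        size_t cur = b & 1u, nxt = cur ^ 1u;
        if (b + 1 < nblocks) {
            /* "Start DMA" of the next block into the idle buffer. */
            memcpy(l1_x[nxt], x_sdram + (b + 1) * BLOCK, sizeof l1_x[nxt]);
            memcpy(l1_h[nxt], h_sdram + (b + 1) * BLOCK, sizeof l1_h[nxt]);
        }
        /* Work only on data that is already in "L1". */
        total += mac_block(l1_x[cur], l1_h[cur], BLOCK);
    }
    return total;
}
```

The point of the two buffers is that the core never touches SDRAM directly: while one buffer is being crunched, the other is being filled.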
They also have Ethernet MAC (x2), CAN, USB, DMA-driven parallel ports that can parse video etc. (x3), UARTs (x2), TWI, SPI, synchronous serial ports that run at ~80MHz (x3), a DDR SDRAM controller, and (to bring it struggling back to being relevant) 4 link ports.
Each link port is a bidirectional DMA-driven 8-bit port with some extra signals to do handshaking. There's no protocol sitting on top of it to handle message routing between nodes, but the hardware is there. The data is still preliminary, but it looks as though the clock will be able to run up to 250MHz, so at one byte per clock each port can do ~250MB/sec.
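The ~250MB/sec figure is just port width times clock rate. A tiny sketch of that arithmetic, assuming one 8-bit transfer per clock and ignoring any handshaking overhead (the function name is made up for illustration):

```c
#include <stdint.h>

/* Peak bytes per second for a parallel link that moves width_bits
   once per clock. Assumes no handshake stalls, so this is an upper
   bound, not a measured figure. */
uint64_t link_peak_bytes_per_sec(uint32_t width_bits, uint64_t clock_hz)
{
    return (uint64_t)width_bits * clock_hz / 8u;
}
```

With width_bits = 8 and clock_hz = 250000000 this gives 250,000,000 bytes/sec per port, so about 1GB/sec aggregate if all 4 ports could run flat out at once.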
Oh, and they have a *very* friendly BGA layout - only the outer 3 layers of pins are used for anything other than power/ground - after trying to route out the XS1-G4 I really appreciate that. Cost has been estimated as ~$15 in 1k quantities, which translates to ~$35-$40 at quantity 1 if you extrapolate from other Analog Devices 1k figures to what Digikey is charging for quantity 1. That's for the high-end 609 part; there are lesser chips (which still have the link ports but less internal RAM, or no vision-processor) which will presumably be cheaper.
To be honest, these new DSPs (not yet available in quantity, but AD will sell you an eval kit with a working '609 part on it) look pretty tasty - sure you don't get 4 independent 125MHz threads, or 8 independent 67MHz threads, but you do get 2 500MHz threads. If you're running multitasking, it's a step down in predictability on the timeslicing front, but on the flip side a single thread can run a lot faster. And it has links.
It's also pretty easy to do the "software as silicon" approach, given that every port pin on the chip (that's 112 pins in a 7x16-bit port configuration) can register for interrupts (level or edge). Alternatively, you may be able to get away with the DMA-driven ports if you can massage the input protocol into something the DMA engines can handle.
I've ordered an eval kit, just to play with one; it ought to be arriving Mon/Tue. It'll be interesting to see how far I can take it... If I can get a basic timeslicing "os" up and running, and then implement a message-routing protocol, then a hypercube of {DSP, DDR RAM} nodes could be pretty intriguing.
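For the hypercube idea, 4 link ports per chip would allow a 4-dimensional cube of 16 nodes, with port k connecting nodes whose IDs differ in bit k. A minimal sketch of the routing decision, assuming that wiring and simple dimension-order routing (all names here are hypothetical, and nothing below touches real link-port hardware):

```c
#include <stdint.h>

#define LINK_PORTS 4u   /* from the post: 4 link ports per chip */

/* Node reached by sending out of a given port: flip that bit of the ID. */
unsigned hc_neighbour(unsigned node, unsigned port)
{
    return node ^ (1u << port);
}

/* Dimension-order routing: forward on the lowest-numbered port whose
   bit differs between here and dest. Returns -1 when the message has
   arrived (IDs match within the LINK_PORTS low bits). */
int hc_next_port(unsigned here, unsigned dest)
{
    unsigned diff = (here ^ dest) & ((1u << LINK_PORTS) - 1u);
    int port;

    if (diff == 0)
        return -1;
    for (port = 0; !(diff & 1u); port++)
        diff >>= 1;
    return port;
}

/* Number of hops a message takes from src to dest under this scheme;
   equal to the Hamming distance between the two node IDs. */
unsigned hc_hops(unsigned src, unsigned dest)
{
    unsigned hops = 0;
    int port;

    while ((port = hc_next_port(src, dest)) >= 0) {
        src = hc_neighbour(src, (unsigned)port);
        hops++;
    }
    return hops;
}
```

Under these assumptions the worst case is 4 hops (node 0 to node 15), and each node only needs its own ID and the destination ID to pick an outgoing port, so no routing tables are required.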
Simon