[[single_issue]] and [[dual_issue]]

aclassifier
Respected Member
Posts: 483
Joined: Wed Apr 25, 2012 8:52 pm

Post by aclassifier »

I am missing a high-level description of [[single_issue]] and [[dual_issue]] that is understandable to a startKIT user who will soon move to xCORE-200.

I have tried to say something about it here [1], where I also mention David May's patent from 2007 [2]. I first saw it in the 14.3.0 release note [3].

I certainly need someone to fill in the gap, as I am curious to learn.

[1] http://www.teigfam.net/oyvind/home/tech ... uble_issue
[2] Compact instruction set encoding, patent by David May, XMOS (2007), see https://patents.google.com/patent/US767 ... ignee=xmos
[3] https://www.xmos.com/download/private/T ... 4.3.0).txt


--
Øyvind Teig
Trondheim (Norway)
https://www.teigfam.net/oyvind/home/
Bianco
XCore Expert
Posts: 754
Joined: Thu Dec 10, 2009 6:56 pm

Post by Bianco »

In dual-issue mode, the processor can execute two instructions truly concurrently, potentially doubling your performance.
In practice not all instruction pairs can be executed concurrently. A pair has to adhere to some rules, such as: both instructions must be 16-bit encoded, there must be no dependency between the two instructions, and the two instructions must not use the same functional unit. If a 16-bit encoded instruction cannot be paired with the next instruction, the second issue slot should be padded with a NOP instruction.

The xCORE-200 architecture document provides the details of instruction pairing.

The [[single_issue]] and [[dual_issue]] attributes simply instruct (force) the compiler to use single-issue or dual-issue mode. In this way you can override the default compiler setting for the source file.
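As a minimal sketch of what applying the attribute might look like: the function below is written so that it builds both as XC and as plain C on a host. It assumes that xcc defines __XC__ when compiling XC sources, so the attribute is only emitted there. The names DUAL_ISSUE and sum_pairs are hypothetical, and whether the compiler actually pairs the two adds is not guaranteed.

```c
/* Assumption: __XC__ is defined only when compiling XC sources, so the
   attribute is applied there and compiled out everywhere else. */
#ifdef __XC__
#define DUAL_ISSUE [[dual_issue]]
#else
#define DUAL_ISSUE
#endif

/* Sums an array using two independent accumulators. The two adds in the
   loop body do not depend on each other, which is the kind of pair a
   dual-issue schedule could place side by side. */
DUAL_ISSUE
int sum_pairs(const int data[], unsigned n)
{
    int even = 0;
    int odd = 0;
    unsigned i;

    for (i = 0; i + 1 < n; i += 2) {
        even += data[i];
        odd += data[i + 1];
    }
    if (i < n) {
        even += data[i];   /* leftover element when n is odd */
    }
    return even + odd;
}
```

Swapping [[dual_issue]] for [[single_issue]] in the macro would force the other mode for the same function, leaving the rest of the source file on the compiler default.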

Post by aclassifier »

Thank you, Bianco!

I assume that "the two instructions should not use the same functional unit" means "not the same slice"? Because I assume the instructions for the logical cores won't really run concurrently, since the cycles are spread over the logical cores to enable determinism?

Also, what kind of function could I write that would, at the instruction level, have any sensible notion of concurrency? A "par" may only be placed at the highest level in XC. Or would I be able to implement a parallel algorithm in a function (like a parallel FFT with shared data)? If so, then I assume [[single_issue]] would destroy that parallelism just like that. See, I mess this up! Since I don't understand the problem, the solution looks like a mere technicality to me.

I did study the xCORE-200 architecture document (1) somewhat before I asked, but I didn't get the bird's eye view. It gave me the how but not the why.

I assume this is fantastically elegant and I'd like a share of it!

(1) https://www.xmos.com/download/private/x ... 1.1%29.pdf
andrew
Experienced Member
Posts: 114
Joined: Fri Dec 11, 2009 10:22 am

Post by andrew »

The two issue lanes are broadly: one for load/store instructions and the other for resource instructions. Both lanes can do arithmetic. This means that two short-encoded arithmetic instructions, e.g. two adds, can be issued in parallel.
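A toy model may make the lane description more concrete. The sketch below combines it with the rules listed earlier in the thread (both instructions 16-bit encoded, no dependency between them). The instruction classes, field names and the can_pair function are hypothetical simplifications for illustration only; the real pairing rules are in the xCORE-200 architecture document.

```c
#include <stdbool.h>

/* Hypothetical instruction classes for this toy model. */
typedef enum { CLASS_ARITH, CLASS_MEMORY, CLASS_RESOURCE } insn_class_t;

/* One bit per issue lane. */
enum { LANE_MEMORY = 1u << 0, LANE_RESOURCE = 1u << 1 };

typedef struct {
    bool         is_16bit;  /* short (16-bit) encoding? */
    insn_class_t cls;
    int          dst;       /* destination register, -1 if none */
    int          src[2];    /* source registers, -1 if unused */
} insn_t;

/* Lanes that can execute a class: arithmetic fits in either lane. */
unsigned lanes_for(insn_class_t cls)
{
    switch (cls) {
    case CLASS_MEMORY:   return LANE_MEMORY;
    case CLASS_RESOURCE: return LANE_RESOURCE;
    default:             return LANE_MEMORY | LANE_RESOURCE;
    }
}

/* True if `second` reads or overwrites the register written by `first`. */
bool depends_on(const insn_t *first, const insn_t *second)
{
    if (first->dst < 0) {
        return false;
    }
    return second->src[0] == first->dst
        || second->src[1] == first->dst
        || second->dst == first->dst;
}

/* True if the two instructions could share one issue slot in this model. */
bool can_pair(const insn_t *a, const insn_t *b)
{
    unsigned la = lanes_for(a->cls);
    unsigned lb = lanes_for(b->cls);
    /* The pair needs two distinct lanes, one per instruction. */
    bool lanes_ok = ((la & LANE_MEMORY) && (lb & LANE_RESOURCE))
                 || ((la & LANE_RESOURCE) && (lb & LANE_MEMORY));

    return a->is_16bit && b->is_16bit && lanes_ok && !depends_on(a, b);
}
```

In this model two independent adds pair up, a load pairs with an add, but two loads do not, since both need the load/store lane.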

Post by aclassifier »

So it's micro-parallelism within a single logical core (thread), on a par with optimisation parameters like -O2, where the result is out of our reach. We just know it would run faster? I guess that also explains why the 14.3.0 release note said "Dual issue must be explicitly request using '-mdual-issue'. It is no longer enabled by default for -O2 or -O3 builds. Individual functions may be forced using the [[single_issue]] or [[dual_issue]] attribute."

But this would be another kind of optimisation, based on the concurrent properties of some 16-bit instructions running on exactly this HW, which has this kind of parallelism. I guess this also explains why there is no top-level description of it (yet)?

But for a user adding [[dual_issue]] above a function, what's the basic difference between this and adding something like [[-O2]] as a per-function tag? Why would I want finer granularity for the first but not the second?

Post by andrew »

There is a 5-stage pipeline down which the instructions are executed. This pipeline is shared between the 1 to 8 logical cores, meaning that they don't exactly execute in parallel. However, the ALU can execute two of a core's instructions in parallel when they form a dual-issue pair, e.g. two adds. Does that help?
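To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes (this is not stated in the post) that a logical core gets one issue slot every max(5, n) tile clock cycles when n logical cores are active, and that a dual-issue slot can carry at most two instructions. The function names and the 500 MHz figure in the usage note are illustrative only.

```c
/* From the post: a 5-stage pipeline shared by the logical cores. */
#define PIPELINE_STAGES 5u

/* Assumption: with n active logical cores, each one issues at most once
   every max(PIPELINE_STAGES, n) tile clock cycles. */
unsigned issue_slots_per_second(unsigned tile_clock_hz, unsigned active_cores)
{
    unsigned divisor =
        active_cores > PIPELINE_STAGES ? active_cores : PIPELINE_STAGES;
    return tile_clock_hz / divisor;
}

/* Upper bound on instructions per second for one logical core: a
   dual-issue slot carries at most two instructions, a single-issue one. */
unsigned peak_instructions_per_second(unsigned tile_clock_hz,
                                      unsigned active_cores,
                                      int dual_issue)
{
    unsigned slots = issue_slots_per_second(tile_clock_hz, active_cores);
    return dual_issue ? 2u * slots : slots;
}
```

Under these assumptions, a 500 MHz tile with up to five active logical cores would give each core 100 million issue slots per second, so at most 200 million instructions per second if every slot held a dual-issue pair. With eight active cores the slot rate per core drops to 62.5 million per second.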

Post by aclassifier »

Absolutely! Stone on stone now!

But I still miss the reflection I'm hoping for, such as why there is no [[-O2]] when there are [[dual_issue]] and [[single_issue]]. Also, are they more than optimisation?

But since I also like the details: Will the compiler make it such that "parallelizable" (or dual-issue'able) 16-bit instructions (with operands(?)) are aligned so that they will be shovelled to two cores? Will the compiler (or the scheduler?) inspect the train of instructions and, provided it keeps the semantics, perhaps swap two just to make them align with another for another core? Because if it only does this by coincidence then there has to be a lot of inherent concurrency to take advantage of it. Or, can one say that between two given cores the inherent concurrency is 100% except for synchronisation points? Meaning there's always some concurrency to exploit?
akp
XCore Expert
Posts: 578
Joined: Thu Nov 26, 2015 11:47 pm

Post by akp »

I may be misunderstanding your question, but my guess is that in dual-issue mode the ALU executes two instructions from the same logical core simultaneously; they aren't from different logical cores. But I am quite possibly wrong.

Post by andrew »

In dual-issue mode the assembler may explicitly issue instructions in pairs. The assembler can be invoked with -fschedule (I think), which will do a better job of scheduling the instructions to take advantage of the instruction-level parallelism. If you want the best performance, then core kernels are usually hand-written to achieve very high instruction throughput. For example, the FFT code is very good (it's in lib_dsp).

Post by aclassifier »

akp: We need this described in a top-down manner by XMOS.

andrew: I downloaded lib_dsp and found the FFT. Is it the asm lines you refer to?