Written any dual-issue code yet?
I managed to write an asm function that calculates the dot product of two vectors, using load double, store double and dual issue.
My guess is that all 16-bit instructions can be dual-issued, and that the second ALU only needs the logic for short instructions.
In one clock cycle the pipeline can decode 32 bits of instructions, and the fetch stage can fetch 2x 32 bits of data, or 32 bits of data and 32 bits of code!?
(Only guessing here)
Does anyone know?
New instructions
-
lilltroll wrote: My guess is that all 16-bit instructions can be dual-issued, and that the second ALU only needs the logic for short instructions.
Insns using resources only go in the first slot; insns using memory and jumps only go in the second slot. I think.
lilltroll wrote: In one clock cycle the pipeline can decode 32 bits of instructions, and the fetch stage can fetch 2x 32 bits of data, or 32 bits of data and 32 bits of code!?
The decode stage decodes one aligned 32-bit group in dual-issue mode. I think :-)
Register fetch can read four registers (just like on the old
design), and writeback writes two. There are now two
execute stages.
The memory access stage can do one aligned 64-bit access.
-
segher wrote: The memory access stage can do one aligned 64-bit access.
Check out this code (attached): This code seems to run 2 different types of memory access without FNOP until it is changed from store to write?
How can this be done with only one read slot in the pipeline?
Why the FNOP in the middle and the end?
Is the instruction buffer longer than 64 bits in xCORE-200?
Shouldn't it be 10 ns instead of 8 ns in dual issue mode?
-
lilltroll wrote: segher wrote: The memory access stage can do one aligned 64-bit access.
That is what I see code do, for data accesses; it would be
strange if code fetches could do more. But your log below
suggests fetches can do 128 bits; data accesses cannot
read more than 64 bits anyway, since they can only write two
registers in the register file at once.
lilltroll wrote: This code seems to run 2 different types of memory access without FNOP until it is changed from store to write?
I'm not sure where you get that. I see it doing one memory
access per cycle, for four cycles, and then a fetch no-op.
lilltroll wrote: Why the FNOP in the middle and the end?
Because the instruction buffer was drained.
lilltroll wrote: Is the instruction buffer longer than 64 bits in xCORE-200?
It pretty much has to be, yes. How big, dunno; and fetches
seem to read 128 bits at once. Nice :-)
lilltroll wrote: Shouldn't it be 10 ns instead of 8 ns in dual issue mode?
In single issue mode as well I'd say?
-
lilltroll wrote: Shouldn't it be 10 ns instead of 8 ns in dual issue mode?
segher wrote: In single issue mode as well I'd say?
It is 8 ns because SystemFrequency is set to 500 MHz in the XN file. Set SystemFrequency to 400 MHz and it will be 10 ns.
-
ers35 wrote: It is 8 ns because SystemFrequency is set to 500 MHz in the XN file. Set SystemFrequency to 400 MHz and it will be 10 ns.
How does the change in pipeline length from 4 stages to 5 stages factor into this?
I also would have expected 10 ns based on a 5-stage pipeline at 500 MHz.
-
A document describing the updated xCORE-200 instruction set is now available, see
xCORE-200: The XMOS XS2 Architecture (ISA)
This includes semantics for the new instructions and a description of the dual issue instruction execution scheme.
-
Very nice :-)
A few typos / minor mistakes I remember after first reading:
- "stauration";
- setci etc. seem to have some wrong markup, "exttt";
- one of the last chapters talks about XS1-G4.
All the relative immediate branches (and ldap) use a
multiplier of 2 everywhere in this doc, but I was under the
impression that it is 4 in dual issue mode?
-
segher wrote: Very nice :-)
A few typos / minor mistakes I remember after first reading:
- "stauration";
- setci etc. seem to have some wrong markup, "exttt";
- one of the last chapters talks about XS1-G4.
Thanks for your feedback, I've passed your comments on.
segher wrote: All the relative immediate branches (and ldap) use a multiplier of 2 everywhere in this doc, but I was under the impression that it is 4 in dual issue mode?
This is a mistake in the document. PC-relative immediate operands are all scaled by the issue width (2 for single issue, 4 for dual issue).
-
richard wrote: A document describing the updated xCORE-200 instruction set is now available, see
xCORE-200: The XMOS XS2 Architecture (ISA)
This includes semantics for the new instructions and a description of the dual issue instruction execution scheme.
I could really use some diagrams/figures to help get my mind around the architecture document.