New instructions

lilltroll
XCore Expert
Posts: 956
Joined: Fri Dec 11, 2009 3:53 am
Location: Sweden, Eskilstuna

Post by lilltroll »

Written any dual-issue code yet?

I managed to write an asm function that calculates the dot product of two vectors, using load double, store double and dual issue.

My guess is that all 16-bit instructions can be dual-issued, and that the second ALU only needs the logic for short instructions.

One clock cycle can decode 32 bits of instructions in the pipeline, and the fetch stage can fetch 2x 32 bits of data, or 32 bits of data and 32 bits of code!?

(Only guessing here)
Does anyone know?
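
For reference, the computation described in this post can be modelled in a few lines. This is only a hedged Python sketch of a dot product that consumes two elements of each vector per iteration, the way a double-word load would deliver them. It is not the asm routine itself, and the function name `dot_product_pairs` is made up for illustration.

```python
def dot_product_pairs(a, b):
    """Dot product that walks both vectors two elements at a time.

    Each loop iteration stands in for one double-word load per vector
    followed by two multiply-accumulates (an assumption about how the
    asm version is structured, not a transcription of it).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if len(a) % 2 != 0:
        raise ValueError("length must be even to load element pairs")

    acc = 0
    for i in range(0, len(a), 2):
        a0, a1 = a[i], a[i + 1]  # models one double-word load from a
        b0, b1 = b[i], b[i + 1]  # models one double-word load from b
        acc += a0 * b0
        acc += a1 * b1
    return acc


print(dot_product_pairs([1, 2, 3, 4], [5, 6, 7, 8]))  # prints 70
```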


segher
XCore Expert
Posts: 844
Joined: Sun Jul 11, 2010 1:31 am

Post by segher »

lilltroll wrote: My guess is that all 16-bit instructions can be dual-issued, and that the second ALU only needs the logic for short instructions.
Insns using resources only go in the first slot; insns using
memory and jumps only go in the second slot. I think.
lilltroll wrote: One clock cycle can decode 32 bits of instructions in the pipeline, and the fetch stage can fetch 2x 32 bits of data, or 32 bits of data and 32 bits of code!?
The decode stage decodes one aligned 32-bit group in
dual-issue mode. I think :-)

Register fetch can do four registers (just like on the old
design); and writeback writes two. There now are two
execute stages.

The memory access stage can do one aligned 64-bit access.
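
The two alignment constraints mentioned in this post (an aligned 32-bit group for dual-issue decode, an aligned 64-bit data access) amount to simple address checks. A minimal Python sketch follows; the helper name and the constants are illustrative assumptions, not part of any XMOS tool.

```python
DOUBLE_WORD_BYTES = 8  # one 64-bit data access
ISSUE_GROUP_BYTES = 4  # one 32-bit dual-issue instruction group


def is_naturally_aligned(address: int, size_bytes: int) -> bool:
    """Return True if the address is a multiple of the access size."""
    return address % size_bytes == 0


# 0x1008 is 8-byte aligned, so a 64-bit access could start there.
print(is_naturally_aligned(0x1008, DOUBLE_WORD_BYTES))  # True
# 0x1004 is only 4-byte aligned: fine for a 32-bit group, not for 64 bits.
print(is_naturally_aligned(0x1004, DOUBLE_WORD_BYTES))  # False
print(is_naturally_aligned(0x1004, ISSUE_GROUP_BYTES))  # True
```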
lilltroll
XCore Expert
Posts: 956
Joined: Fri Dec 11, 2009 3:53 am
Location: Sweden, Eskilstuna

Post by lilltroll »

segher wrote: The memory access stage can do one aligned 64-bit access.
Check out this code:
[Attached image: inst.gif]
This code seems to run two different types of memory access without an FNOP until it is changed from store to write?

How can this be done with only one read slot in the pipeline?

Why the FNOP in the middle and the end?

Is the instruction buffer longer than 64 bits in xCORE-200?

Shouldn't it be 10 ns instead of 8 ns in dual-issue mode?
segher
XCore Expert
Posts: 844
Joined: Sun Jul 11, 2010 1:31 am

Post by segher »

lilltroll wrote:
segher wrote: The memory access stage can do one aligned 64-bit access.
That is what I see code do, for data accesses; it would be
strange if code fetches could do more. But your log below
suggests fetches can do 128 bits; data accesses cannot
read more than 64 bits anyway, since only two regs in the
register file can be written at once.
lilltroll wrote: This code seems to run two different types of memory access without an FNOP until it is changed from store to write?
I'm not sure where you get that. I see it doing one memory
access per cycle, for four cycles; and then a fetch no-op.
lilltroll wrote: Why the FNOP in the middle and the end?
Because the instruction buffer was drained.
lilltroll wrote: Is the instruction buffer longer than 64 bits in xCORE-200?
It pretty much has to be, yes. How big, dunno; and fetches
seem to read 128 bits at once. Nice :-)
lilltroll wrote: Shouldn't it be 10 ns instead of 8 ns in dual-issue mode?
In single issue mode as well I'd say?
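
The "buffer was drained" explanation can be illustrated with a toy model. The Python sketch below is purely an assumption-laden illustration, not a description of the real hardware: it assumes a 128-bit instruction buffer refilled by 128-bit fetches, 32 bits consumed per dual-issue cycle, and a memory stage that cannot fetch code in a cycle where the issued pair accesses data memory.

```python
FETCH_BITS = 128  # assumed width of one instruction fetch
PAIR_BITS = 32    # one dual-issue pair is an aligned 32-bit group


def trace(pairs_use_memory):
    """Return a list of 'ISSUE'/'FNOP' events for a run of dual-issue pairs.

    pairs_use_memory: one bool per pair, True if that pair accesses
    data memory (and so blocks an instruction fetch in that cycle).
    """
    buffer_bits = FETCH_BITS  # start with a full buffer (assumption)
    events = []
    for uses_memory in pairs_use_memory:
        if buffer_bits < PAIR_BITS:
            # Nothing left to decode: spend a cycle fetching instead.
            events.append("FNOP")
            buffer_bits += FETCH_BITS
        events.append("ISSUE")
        buffer_bits -= PAIR_BITS
        if not uses_memory and buffer_bits == 0:
            # Memory stage is free this cycle, so refill in the background.
            buffer_bits += FETCH_BITS
    return events


# Eight back-to-back pairs that all touch memory: an FNOP after four issues.
print(trace([True] * 8))
# Eight pairs with no data access: the buffer is refilled without any FNOP.
print(trace([False] * 8))
```

Under these assumptions, four consecutive memory-access cycles drain the 128-bit buffer, which matches the pattern of one access per cycle for four cycles followed by a fetch no-op.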
ers35
Active Member
Posts: 62
Joined: Mon Jun 10, 2013 2:14 pm

Post by ers35 »

lilltroll wrote: Shouldn't it be 10 ns instead of 8 ns in dual-issue mode?
segher wrote: In single issue mode as well I'd say?
It is 8 ns because SystemFrequency is set to 500 MHz in the XN. Set SystemFrequency to 400 MHz and it will be 10 ns.
Hagrid
Active Member
Posts: 44
Joined: Mon Jul 29, 2013 4:33 am

Post by Hagrid »

ers35 wrote: It is 8 ns because SystemFrequency is set to 500 MHz in the XN. Set SystemFrequency to 400 MHz and it will be 10 ns.
How does the change in pipeline length from 4 stages to 5 stages factor into this?

I would also have expected 10 ns, based on a 5-stage pipeline at 500 MHz.
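
The numbers in this exchange can be reproduced with simple arithmetic. The Python sketch below assumes the per-instruction time is a whole number of core clock periods; the 8 ns and 10 ns figures quoted above both correspond to four clocks, while the 10 ns expectation at 500 MHz corresponds to five. Which count the hardware actually uses is not settled in this thread, and the function name is illustrative.

```python
def instruction_time_ns(system_frequency_mhz: float, clocks_per_instruction: int) -> float:
    """Time between successive instructions of one thread, in nanoseconds."""
    clock_period_ns = 1000.0 / system_frequency_mhz
    return clocks_per_instruction * clock_period_ns


print(instruction_time_ns(500, 4))  # 8.0  (the observed figure at 500 MHz)
print(instruction_time_ns(400, 4))  # 10.0 (the figure quoted for 400 MHz)
print(instruction_time_ns(500, 5))  # 10.0 (five clocks at 500 MHz)
```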
richard
Respected Member
Posts: 318
Joined: Tue Dec 15, 2009 12:46 am

Post by richard »

A document describing the updated xCORE-200 instruction set is now available. See:
xCORE-200: The XMOS XS2 Architecture (ISA)

This includes semantics for the new instructions and a description of the dual issue instruction execution scheme.
segher
XCore Expert
Posts: 844
Joined: Sun Jul 11, 2010 1:31 am

Post by segher »

Very nice :-)

A few typos / minor mistakes I remember after first reading:
- "stauration";
- setci etc. seem to have some wrong markup, "exttt";
- one of the last chapters talks about XS1-G4.

All the relative immediate branches (and ldap) use a
multiplier of 2 everywhere in this doc; I was under the
impression that it is 4 in dual-issue mode?
richard
Respected Member
Posts: 318
Joined: Tue Dec 15, 2009 12:46 am

Post by richard »

segher wrote: Very nice :-)

A few typos / minor mistakes I remember after first reading:
- "stauration";
- setci etc. seem to have some wrong markup, "exttt";
- one of the last chapters talks about XS1-G4.
Thanks for your feedback, I've passed your comments on.
segher wrote: All the relative immediate branches (and ldap) use a
multiplier of 2 everywhere in this doc; I was under the
impression that it is 4 in dual-issue mode?
This is a mistake in the document. PC-relative immediate operands are all scaled by the issue width (2 for single issue, 4 for dual issue).
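
As a minimal illustration of that scaling rule, the Python sketch below converts an encoded immediate into a byte displacement. The function name is hypothetical, and which base pc value the displacement is added to is deliberately left out, since the thread does not state it.

```python
def pc_relative_offset_bytes(immediate: int, dual_issue: bool) -> int:
    """Byte displacement encoded by a pc-relative immediate operand.

    The immediate is scaled by the issue width in bytes:
    2 in single-issue mode, 4 in dual-issue mode.
    """
    scale = 4 if dual_issue else 2
    return immediate * scale


print(pc_relative_offset_bytes(10, dual_issue=False))  # 20
print(pc_relative_offset_bytes(10, dual_issue=True))   # 40
```

One consequence is that the same encoded immediate reaches twice as far in dual-issue mode.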
Hagrid
Active Member
Posts: 44
Joined: Mon Jul 29, 2013 4:33 am

Post by Hagrid »

richard wrote: A document describing the updated xCORE-200 instruction set is now available, see
xCORE-200: The XMOS XS2 Architecture (ISA)

This includes semantics for the new instructions and a description of the dual issue instruction execution scheme.
I could really use some diagrams/figures to help get my mind around the architecture document.