Page 1 of 2

single bit test in one instruction

PostPosted: Fri Aug 30, 2013 3:58 am
by notzed
There's no BTST, and AND doesn't take a literal so you would have to load the register first (with 1 or two instruction overhead) ... but a single bit-test of any bit can be performed in one instruction:

Code: Select all
        ;; Test bit X is set
        lsl     r0,r1,#(31-X)
        blt     .bit_is_set
        bgte    .bit_is_not_set


More background over here: http://a-hackers-craic.blogspot.com.au/2013/08/epiphany-bit-test.html

Maybe it's common knowledge but as i had to 'discover it' it may save someone else the search.

Re: single bit test in one instruction

PostPosted: Fri Aug 30, 2013 9:35 pm
by Gravis
good to know and since we are trying to be quick, i figured it would be good to point out the cycle counts for both using an immediate value and using an existing value. i also looked into using the second pipeline but it doesnt help. :(

bit # is constant - 5 or 6 cycles (assuming full IALU overlap)
Code: Select all
   ;; test bit #X in R1 is set
   lsl    r0,r1,#(31-X)   ; +5 cycles - 4 (max) IALU overlap
   blt    .bit_is_set     ; +1 cycle + 3 if branch
   bgte   .bit_is_not_set ; +1 cycle + 3 if branch


bit # is stored in a register - 7 or 8 cycles (assuming full IALU overlap)
Code: Select all
   ;; test bit #R2 in R1 is set
   mov    r0, 31          ; +5 cycles - 4 (max) IALU overlap
   sub    r0, r0, r2      ; +1
   lsl    r0, r1, r0      ; +1
   blt    .bit_is_set     ; +1 cycle + 3 if branch
   bgte   .bit_is_not_set ; +1 cycle + 3 if branch

Re: single bit test in one instruction

PostPosted: Fri Aug 30, 2013 9:48 pm
by EggBaconAndSpam
do not forget pipelining, IALU operations do not cause stalls (in this scenario)...

-test, bit is constant
lsl // 1 cycle
blt // 1 if not taken, 4 if taken
b // 4, use plain branch instead!

overall we are talking about either 5 or 6 cycles, inlining would make sense


-bit stored in register
mov // 1
sub // 1
lsl // 1
blt // 1 or 4
b // 4

overall either 7 or 8 cycles

// write this in C instead, the compiler might be able to do (31-param) at compile time!
// also, inlining!

Re: single bit test in one instruction

PostPosted: Sat Aug 31, 2013 1:47 pm
by timpart
For other playing around with bits tips, take a look at http://graphics.stanford.edu/~seander/bithacks.html. It is in C code but using operators that easily translate to assembler in most cases.

I ignore the first 3 cycles of an instruction when thinking about timings. They get hidden in the pipeline overlap.

Tim

Re: single bit test in one instruction

PostPosted: Sat Aug 31, 2013 2:00 pm
by EggBaconAndSpam
those 3 particular pipeline stages are responsible for the 3 cycle branch penalty

Re: single bit test in one instruction

PostPosted: Sat Aug 31, 2013 6:16 pm
by Gravis
EggBaconAndSpam wrote:do not forget pipelining, IALU operations do not cause stalls (in this scenario)...

DOH! fixed it.

timpart wrote:For other playing around with bits tips, take a look at http://graphics.stanford.edu/~seander/bithacks.html. It is in C code but using operators that easily translate to assembler in most cases.

it has some good info for sure but i think not all of them may not be optimal for the epiphany instruction set.

Re: single bit test in one instruction

PostPosted: Mon Sep 02, 2013 3:10 am
by notzed
Some of you guys have a weird way of counting cycles. It seems very odd to count the 3-cycle branch penalty at the target point - even if it was one. Of course such a fragment would be "in-lined" and is not the whole programme or subroutine - that would be absurd! Even if it was the start of an interrupt handler or subroutine or branch target the 3 cycle warmup isn't anything that can be optimised out so is irrelevent in terms of algorithm choice as far as programming for or compiling to the chip is concerned. A cycle-accurate simulator is another matter, but off-topic.

I only showed both cases to demonstrate both decisions were possible! You would never decide on branching both ways as the one taken is already the opposite of not taking it. You only ever need to choose one and ideally choose it such that the most common case didn't branch or is an otherwise necessary branch - such as at the end of a loop (and therefore wouldn't cost anything "extra").

I would only count:

2 cycles for branch not taken, or using it for a single movcc
5 for branch taken

i.e. test is 1 cycle, decision is 1 or 4.

Since the decision logic is forced by the ISA to be the same no matter what the test, the test itself is the only information of value here - i.e. it's 1 cycle and therefore just as efficient as if the ISA had an explicit instruction to perform the same test. I think that's pretty nifty given how small the ISA is.

(assuming full IALU overlap)


What does this even mean here? It's not an assumption if it is the only behaviour the hardware exhibits. i.e. IALU ops always take 1 cycle and there can be no stalls due to register dependencies. It was designed this way on purpose.

Or are you trying to account for external factors such as fetch stalls?

Re: single bit test in one instruction

PostPosted: Mon Sep 02, 2013 6:33 am
by hewsmike
notzed wrote:There's no BTST, and AND doesn't take a literal so you would have to load the register first (with 1 or two instruction overhead) ... but a single bit-test of any bit can be performed in one instruction:

Code: Select all
        ;; Test bit X is set
        lsl     r0,r1,#(31-X)
        blt     .bit_is_set
        bgte    .bit_is_not_set


More background over here: http://a-hackers-craic.blogspot.com.au/2013/08/epiphany-bit-test.html

Maybe it's common knowledge but as i had to 'discover it' it may save someone else the search.

That's a neat construct. r0 is used as scratch and r1 is unaltered. This can become a macro where r1 is a parameter to same ( X may be another ), and thus any general register may be interrogated bitwise and non-destructively so ( except r0 itself ! ).

Cheers, Mike

Re: single bit test in one instruction

PostPosted: Tue Sep 03, 2013 6:25 pm
by Gravis
notzed wrote:Some of you guys have a weird way of counting cycles. It seems very odd to count the 3-cycle branch penalty at the target point - even if it was one.

always assume the worst case scenario.

notzed wrote:
(assuming full IALU overlap)

...
Or are you trying to account for external factors such as fetch stalls?

bingo.

Re: single bit test in one instruction

PostPosted: Wed Sep 04, 2013 2:59 am
by notzed
Gravis wrote:
notzed wrote:Some of you guys have a weird way of counting cycles. It seems very odd to count the 3-cycle branch penalty at the target point - even if it was one.

always assume the worst case scenario.

notzed wrote:
(assuming full IALU overlap)

...
Or are you trying to account for external factors such as fetch stalls?

bingo.


Oh kay.