single bit test in one instruction

single bit test in one instruction

Postby notzed » Fri Aug 30, 2013 3:58 am

There's no BTST, and AND doesn't take a literal so you would have to load the register first (with 1 or two instruction overhead) ... but a single bit-test of any bit can be performed in one instruction:

Code: Select all
        ;; Test bit X is set
        lsl     r0,r1,#(31-X)
        blt     .bit_is_set
        bgte    .bit_is_not_set


More background over here: http://a-hackers-craic.blogspot.com.au/2013/08/epiphany-bit-test.html

Maybe it's common knowledge but as i had to 'discover it' it may save someone else the search.
notzed
 
Posts: 331
Joined: Mon Dec 17, 2012 12:28 am
Location: Australia

Re: single bit test in one instruction

Postby Gravis » Fri Aug 30, 2013 9:35 pm

good to know and since we are trying to be quick, i figured it would be good to point out the cycle counts for both using an immediate value and using an existing value. i also looked into using the second pipeline but it doesnt help. :(

bit # is constant - 5 or 6 cycles (assuming full IALU overlap)
Code: Select all
   ;; test bit #X in R1 is set
   lsl    r0,r1,#(31-X)   ; +5 cycles - 4 (max) IALU overlap
   blt    .bit_is_set     ; +1 cycle + 3 if branch
   bgte   .bit_is_not_set ; +1 cycle + 3 if branch


bit # is stored in a register - 7 or 8 cycles (assuming full IALU overlap)
Code: Select all
   ;; test bit #R2 in R1 is set
   mov    r0, 31          ; +5 cycles - 4 (max) IALU overlap
   sub    r0, r0, r2      ; +1
   lsl    r0, r1, r0      ; +1
   blt    .bit_is_set     ; +1 cycle + 3 if branch
   bgte   .bit_is_not_set ; +1 cycle + 3 if branch
Last edited by Gravis on Sat Aug 31, 2013 6:30 pm, edited 3 times in total.
User avatar
Gravis
 
Posts: 445
Joined: Mon Dec 17, 2012 3:27 am
Location: East coast USA.

Re: single bit test in one instruction

Postby EggBaconAndSpam » Fri Aug 30, 2013 9:48 pm

do not forget pipelining, IALU operations do not cause stalls (in this scenario)...

-test, bit is constant
lsl // 1 cycle
blt // 1 if not taken, 4 if taken
b // 4, use plain branch instead!

overall we are talking about either 5 or 6 cycles, inlining would make sense


-bit stored in register
mov // 1
sub // 1
lsl // 1
blt // 1 or 4
b // 4

overall either 7 or 8 cycles

// write this in C instead, the compiler might be able to do (31-param) at compile time!
// also, inlining!
EggBaconAndSpam
 
Posts: 32
Joined: Tue Jul 16, 2013 2:39 pm

Re: single bit test in one instruction

Postby timpart » Sat Aug 31, 2013 1:47 pm

For other playing around with bits tips, take a look at http://graphics.stanford.edu/~seander/bithacks.html. It is in C code but using operators that easily translate to assembler in most cases.

I ignore the first 3 cycles of an instruction when thinking about timings. They get hidden in the pipeline overlap.

Tim
timpart
 
Posts: 302
Joined: Mon Dec 17, 2012 3:25 am
Location: UK

Re: single bit test in one instruction

Postby EggBaconAndSpam » Sat Aug 31, 2013 2:00 pm

those 3 particular pipeline stages are responsible for the 3 cycle branch penalty
EggBaconAndSpam
 
Posts: 32
Joined: Tue Jul 16, 2013 2:39 pm

Re: single bit test in one instruction

Postby Gravis » Sat Aug 31, 2013 6:16 pm

EggBaconAndSpam wrote:do not forget pipelining, IALU operations do not cause stalls (in this scenario)...

DOH! fixed it.

timpart wrote:For other playing around with bits tips, take a look at http://graphics.stanford.edu/~seander/bithacks.html. It is in C code but using operators that easily translate to assembler in most cases.

it has some good info for sure but i think not all of them may not be optimal for the epiphany instruction set.
User avatar
Gravis
 
Posts: 445
Joined: Mon Dec 17, 2012 3:27 am
Location: East coast USA.

Re: single bit test in one instruction

Postby notzed » Mon Sep 02, 2013 3:10 am

Some of you guys have a weird way of counting cycles. It seems very odd to count the 3-cycle branch penalty at the target point - even if it was one. Of course such a fragment would be "in-lined" and is not the whole programme or subroutine - that would be absurd! Even if it was the start of an interrupt handler or subroutine or branch target the 3 cycle warmup isn't anything that can be optimised out so is irrelevent in terms of algorithm choice as far as programming for or compiling to the chip is concerned. A cycle-accurate simulator is another matter, but off-topic.

I only showed both cases to demonstrate both decisions were possible! You would never decide on branching both ways as the one taken is already the opposite of not taking it. You only ever need to choose one and ideally choose it such that the most common case didn't branch or is an otherwise necessary branch - such as at the end of a loop (and therefore wouldn't cost anything "extra").

I would only count:

2 cycles for branch not taken, or using it for a single movcc
5 for branch taken

i.e. test is 1 cycle, decision is 1 or 4.

Since the decision logic is forced by the ISA to be the same no matter what the test, the test itself is the only information of value here - i.e. it's 1 cycle and therefore just as efficient as if the ISA had an explicit instruction to perform the same test. I think that's pretty nifty given how small the ISA is.

(assuming full IALU overlap)


What does this even mean here? It's not an assumption if it is the only behaviour the hardware exhibits. i.e. IALU ops always take 1 cycle and there can be no stalls due to register dependencies. It was designed this way on purpose.

Or are you trying to account for external factors such as fetch stalls?
notzed
 
Posts: 331
Joined: Mon Dec 17, 2012 12:28 am
Location: Australia

Re: single bit test in one instruction

Postby hewsmike » Mon Sep 02, 2013 6:33 am

notzed wrote:There's no BTST, and AND doesn't take a literal so you would have to load the register first (with 1 or two instruction overhead) ... but a single bit-test of any bit can be performed in one instruction:

Code: Select all
        ;; Test bit X is set
        lsl     r0,r1,#(31-X)
        blt     .bit_is_set
        bgte    .bit_is_not_set


More background over here: http://a-hackers-craic.blogspot.com.au/2013/08/epiphany-bit-test.html

Maybe it's common knowledge but as i had to 'discover it' it may save someone else the search.

That's a neat construct. r0 is used as scratch and r1 is unaltered. This can become a macro where r1 is a parameter to same ( X may be another ), and thus any general register may be interrogated bitwise and non-destructively so ( except r0 itself ! ).

Cheers, Mike
hewsmike
 
Posts: 85
Joined: Mon Dec 17, 2012 3:20 am

Re: single bit test in one instruction

Postby Gravis » Tue Sep 03, 2013 6:25 pm

notzed wrote:Some of you guys have a weird way of counting cycles. It seems very odd to count the 3-cycle branch penalty at the target point - even if it was one.

always assume the worst case scenario.

notzed wrote:
(assuming full IALU overlap)

...
Or are you trying to account for external factors such as fetch stalls?

bingo.
User avatar
Gravis
 
Posts: 445
Joined: Mon Dec 17, 2012 3:27 am
Location: East coast USA.

Re: single bit test in one instruction

Postby notzed » Wed Sep 04, 2013 2:59 am

Gravis wrote:
notzed wrote:Some of you guys have a weird way of counting cycles. It seems very odd to count the 3-cycle branch penalty at the target point - even if it was one.

always assume the worst case scenario.

notzed wrote:
(assuming full IALU overlap)

...
Or are you trying to account for external factors such as fetch stalls?

bingo.


Oh kay.
notzed
 
Posts: 331
Joined: Mon Dec 17, 2012 12:28 am
Location: Australia

Next

Return to Assembly

Who is online

Users browsing this forum: No registered users and 2 guests

cron