BITR - nice one !!
Posted: Mon Sep 16, 2013 1:48 am
Just expressing my happiness at finding this instruction at assembly level.
While it seems quite simple and mundane at a glance, for FFT's ( my special interest ) this will readily take on the role of permuting the indices of an input vector. A traditional/significant first part of an FFT for a given transform length is to perform a special shuffling of the elements of an input vector, which can be encoded as a bit-reversal upon a particular length operand.
Thus for say an N-bit index ( where N is at least 1 and not greater than 32 ) on some register RX :
BITR RX, RX;
LSR RX, RX, #(32-N);
Graphically :
|---(32-N) zero bits ----|------------------ N original bits --------------|
flipped to
|------------------ N reversed bits -------------|---(32-N) zero bits ----|
slid to
|---(32-N) zero bits ----|------------------ N reversed bits -------------|
..... where LSR is performed because BITR operates on a complete 32-bit operand, and thus you want the upper N bits of the reversed index to slide ( back ) down to the lowest N positions. Now depending upon your data type one could suitably adjust the LSR's immediate for said type's length to obtain an address/offset.
If I've read matters correctly then these two operations run at one cycle each.
Cheers, Mike.
While it seems quite simple and mundane at a glance, for FFT's ( my special interest ) this will readily take on the role of permuting the indices of an input vector. A traditional/significant first part of an FFT for a given transform length is to perform a special shuffling of the elements of an input vector, which can be encoded as a bit-reversal upon a particular length operand.
Thus for say an N-bit index ( where N is at least 1 and not greater than 32 ) on some register RX :
BITR RX, RX;
LSR RX, RX, #(32-N);
Graphically :
|---(32-N) zero bits ----|------------------ N original bits --------------|
flipped to
|------------------ N reversed bits -------------|---(32-N) zero bits ----|
slid to
|---(32-N) zero bits ----|------------------ N reversed bits -------------|
..... where LSR is performed because BITR operates on a complete 32-bit operand, and thus you want the upper N bits of the reversed index to slide ( back ) down to the lowest N positions. Now depending upon your data type one could suitably adjust the LSR's immediate for said type's length to obtain an address/offset.
If I've read matters correctly then these two operations run at one cycle each.
Cheers, Mike.