Just expressing my happiness at finding this instruction at assembly level.

While it seems quite simple and mundane at a glance, for FFT's ( my special interest ) this will readily take on the role of permuting the indices of an input vector. A traditional/significant first part of an FFT for a given transform length is to perform a special shuffling of the elements of an input vector, which can be encoded as a bit-reversal upon a particular length operand.

Thus for say an N-bit index ( where N is at least 1 and not greater than 32 ) on some register RX :

BITR RX, RX;

LSR RX, RX, #(32-N);

Graphically :

|---(32-N) zero bits ----|------------------ N original bits --------------|

flipped to

|------------------ N reversed bits -------------|---(32-N) zero bits ----|

slid to

|---(32-N) zero bits ----|------------------ N reversed bits -------------|

..... where LSR is performed because BITR operates on a complete 32-bit operand, and thus you want the upper N bits of the reversed index to slide ( back ) down to the lowest N positions. Now depending upon your data type one could suitably adjust the LSR's immediate for said type's length to obtain an address/offset.

If I've read matters correctly then these two operations run at one cycle each.

Cheers, Mike.