assembly - Optimal SIMD algorithm to rotate or transpose an array
I am working on a data structure in which I have an array of 16 uint64 values. They are laid out in memory like this (each entry below represents a single uint64):
a0 a1 a2 a3 b0 b1 b2 b3 c0 c1 c2 c3 d0 d1 d2 d3
The desired result is to transpose the array into this:
a0 b0 c0 d0 a1 b1 c1 d1 a2 b2 c2 d2 a3 b3 c3 d3
A rotation of the array by 90 degrees is also an acceptable solution for the future loop:
d0 c0 b0 a0 d1 c1 b1 a1 d2 c2 b2 a2 d3 c3 b3 a3
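For reference, the two target layouts can be pinned down with a plain scalar version. This is only a sketch to make the index mapping explicit, assuming the 16 values are stored row-major (a0..a3 first, then b, c, d); the function names transpose4x4_ref and rotate4x4_ref are hypothetical and it is not meant to be fast.

```c
#include <stdint.h>

/* Transpose: element (row r, column c) of src moves to (row c, column r).
   Input  a0 a1 a2 a3 b0 b1 ...  ->  output  a0 b0 c0 d0 a1 b1 ... */
void transpose4x4_ref(const uint64_t src[16], uint64_t dst[16])
{
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            dst[c * 4 + r] = src[r * 4 + c];
}

/* 90-degree rotation: column c of src, read from the last row up,
   becomes row c of dst.
   Input  a0 a1 a2 a3 b0 b1 ...  ->  output  d0 c0 b0 a0 d1 c1 ... */
void rotate4x4_ref(const uint64_t src[16], uint64_t dst[16])
{
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            dst[c * 4 + (3 - r)] = src[r * 4 + c];
}
```

Either function can serve as a correctness check for a SIMD version: fill src with 0..15, run both, and compare the outputs element by element.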
I need this in order to operate on the array quickly at a later point (traversing it sequentially with another SIMD pass, 4 at a time).
So far, I have tried to "blend" the data by loading a 4 x 64-bit vector of the a's, bitmasking and shuffling the elements, OR'ing it with the b's and so on, and then repeating that for the c's... Unfortunately, this comes to 5 x 4 SIMD instructions per segment of 4 elements in the array (one load, one mask, one shuffle, one OR with the next element, and a store). It seems I should be able to do better.
I have AVX2 available and I am compiling with clang.
uint64_t a[16] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
__m256i row0 = _mm256_loadu_si256((__m256i*)&a[ 0]); //0 1 2 3
__m256i row1 = _mm256_loadu_si256((__m256i*)&a[ 4]); //4 5 6 7
__m256i row2 = _mm256_loadu_si256((__m256i*)&a[ 8]); //8 9 a b
__m256i row3 = _mm256_loadu_si256((__m256i*)&a[12]); //c d e f
I don't have hardware to test this on right now, but something like the following should do what you want:
__m256i tmp3, tmp2, tmp1, tmp0;
tmp0 = _mm256_unpacklo_epi64(row0, row1); //0 4 2 6
tmp1 = _mm256_unpackhi_epi64(row0, row1); //1 5 3 7
tmp2 = _mm256_unpacklo_epi64(row2, row3); //8 c a e
tmp3 = _mm256_unpackhi_epi64(row2, row3); //9 d b f
//now select the appropriate 128-bit lanes
row0 = _mm256_permute2x128_si256(tmp0, tmp2, 0x20); //0 4 8 c
row1 = _mm256_permute2x128_si256(tmp1, tmp3, 0x20); //1 5 9 d
row2 = _mm256_permute2x128_si256(tmp0, tmp2, 0x31); //2 6 a e
row3 = _mm256_permute2x128_si256(tmp1, tmp3, 0x31); //3 7 b f
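As a sketch, the unpack and permute steps above could be wrapped into complete functions that also store the result. The names transpose4x4_u64 and rotate4x4_u64 are hypothetical. The rotation variant is an assumption on my part rather than something from the snippet above: it uses the same instructions with the operand order swapped. The AVX2 path is only compiled when __AVX2__ is defined (for example with -mavx2 on clang or GCC); otherwise a plain scalar fallback is used.

```c
#include <stdint.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical helper: transpose a row-major 4x4 block of uint64_t.
   dst must not overlap src. */
void transpose4x4_u64(const uint64_t *src, uint64_t *dst)
{
#if defined(__AVX2__)
    __m256i r0 = _mm256_loadu_si256((const __m256i *)(src + 0));   /* a0 a1 a2 a3 */
    __m256i r1 = _mm256_loadu_si256((const __m256i *)(src + 4));   /* b0 b1 b2 b3 */
    __m256i r2 = _mm256_loadu_si256((const __m256i *)(src + 8));   /* c0 c1 c2 c3 */
    __m256i r3 = _mm256_loadu_si256((const __m256i *)(src + 12));  /* d0 d1 d2 d3 */
    __m256i t0 = _mm256_unpacklo_epi64(r0, r1);  /* a0 b0 a2 b2 */
    __m256i t1 = _mm256_unpackhi_epi64(r0, r1);  /* a1 b1 a3 b3 */
    __m256i t2 = _mm256_unpacklo_epi64(r2, r3);  /* c0 d0 c2 d2 */
    __m256i t3 = _mm256_unpackhi_epi64(r2, r3);  /* c1 d1 c3 d3 */
    _mm256_storeu_si256((__m256i *)(dst + 0),  _mm256_permute2x128_si256(t0, t2, 0x20));
    _mm256_storeu_si256((__m256i *)(dst + 4),  _mm256_permute2x128_si256(t1, t3, 0x20));
    _mm256_storeu_si256((__m256i *)(dst + 8),  _mm256_permute2x128_si256(t0, t2, 0x31));
    _mm256_storeu_si256((__m256i *)(dst + 12), _mm256_permute2x128_si256(t1, t3, 0x31));
#else
    /* Scalar fallback when AVX2 is not enabled at compile time. */
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            dst[c * 4 + r] = src[r * 4 + c];
#endif
}

/* Hypothetical helper: 90-degree rotation (d0 c0 b0 a0 d1 c1 ...).
   Same instruction count; only the operand order is swapped. */
void rotate4x4_u64(const uint64_t *src, uint64_t *dst)
{
#if defined(__AVX2__)
    __m256i r0 = _mm256_loadu_si256((const __m256i *)(src + 0));
    __m256i r1 = _mm256_loadu_si256((const __m256i *)(src + 4));
    __m256i r2 = _mm256_loadu_si256((const __m256i *)(src + 8));
    __m256i r3 = _mm256_loadu_si256((const __m256i *)(src + 12));
    __m256i t0 = _mm256_unpacklo_epi64(r1, r0);  /* b0 a0 b2 a2 */
    __m256i t1 = _mm256_unpackhi_epi64(r1, r0);  /* b1 a1 b3 a3 */
    __m256i t2 = _mm256_unpacklo_epi64(r3, r2);  /* d0 c0 d2 c2 */
    __m256i t3 = _mm256_unpackhi_epi64(r3, r2);  /* d1 c1 d3 c3 */
    _mm256_storeu_si256((__m256i *)(dst + 0),  _mm256_permute2x128_si256(t2, t0, 0x20));
    _mm256_storeu_si256((__m256i *)(dst + 4),  _mm256_permute2x128_si256(t3, t1, 0x20));
    _mm256_storeu_si256((__m256i *)(dst + 8),  _mm256_permute2x128_si256(t2, t0, 0x31));
    _mm256_storeu_si256((__m256i *)(dst + 12), _mm256_permute2x128_si256(t3, t1, 0x31));
#else
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            dst[c * 4 + (3 - r)] = src[r * 4 + c];
#endif
}
```

Either way this is 4 loads, 4 unpacks, 4 lane permutes and 4 stores for the whole 16-element block, so 4 instructions per output segment of 4 elements instead of 5.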
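The
__m256i _mm256_permute2x128_si256 (__m256i a, __m256i b, const int imm)
intrinsic selects 128-bit lanes from two sources. In the control word, bits 1:0 choose the source of the low 128-bit lane of the result and bits 5:4 the source of the high lane (0 = low lane of a, 1 = high lane of a, 2 = low lane of b, 3 = high lane of b), so 0x20 combines the two low lanes and 0x31 the two high lanes. You can read about it in the Intel Intrinsics Guide. There is also a version, _mm256_permute2f128_si256, which only needs AVX and acts in the floating-point domain. I used it to check that I had used the correct control words.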
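Without AVX2 hardware at hand, the control words can also be checked with a small scalar model of the instructions involved. The following is only a sketch: the v256 type and the model_* functions are hypothetical stand-ins written for illustration, not part of any library, and they mirror the unpack and lane-select behaviour as I understand it from the Intel documentation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __m256i: four 64-bit elements, q[0] lowest. */
typedef struct { uint64_t q[4]; } v256;

/* Model of _mm256_unpacklo_epi64: low qword of each 128-bit lane, a then b. */
static v256 model_unpacklo_epi64(v256 a, v256 b)
{
    v256 r = {{ a.q[0], b.q[0], a.q[2], b.q[2] }};
    return r;
}

/* Model of _mm256_unpackhi_epi64: high qword of each 128-bit lane, a then b. */
static v256 model_unpackhi_epi64(v256 a, v256 b)
{
    v256 r = {{ a.q[1], b.q[1], a.q[3], b.q[3] }};
    return r;
}

/* Pick one 128-bit lane from a 4-bit selector:
   bits 1:0 -> 0 = a low, 1 = a high, 2 = b low, 3 = b high; bit 3 zeroes it. */
static void model_select_lane(uint64_t out[2], v256 a, v256 b, int sel)
{
    const v256 *src = (sel & 2) ? &b : &a;
    int base = (sel & 1) ? 2 : 0;
    out[0] = (sel & 8) ? 0 : src->q[base];
    out[1] = (sel & 8) ? 0 : src->q[base + 1];
}

/* Model of _mm256_permute2x128_si256: low nibble of imm picks the low
   result lane, high nibble picks the high result lane. */
static v256 model_permute2x128(v256 a, v256 b, int imm)
{
    v256 r;
    model_select_lane(&r.q[0], a, b, imm & 0xF);
    model_select_lane(&r.q[2], a, b, (imm >> 4) & 0xF);
    return r;
}

/* The transpose sequence from the answer, run through the scalar model. */
void model_transpose4x4(const uint64_t src[16], uint64_t dst[16])
{
    v256 row[4], tmp[4], out[4];
    memcpy(row, src, sizeof row);
    tmp[0] = model_unpacklo_epi64(row[0], row[1]);  /* 0 4 2 6 */
    tmp[1] = model_unpackhi_epi64(row[0], row[1]);  /* 1 5 3 7 */
    tmp[2] = model_unpacklo_epi64(row[2], row[3]);  /* 8 c a e */
    tmp[3] = model_unpackhi_epi64(row[2], row[3]);  /* 9 d b f */
    out[0] = model_permute2x128(tmp[0], tmp[2], 0x20);  /* 0 4 8 c */
    out[1] = model_permute2x128(tmp[1], tmp[3], 0x20);  /* 1 5 9 d */
    out[2] = model_permute2x128(tmp[0], tmp[2], 0x31);  /* 2 6 a e */
    out[3] = model_permute2x128(tmp[1], tmp[3], 0x31);  /* 3 7 b f */
    memcpy(dst, out, sizeof out);
}
```

Feeding the values 0..15 through model_transpose4x4 and comparing against a straightforward nested-loop transpose is enough to confirm that 0x20 and 0x31 are the right control words for this sequence.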