I read the manual description of those two operations but don't understand the difference yet. Can someone explain with an example how shufpd compares to pshufd?
CodePudding user response:
- pshufd shuffles 32 bits as a unit.
- shufpd shuffles 64 bits as a unit.
- pshufd shuffles within a single register.
- shufpd can merge-shuffle 2 registers.
- They can be used to do the same task, but mixing integer and floating point instructions (pshufd with floating points, or shufpd with integers) may cause a bypass delay.
Below is the pseudocode from the Intel docs for each operation. The difference becomes clear when you read it carefully.
pshufd a, a, imm8
DEFINE SELECT4(src, control) {
    CASE(control[1:0]) OF
    0: tmp[31:0] := src[31:0]
    1: tmp[31:0] := src[63:32]
    2: tmp[31:0] := src[95:64]
    3: tmp[31:0] := src[127:96]
    ESAC
    RETURN tmp[31:0]
}
dst[31:0] := SELECT4(a[127:0], imm8[1:0])
dst[63:32] := SELECT4(a[127:0], imm8[3:2])
dst[95:64] := SELECT4(a[127:0], imm8[5:4])
dst[127:96] := SELECT4(a[127:0], imm8[7:6])
shufpd a, b, imm8
dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64]
dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64]
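The pseudocode above can be mirrored in a small scalar model, which makes it easy to experiment with different imm8 values without writing assembly. This is only a sketch: the vec128 type and the model_pshufd / model_shufpd names are made up for illustration, and the model treats a 128-bit register as four 32-bit lanes with lane 0 holding bits [31:0].

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: a 128-bit register as four 32-bit lanes,
   lane[0] = bits [31:0], lane[3] = bits [127:96]. */
typedef struct { uint32_t lane[4]; } vec128;

/* Hypothetical scalar model of pshufd: every destination 32-bit lane
   picks any lane of the single source via its own 2-bit field of imm8. */
static vec128 model_pshufd(vec128 src, uint8_t imm8) {
    vec128 dst;
    for (int i = 0; i < 4; i++) {
        dst.lane[i] = src.lane[(imm8 >> (2 * i)) & 3];
    }
    return dst;
}

/* Hypothetical scalar model of shufpd: the low 64-bit half of the result
   always comes from a, the high half always comes from b. One bit of imm8
   per half chooses between the source's low and high 64 bits. */
static vec128 model_shufpd(vec128 a, vec128 b, uint8_t imm8) {
    vec128 dst;
    int a_first = (imm8 & 1) ? 2 : 0; /* first 32-bit lane of the chosen half of a */
    int b_first = (imm8 & 2) ? 2 : 0; /* first 32-bit lane of the chosen half of b */
    dst.lane[0] = a.lane[a_first];
    dst.lane[1] = a.lane[a_first + 1];
    dst.lane[2] = b.lane[b_first];
    dst.lane[3] = b.lane[b_first + 1];
    return dst;
}
```

Note how model_shufpd always moves two adjacent 32-bit lanes together (one 64-bit unit), while model_pshufd can place any 32-bit lane anywhere but only ever reads one source.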
Examples (each vector lists its four 32-bit elements, lowest first):
a = [1, 1, 2, 2]
b = [3, 3, 4, 4]
shufpd a, b, 1 -> [2, 2, 3, 3]
You cannot do this with pshufd, since it reads only one source register, but sometimes both can be used for the same task.
a = [1, 1, 2, 2]
pshufd a, a, 0x4e -> [2, 2, 1, 1]
shufpd a, a, 1 -> [2, 2, 1, 1]
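To see where the immediates 0x4e and 1 come from, it can help to build them from the individual selectors. The sketch below packs the selectors the same way the pseudocode unpacks them; pshufd_imm8 and shufpd_imm8 are hypothetical helper names, not part of any library. (For the pshufd case, the _MM_SHUFFLE macro from xmmintrin.h does the same packing, with its arguments listed highest lane first.)

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack four 2-bit lane selectors into a pshufd imm8.
   sel0 names the source lane copied into destination lane 0 (bits [31:0]),
   sel1 the one copied into lane 1, and so on. */
static uint8_t pshufd_imm8(unsigned sel0, unsigned sel1,
                           unsigned sel2, unsigned sel3) {
    return (uint8_t)((sel0 & 3) | ((sel1 & 3) << 2) |
                     ((sel2 & 3) << 4) | ((sel3 & 3) << 6));
}

/* Hypothetical helper: pack the two 1-bit selectors of shufpd.
   sel_lo picks the half of a used for dst[63:0],
   sel_hi picks the half of b used for dst[127:64] (0 = low, 1 = high). */
static uint8_t shufpd_imm8(unsigned sel_lo, unsigned sel_hi) {
    return (uint8_t)((sel_lo & 1) | ((sel_hi & 1) << 1));
}
```

Swapping the two 64-bit halves with pshufd means destination lanes 0..3 read source lanes 2, 3, 0, 1, so pshufd_imm8(2, 3, 0, 1) gives 0x4e. With shufpd the same swap is "high half of a, then low half of b", so shufpd_imm8(1, 0) gives 1.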