DotProduct x87 vs SSE
http://www.badongo.com/file/5650132
DotProduct test (AxA)
dp3:021C3663, 00040EA8, Cycle: 8.114502 125330.000000
dp3x87:02509AEB, 00048090, Cycle: 9.004395 125330.000000
dp3x87_:02188C35, 00040E72, Cycle: 8.112854 125330.000000
dp3sse:024A5435, 00047937, Cycle: 8.946991 125330.000000
dp3sse_:031D7ED0, 00060FCC, Cycle: 12.123413 125330.000000
dp3sse__:02C684FD, 000571BC, Cycle: 10.888550 125330.000000
dp3sse___:02D7F641, 00058FCB, Cycle: 11.123383 125330.000000
dp3sse__1:02E12341, 000580C8, Cycle: 11.006104 125330.000000
dp3sse3:0421FC6B, 00079770, Cycle: 15.183105 125330.000000
dp3sse4:16566E10, 16566E10, Cycle: 11436.859863 125330.000000
DotProduct test (AxB)
dp3:02A5ED64, 000529FB, Cycle: 10.327972 34293.000000
dp3x87:031DF04C, 000529E9, Cycle: 10.327423 34293.000000
dp3x87_:02B60749, 00054114, Cycle: 10.508423 34293.000000
dp3sse:02E1E203, 00051B91, Cycle: 10.215363 34293.000000
dp3sse_:0476B1A3, 0008B260, Cycle: 17.393555 34293.000000
dp3sse__:03DF3FBF, 0007828E, Cycle: 15.019958 34293.000000
dp3sse3:0544A6B2, 000A2804, Cycle: 20.312622 34293.000000
dp3sse4:1615C104, 1615C104, Cycle: 11307.507935 34293.000000
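My CPU does not support SSE4,
so the dp3sse4 result was produced by emulation with Intel's SSE4Emu.dll.
That is why its Cycle count is so high.
In these runs none of the other SSE variants clearly beats the plain or x87 versions either.
In other words, the Core 2 series
is not a great fit for computing DotProduct with SSE...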
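
For readers without the download, here is a rough sketch of the kind of code being compared. It is not the code from the linked archive: the function names `dp3_scalar` and `dp3_sse` are made up for illustration, and the shuffle sequence is just one common way to do the horizontal add. It does hint at why a single 3-component dot product gains little from packed SSE: one multiply is followed by several dependent shuffle/add steps.

```c
#include <stdio.h>

/* Use SSE intrinsics only where the compiler targets them (assumption: x86/x64). */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define DP3_HAVE_SSE 1
#endif

/* Hypothetical scalar version: three multiplies, two adds. */
static float dp3_scalar(const float *a, const float *b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

#ifdef DP3_HAVE_SSE
/* Hypothetical packed SSE version: one multiply, then a horizontal add. */
static float dp3_sse(const float *a, const float *b)
{
    __m128 va = _mm_set_ps(0.0f, a[2], a[1], a[0]);   /* lanes: x, y, z, 0 */
    __m128 vb = _mm_set_ps(0.0f, b[2], b[1], b[0]);
    __m128 m  = _mm_mul_ps(va, vb);                   /* xx, yy, zz, 0 */
    __m128 hi = _mm_movehl_ps(m, m);                  /* zz, 0, zz, 0 */
    __m128 s  = _mm_add_ps(m, hi);                    /* lane0 = xx+zz, lane1 = yy */
    __m128 y  = _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 1, 1, 1)); /* broadcast lane1 */
    s = _mm_add_ss(s, y);                             /* lane0 = xx+zz+yy */
    return _mm_cvtss_f32(s);
}
#else
/* Fallback so the sketch still builds on targets without SSE. */
static float dp3_sse(const float *a, const float *b)
{
    return dp3_scalar(a, b);
}
#endif
```

Both functions take pointers to three floats, for example `dp3_sse(a, b)` with `float a[3]` and `float b[3]`. The SSE path adds the terms in a different order than the scalar path, so results can differ in the last bit for values that are not exactly representable.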
Labels: Assembly
