_0 uses x87 arithmetic, _1 uses SSE instructions, and _2 uses SSE2. SSE3 and later don't help enough to justify another application.
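
As a rough illustration of choosing between the three variants, the sketch below maps a set of CPU feature flags to the matching suffix. The function names (pick_variant, read_linux_cpu_flags) are hypothetical and not part of the application, and reading /proc/cpuinfo is an assumption that only holds on Linux x86 systems; on other platforms the flags would have to come from elsewhere.

```python
def pick_variant(cpu_flags):
    """Return the suffix of the most capable variant the CPU can run.

    cpu_flags is any iterable of feature names such as "sse" or "sse2".
    SSE3 and later are ignored on purpose: there is no separate variant for them.
    """
    flags = {flag.lower() for flag in cpu_flags}
    if "sse2" in flags:
        return "_2"  # SSE2 build
    if "sse" in flags:
        return "_1"  # SSE build
    return "_0"      # x87 fallback, runs on any x86 CPU with an FPU


def read_linux_cpu_flags(path="/proc/cpuinfo"):
    """Hypothetical helper: collect the feature flags Linux reports for the first CPU.

    Returns an empty set if the file is missing or has no 'flags' line.
    """
    try:
        with open(path) as cpuinfo:
            for line in cpuinfo:
                if line.startswith("flags"):
                    # Everything after the first colon is a space-separated flag list.
                    return set(line.partition(":")[2].split())
    except OSError:
        pass
    return set()


if __name__ == "__main__":
    print(pick_variant(read_linux_cpu_flags()))
```

If no flags can be read, the sketch falls back to _0, since the x87 variant is the one that does not depend on any SSE support.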