During the last weeks and months we have been mainly busy with getting the Albert setup working, so I have not had much time to spend on further optimization.
- The AltiVec version of the code is hand-coded, explicitly using vector instructions where possible (at least in the very core of the program).
- On Linux, if SSE is detected the App switches to a part of the program that has been optimized for SSE by the compiler (gcc 3.4 or 4.0).
- On Windows we use the stock MSC compiler (7.1) on the generic version of the code.
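To illustrate the Linux case above, here is a minimal sketch of run-time dispatch between a generic routine and an SSE-tuned one. It is not the actual Einstein@Home code: the function names (`sum_squares_generic`, `sum_squares_sse`, `cpu_has_sse`, `select_sum_squares`) are hypothetical, the "SSE" routine is only a stand-in that would in practice be compiled with SSE flags, and the detection assumes the `__builtin_cpu_supports` builtin of recent GCC/Clang releases on x86 (it is not available in gcc 3.4 or 4.0).

```c
#include <stddef.h>

typedef float (*sum_fn)(const float *, size_t);

/* Generic version: plain scalar loop. */
static float sum_squares_generic(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += x[i] * x[i];
    return acc;
}

/* Stand-in for the SSE-tuned path (hypothetical): four independent
 * accumulators mirror the four float lanes of an SSE register. In a real
 * build this function would live in a file compiled with SSE enabled. */
static float sum_squares_sse(const float *x, size_t n)
{
    float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += x[i]     * x[i];
        a1 += x[i + 1] * x[i + 1];
        a2 += x[i + 2] * x[i + 2];
        a3 += x[i + 3] * x[i + 3];
    }
    float acc = (a0 + a1) + (a2 + a3);
    for (; i < n; ++i)              /* leftover elements */
        acc += x[i] * x[i];
    return acc;
}

/* Assumption: GCC/Clang builtin on x86; everywhere else report "no SSE". */
static int cpu_has_sse(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse") != 0;
#else
    return 0;
#endif
}

/* Pick the implementation once, at start-up. */
sum_fn select_sum_squares(void)
{
    return cpu_has_sse() ? sum_squares_sse : sum_squares_generic;
}
```

The caller would fetch the function pointer once with `select_sum_squares()` and use it in the hot loop, so the detection cost is paid only at start-up.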
I played with compiler options, compiler versions and modifications to the code for quite some time, but found that none of the following measures gives any significant improvement in calculation times compared to the Apps we currently deliver:
- prefer SSE2 over SSE when available (Linux)
- use hand-coded vector code (for SSE2) instead of leaving the optimization to the compiler (Linux)
- use SSE(2) optimization of the MSC compiler (Windows)
- use icc (the Intel compiler, version 8) instead of gcc or MSC
So my preliminary conclusions are that
- The MSC compiler does a surprisingly good job, at least on our code
- The SSE optimization of gcc seems to give results that are (nearly) as good as hand-written code
- The AltiVec unit is simply better (and somewhat easier to program) than the SSE stuff; that's why I deeply regret Apple's decision regarding CPUs.
I began to play with the auto-vectorization of gcc-4 and icc-9, but have no usable result yet. It's something I'm still working on.
BM

Bruce, a question about An Optimized Application
I'm very interested in this. I'll send you an email off list.
Cheers,
Bruce
RE: RE: Any results on
This is my fault -- I got caught up in some urgent things at this end. I've just written to you off-list.
Cheers,
Bruce
RE: Any News? Yes. Akosf
Yes. Akosf has done three things to speed up our executable. One of these optimizations is very clever: it eliminates large numbers of (slow) divisions. We're in the process of building and testing new executables (for all platforms) that incorporate these changes. They should result in very substantial speed-ups.
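The post does not say how the divisions were eliminated, so the following is only a generic sketch of one common way to do it, not Akosf's actual change: when many values are divided by the same divisor, compute the reciprocal once and multiply inside the loop. The function names are hypothetical.

```c
#include <stddef.h>

/* Baseline: one floating-point division per element. */
void scale_by_division(double *out, const double *in, size_t n, double d)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] / d;
}

/* Variant: a single division up front, then only multiplications,
 * which are considerably cheaper than divisions on most CPUs. */
void scale_by_reciprocal(double *out, const double *in, size_t n, double d)
{
    const double inv = 1.0 / d;     /* the only division */
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * inv;
}
```

Note that the two versions can differ in the last bit of the result when `1.0 / d` is not exactly representable, which matters if results from different hosts have to validate against each other.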
Bruce
RE: akosf said he looked at
Here's an update.
We've now incorporated Akosf's improvements into our source code. But we haven't started distributing this faster application yet, for a simple reason: we are worried that our project server might break under the increased upload/validation disk load, since work will be completed faster once we begin distributing the new apps to all users. So we're upgrading the disk controllers and should be ready for this increased load soon.