Bruce, a question about An Optimized Application

Anonymous
Topic 13217

Over the last weeks and months we have mainly been busy getting the Albert setup working, so I have not had much time to spend on further optimization.

- The AltiVec version of the code is hand-coded, explicitly using vector instructions where possible (at least in the very core of the program).
- On Linux, if SSE is detected the App switches to a part of the program that has been optimized for SSE by the compiler (gcc 3.4 or 4.0).
- On Windows we use the stock MSC compiler (7.1) on the generic version of the code.
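
For readers unfamiliar with the "switch if SSE is detected" approach, here is a minimal sketch of runtime dispatch in C. It is not the project's code: the kernel (`sum_squares_*`), the `kernel_fn` type and `select_kernel` are hypothetical names, the "SSE" variant is only a stand-in, and the detection uses `__builtin_cpu_supports`, a builtin from much newer GCC releases than the 3.4/4.0 compilers mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel signature: sum of squares of n floats. */
typedef float (*kernel_fn)(const float *x, size_t n);

/* Generic version: plain C, runs on any CPU. */
static float sum_squares_generic(const float *x, size_t n)
{
    assert(x != NULL || n == 0);
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += x[i] * x[i];
    return acc;
}

/* Stand-in for the SSE build of the same routine. In a real project this
 * would be a separate translation unit compiled with SSE enabled; here it
 * reuses the generic loop so the sketch compiles on any platform. */
static float sum_squares_sse(const float *x, size_t n)
{
    return sum_squares_generic(x, n);
}

/* Returns nonzero if the CPU reports SSE support (assumption: GCC or a
 * compatible compiler on x86; everywhere else we report "no SSE"). */
static int cpu_has_sse(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse");
#else
    return 0;
#endif
}

/* Pick the implementation once, at startup, and call through the pointer. */
static kernel_fn select_kernel(void)
{
    return cpu_has_sse() ? sum_squares_sse : sum_squares_generic;
}
```

The point of the pattern is that a single binary can be shipped to every host, and the choice between code paths costs one check at startup rather than one per call.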

I played with compiler options, compiler versions and modifications to the code for quite some time, but found that none of the following measures gave any significant improvement in calculation times compared to the Apps we currently deliver:

- prefer SSE2 over SSE when available (Linux)
- use hand-coded vector code (for SSE2) instead of leaving the optimization to the compiler (Linux)
- use SSE(2) optimization of the MSC compiler (Windows)
- use icc (the Intel compiler, version 8) instead of gcc or MSC
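
To illustrate what "hand-coded vector code versus leaving it to the compiler" means in practice, here is a small sketch in C. The loop (`y[i] += a * x[i]`) is a hypothetical example, not taken from the application, and the function names are made up. The first version is a plain loop that the compiler may or may not vectorize; the second spells out the SSE2 instructions through the intrinsics in `<emmintrin.h>` and falls back to the plain loop where SSE2 is not available.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __SSE2__
#include <emmintrin.h>  /* SSE2 intrinsics (two doubles per register) */
#endif

/* Plain C: vectorization, if any, is left to the compiler. */
static void axpy_scalar(double a, const double *x, double *y, size_t n)
{
    assert((x != NULL && y != NULL) || n == 0);
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* Hand-written SSE2: process two doubles per iteration, then finish any
 * leftover element with scalar code. Unaligned loads/stores are used so
 * the caller does not have to guarantee 16-byte alignment. */
static void axpy_sse2(double a, const double *x, double *y, size_t n)
{
    assert((x != NULL && y != NULL) || n == 0);
    size_t i = 0;
#ifdef __SSE2__
    const __m128d va = _mm_set1_pd(a);
    for (; i + 2 <= n; i += 2) {
        __m128d vx = _mm_loadu_pd(x + i);
        __m128d vy = _mm_loadu_pd(y + i);
        vy = _mm_add_pd(vy, _mm_mul_pd(va, vx));
        _mm_storeu_pd(y + i, vy);
    }
#endif
    for (; i < n; ++i)  /* scalar tail (or whole loop without SSE2) */
        y[i] += a * x[i];
}
```

Whether the second form is actually faster depends on the compiler and the surrounding code, which is exactly what the measurements in this post were about.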

So my preliminary conclusions are that
- The MSC compiler does a surprisingly good job, at least on our code
- The SSE optimization of gcc seems to give results that are (nearly) as good as hand-written code
- The AltiVec unit is simply better (and somewhat easier to program) than the SSE stuff; that's why I desperately regret Apple's decision regarding CPUs.

I began to play with the auto-vectorization of gcc-4 and icc-9, but without a usable result yet. It's something I'm still working on.

BM

Bruce Allen
Joined: 15 Oct 04
Posts: 958
Credit: 170,849,008
RAC: 0


Quote:

Hi!

I made a hand-optimized version of the Albert code (Windows, no SSE).
It produces absolutely correct results, but runs at least twice as fast.
Can I use it without any repercussions?

I'm very interested in this. I'll send you an email off list.

Cheers,
Bruce

Bruce Allen


Quote:
Quote:
Any results on this so far? Any beta-testers needed? :)
I would like to help with speed optimization, but I don't know how to go about it. I could probably put my code on a web page, but I think that would not be legal. I haven't received any e-mail about whether it would be permitted.

This is my fault -- I got caught up in some urgent things at this end. I've just written to you off-list.

Cheers,
Bruce

Bruce Allen


Quote:
Any News?

Yes. Akosf has done three things to speed up our executable. One of these optimizations is very clever: it eliminates large numbers of (slow) divisions. We're in the process of building and testing new executables (for all platforms) that incorporate these changes. They should result in very substantial speed-ups.

Bruce
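
The thread does not say how the divisions were eliminated, so the following is only a generic illustration of one common technique, not Akosf's actual change: when many values are divided by the same divisor, compute the reciprocal once and multiply inside the loop. All names here (`scale_div`, `scale_recip`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward version: one (slow) division per element. */
static void scale_div(const double *x, double *out, size_t n, double d)
{
    assert(d != 0.0);
    for (size_t i = 0; i < n; ++i)
        out[i] = x[i] / d;
}

/* Division hoisted out of the loop: one division in total, then one
 * (much cheaper) multiplication per element. */
static void scale_recip(const double *x, double *out, size_t n, double d)
{
    assert(d != 0.0);
    const double inv = 1.0 / d;
    for (size_t i = 0; i < n; ++i)
        out[i] = x[i] * inv;
}
```

In general the two versions are not guaranteed to agree in the last bit, because `1.0 / d` is rounded before the multiplication. That is why compilers do not make this substitution by default, and why results from a rewritten loop need to be validated against the original.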

Bruce Allen


Quote:

akosf said he looked at the disassembly of the linux binaries, but they were different enough he wasn't able to figure out what was going on.

Bruce has promised new, official applications based on akosf's algorithm improvements, but they haven't been released yet. I suspect the delay is mainly in translating the assembly back into C to allow cross-platform development.

Here's an update.

We've now incorporated Akosf's improvements into our source code, but we haven't started distributing the faster application yet, for a simple reason: once all users receive the new apps, work will be completed faster, and we are worried that the increased upload and validation disk load might break our project server. So we're upgrading the disk controllers and should be ready for the increased load soon.
