Einstein@Home for 64-bit Linux on AMD Athlon 64 X2

Anonymous
Topic 13399

Quote:
Regardless of our disputes here, I think that at least partial support for platform name x86_64-unknown-linux-gnu (and its variant x86_64-pc-linux-gnu) should exist. Partial in the sense that if the server encounters a client host of that platform, it returns client software for the platform named i686-pc-linux-gnu.
Bruce, Bernd, is it too much for us to expect such a support? Until then, we'll have to fiddle with app_info.xml to run official binary :(

I understand. However, this is a tradeoff.

Running 32-bit binaries on 64-bit Linux requires certain libraries, and the Client does not report whether they are present. The benefit of this partial support would be an easier installation for people who have these libraries installed, but the downside would be lots of Client Errors from machines that don't, without any useful error messages for the users.
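The partial support requested in the quote amounts to a name mapping on the server side. A minimal sketch in C, assuming a hypothetical helper `app_platform_for` (this is not BOINC server code, and the function name is made up for illustration):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, NOT part of BOINC: map the platform name reported by
 * a client to the platform whose application binaries should be sent. */
const char *app_platform_for(const char *reported) {
    assert(reported != NULL);
    if (strcmp(reported, "x86_64-unknown-linux-gnu") == 0 ||
        strcmp(reported, "x86_64-pc-linux-gnu") == 0) {
        /* 64-bit Linux host: fall back to the 32-bit Linux application. */
        return "i686-pc-linux-gnu";
    }
    /* Every other platform is served under its own name. */
    return reported;
}
```

Note that such a mapping cannot tell whether the host has the 32-bit libraries installed, which is exactly the tradeoff described above.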

We'll consider this. Given the increasing number of these machines I will probably make a native 64 bit App in the near future, which should be the cleanest way. For now I'd like to point you to the 4.16 App on the Power User Apps page. It comes with a working app_info.xml (I think that the 4.16 was slightly faster on some AMDs than 4.17, am I correct?).

For most users, I think, using the official 32-bit BOINC Core Client that reports the official platform would be the easiest way, though I am aware that this prevents optimal crunching on the (few) projects that supply a native 64-bit Linux App.

BM

Bernd Machenschalk
Joined: 15 Oct 04
Posts: 2,684
Credit: 25,950,161
RAC: 34,820

Quote:
I'm not sure if they made it into the official version, but Akos was experimenting with hotloops that used both SSE and 387 instructions to process data in parallel. If they were put into the deployed version, disabling the 387 would be a significant performance hit for an x86-64 native app.

The current Einstein App uses this method, i.e. it does the "more contributing" parts of the calculation in high precision (80-bit on the FPU) and the rest in single precision (SSE). For the current setup, doing everything in single precision isn't precise enough.
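To illustrate why single precision alone can fall short, here is a small sketch in C. It is not taken from the Einstein App; the function names are made up, and it assumes that `long double` is at least as precise as `double` (on typical x86 Linux compilers it is the 80-bit x87 format). Both functions add the same single-precision term repeatedly and differ only in the precision of the accumulator:

```c
#include <assert.h>
#include <stddef.h>

/* Sum n copies of term with a single-precision accumulator.
 * volatile forces every partial sum to be rounded to float, even on
 * compilers that would otherwise keep it in a wider x87 register. */
float sum_single(float term, size_t n) {
    assert(n > 0);
    volatile float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc = acc + term;
    }
    return acc;
}

/* Same sum, but accumulated in long double (extended precision). */
long double sum_extended(float term, size_t n) {
    assert(n > 0);
    long double acc = 0.0L;
    for (size_t i = 0; i < n; i++) {
        acc += (long double)term;
    }
    return acc;
}
```

With `term = 1.0f` and `n = 20000000`, the single-precision sum stops growing at 16777216 (2^24), because adding 1 to that value no longer changes a float, while the extended-precision sum reaches 20000000 exactly.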

This complicated way of calculation, btw, is the reason why I couldn't simply compile a (native) 64bit App of the current code.

We are working on the code for S5R2, and it looks like it will become a lot cleaner, and probably everything in the "inner loop" can be done in single precision, so it will be a little faster and it should also be easier to build native 64bit Apps (yes, we do care).

BM

Bernd Machenschalk

Quote:
But let me correct you in that although SSE supports only single-precision, SSE2 supports double-precision too. Of course, if Einstein really needs to use x87's extended-precision 80-bit, that's the only way to go.


I know that there is SIMD support for double precision, but 1) there are (or at least were at the time of coding) many more machines that could do SSE but couldn't run SSE2 than machines that could run both, and 2) (re-)aligning the data for double-precision SIMD calculation ate up all the speed we would gain from doing just the four FPU calculations as two double-precision SSE2 calculations. It simply wasn't worth the effort. [Edit] Modern CPUs with their "virtually two FPUs" (another interface to the same physical unit) will combine the FPU calculations for us anyway.
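The data rearrangement mentioned in point 2 can be sketched with compiler intrinsics. This is only an illustration, not the project's code: it assumes an x86 compiler that provides `<emmintrin.h>` (SSE2), and the function name `scale4_double` is made up. Four single-precision inputs have to be split and converted into two pairs of doubles before any double-precision SIMD arithmetic can happen:

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics (also provides the SSE ones used here) */

/* Multiply four single-precision inputs by scale, in double precision. */
void scale4_double(const float in[4], double scale, double out[4]) {
    assert(in != 0 && out != 0);
    __m128  v  = _mm_loadu_ps(in);                    /* four floats in one register */
    __m128d s  = _mm_set1_pd(scale);                  /* scale in both double lanes */
    __m128d lo = _mm_cvtps_pd(v);                     /* in[0], in[1] as doubles */
    __m128d hi = _mm_cvtps_pd(_mm_movehl_ps(v, v));   /* in[2], in[3] as doubles */
    _mm_storeu_pd(out,     _mm_mul_pd(lo, s));        /* two double-precision multiplies */
    _mm_storeu_pd(out + 2, _mm_mul_pd(hi, s));
}
```

The shuffle and the two conversions are pure overhead around just two multiply instructions, which gives a feel for how such bookkeeping can eat up the gain.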

BM

Bernd Machenschalk

I actually made an SSE2 version once, not modifying the "hot loop", but other parts of the program (sin/cos LUT). It didn't gain much on some CPUs and was much slower on others (Akos said there _might_ be some advantage on Woodcrests). And yes, it required rearranging the data for a larger part of the program. At that time, the hassle of maintaining (and deploying) yet another version of the code wasn't worth the minimal speedup on only a few CPUs.

For the techs: For the current Apps we maintain four ("production") versions of the source code for the central function (the BOINC and graphics code is C++, the rest is plain vanilla C):
- Hand-coded Assembler used for all x86 CPUs capable of SSE
- Hand-coded Assembler for x87 calculations (for x86 CPUs that can't do SSE)
- An AltiVec version using Motorola's C/C++-API to AltiVec instructions
- A generic C version that runs on all other CPUs such as G3, MIPS and SPARC
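The choice among the four versions listed above can be summarized as a small decision function. This is a sketch only: the enum, the function name `choose_kernel` and its flag parameters are hypothetical, and how the real Apps select a version (at build time or at run time) is not stated here.

```c
#include <assert.h>

/* Hypothetical labels for the four versions of the central function. */
enum kernel {
    KERNEL_SSE_ASM,   /* hand-coded assembler, x86 with SSE */
    KERNEL_X87_ASM,   /* hand-coded assembler, x86 without SSE */
    KERNEL_ALTIVEC,   /* AltiVec version via the C/C++ API */
    KERNEL_GENERIC_C  /* generic C for everything else */
};

/* Pick a version from three capability flags (non-zero means "yes"). */
enum kernel choose_kernel(int is_x86, int has_sse, int has_altivec) {
    assert(!(is_x86 && has_altivec));  /* AltiVec does not exist on x86 */
    if (is_x86) {
        return has_sse ? KERNEL_SSE_ASM : KERNEL_X87_ASM;
    }
    if (has_altivec) {
        return KERNEL_ALTIVEC;
    }
    return KERNEL_GENERIC_C;  /* e.g. G3, MIPS, SPARC */
}
```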

BM

Bernd Machenschalk

My current plan is to build and distribute a native x86_64 App for the next run ("S5R2"), which should start next month. Currently I can tell neither exactly how long it will take to have a native App (I'll definitely not build one from the "old" code that's currently running), nor if and when the Admins will find the time to set up things for automatic download of the 32-bit App. My guess is that this will not happen during the last weeks of the current run.

BM
