GNU/Linux S5R3 App 4.31 available for Beta test

Bernd Machenschalk
Joined: 15 Oct 04
Posts: 2,684
Credit: 25,950,161
RAC: 34,820
Topic 13648

A new Linux App is available from our Beta Test page.

This App appears to be a little faster than the previous 4.24, thanks to some hacking on the sin/cos routine, and it is a new "separate graphics" App (featuring the "extended information" mentioned in the "screensaver competition" thread).

It's probably not the fastest we can do without SSE, but in contrast to the quick-fix 4.24 it's an actual release candidate.

Please test and report!

BM

Bernd Machenschalk

GNU/Linux S5R3 App 4.31 available for Beta test

Quote:
How does this relate to the Linux 4.27 'power' application? Is it the same but without the SSE optimisations, or are there other improvements involved with this release?


There is some tuning of single instructions in the sin/cos approximation code, which should give a few percent overall compared to 4.24. This won't bring it up to the speed of 4.27, though.

BM

Bernd Machenschalk

RE: So it took only 16-17%

Quote:
So it took only 16-17% more time to crunch with 4.31 compared to 4.27. That again tells me 4.27 isn't yet close to its "SSE powered" potential, you know what to do next for the penguin crunchers BM =D


Actually the speedup is more than I expected. However, I recently learned that fiddling with the sin/cos code led the compiler to handle other parts of the code differently. The speedup you see is largely "delayed" from the 4.20 -> 4.24 code changes (where a speedup was announced but not actually observed). It's not tied to the sin/cos code itself, and thus can't be ported to the SSE version; it's already included there.

BM

Bernd Machenschalk

RE: Do you have any idea

Quote:
Do you have any idea why Windows 4.26 is roughly equivalent in speed to Linux 4.20, at least when viewed from the perspective of running on identical AMD hardware? IOW, why does the Windows app need the Linear sin/cos code and the compiler optimizations to get close to the performance of the Linux code-compiler combination that doesn't have the Linear sin/cos routines?


Both compilers (gcc and MSVC) produce inefficient code in the "hot-loop" because they think they have too few FPU registers left for efficient code. With gcc you can get lucky and it produces efficient code, depending on how you fiddle with the sin/cos routine, but the MSVC compiler seems to do badly almost always, and its code is worse than gcc's.

I asked Akos to write an efficient implementation of the hot-loop in x87 assembler to be independent of the compiler; he agreed to do this, but I haven't received any code yet.

BM

Bernd Machenschalk

RE: RE: I asked Akos to

Quote:
Quote:
I asked Akos to write an efficient implementation of the hot-loop in x87 assembler to be independent of the compiler; he agreed to do this, but I haven't received any code yet.

Oops... I will make up for it as fast as possible.
(I thought you wanted to optimize a bigger part of the code.)


No problem. I actually wasn't sure, but a simple "hot-loop" will probably do for a start. We are still experimenting with how to get the compilers to do the float->int conversion in the sin/cos routine efficiently, but if we find we need assembler there too, we can probably add it later anyway.

BM

Bernd Machenschalk

I published the 4.31. Let's

I published the 4.31. Let's see how things go. I definitely need to fix the signal 11 problem "officially".

BM

Bernd Machenschalk

RE: RE: Hi! This

Quote:
Quote:

Hi!

This particular fault (FPE) is really hard to produce with software bugs (other than compiler bugs); the most likely explanation is failing hardware. After all, a PIII Coppermine must be how old by now? 6 years? 7 years?

Most of the time it's not the CPU itself but things like failing fans, swollen capacitors on the motherboard, glitches in the power supply... Gary will be able to expand on this better than me. The E@H app has now reached a significant level of optimization and squeezes quite a bit of performance out of the FPU, so it's not surprising that E@H is the first app to show symptoms of hardware failure.

CU
Bikeman

One should also consider that in previous versions certain (if not all?) FPEs appear to have been ignored, so we might be seeing design/programming flaws today (unless those traps appear only on faulty hardware and never with faulty software design, which is okay and human and bound to happen). But then, if it were a design flaw, it shouldn't appear only here.

From the release info:

throws floating-point exception on NaNs and FPU stack errors

Of course I could verify that by running an older version without those new traps, but this might mean incorrect/drifting data, so my decision would rather be "this old, P3-driven host cannot participate in Einstein". (That would be okay. It would be interesting to know whether other users with the same CPU get the same FPEs, but I think those old P3 Coppermine users haven't picked up the latest version yet and are still running old versions.)

Update: the 4.35 SSE version seems also to produce errors:

2008-02-23 10:15:34 [Einstein@Home] Resuming task h1_0851.85_S5R3__372_S5R3b_1 using einstein_S5R3 version 435
2008-02-23 10:15:46 [Einstein@Home] Deferring communication for 1 min 0 sec
2008-02-23 10:15:46 [Einstein@Home] Reason: Unrecoverable error for result h1_0851.85_S5R3__372_S5R3b_1 (process exited with code 99 (0x63))


The conditions that now raise an FPE (at least the ones I've seen here) would almost always have led to a NaN in a certain variable, and that in turn to an error with exit status 99 a few instructions later (where there is a sanity check for array bounds). So these errors are simply taken from the "99" bunch of computing errors in order to get closer to the point where the error actually occurs, that's all.

There is at least one other reason for FPEs: a flaw in the operating system (or even in the compiler it was built with). It should protect each process context against whatever is happening in other contexts, but apparently this doesn't always work correctly. At least on Windows I have read reports where a hardware driver (usually a printer driver) could mess up the FPU stack and flags so badly that they weren't properly restored when switching back to a user process, generating an FPE there. I'm not sure that all possible Linux kernels (including self-built ones) have sufficient protection against bad drivers and other code running in kernel mode. I'm not even sure they all properly save all registers; I've seen the CPU type/register detection of the Linux kernel fail to detect the right CPU (and thus the available set of registers) in at least two cases.

You seem to have quite a number of machines running; can you point me to the machine, or even better the result, where the error happened with the 4.35?

BM

Bernd Machenschalk

The 4.31 App has become part

The 4.31 App has become part of the new 4.38 beta App package.

BM
