GNU/Linux S5R3 App 4.09 available for Beta test

Bernd Machenschalk
Joined: 15 Oct 04
Posts: 2,684
Credit: 25,950,161
RAC: 34,820
Topic 13602

A new Linux App is available from our Beta Test page.

It has a number of new features:

- The most important difference is the new handling of floating-point problems that previously showed up as "Input domain error" or "Non-finite Dphi_alpha" (exit status 99). Instead of checking at certain points whether critical values are finite, the App will now throw a floating-point exception (FPE) when a NaN or an FPU stack error is encountered. First, this allows us to remove the time-consuming explicit finite checks, and second, it gives us a clearer idea of where in the code these problems actually happen. (We are not sure whether they only occur on machines with hardware problems, usually an overheating CPU, or whether there is still a software bug left.) These errors should now show up with exit status 8 and a stack dump in the stderr output.

- Apparently the slight modifications we made to the code at the last minute when switching to S5R3 confused gcc's branch prediction for a case distinction in the innermost loop of the program, which had a severe impact on performance. We have now added a hint telling gcc which path is used more frequently, so that it optimizes for that one.

- The new checkpointing code previously seen in the 4.07 Windows App is used here, too. An (unwanted) side effect is that its checkpoint files are incompatible with those of the current official Linux App, so tasks in progress cannot be picked up by a different App. However, the app_info.xml in this package should allow tasks that have been assigned to 4.02 to be completed with this App, and new tasks should be assigned to 4.09, so you should be able to switch to the anonymous platform at any time.

- From the 4.07 Windows App this one also inherits the ability to disable the graphics (thread) by putting a file named EAH_NO_GRAPHICS into the BOINC folder; there is no need to mess with the graphics installation or the .so file.

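The hint to gcc mentioned in the second point is typically given with `__builtin_expect()`. The following is a minimal, hypothetical sketch: the likely()/unlikely() macros are a common convention (used, for example, in the Linux kernel) rather than part of gcc, and sum_ratios() only mimics the shape of an inner loop with a case distinction; it is not the App's code.

```c
#include <assert.h>
#include <stddef.h>

/* Conventional wrappers around gcc's __builtin_expect(). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical inner loop: each term is num[i] / den[i], except when the
   denominator is (almost) zero, where a precomputed limit value is used.
   The hint makes gcc lay out the code so that the frequent path is the
   fall-through and the rare special case is moved out of the way. */
double sum_ratios(const double *num, const double *den, size_t n,
                  double eps, double limit)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = den[i];
        if (likely(d > eps || d < -eps))
            sum += num[i] / d;      /* common path: regular division */
        else
            sum += limit;           /* rare path: avoid dividing by ~0 */
    }
    return sum;
}
```

Whether such a hint helps has to be measured on the real code; profile-guided optimization (`-fprofile-generate` / `-fprofile-use`) is an alternative that lets gcc collect the branch frequencies itself.
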
BM

Bernd Machenschalk

Sorry to hear that. Which client version are you using? Did you get a message saying what went wrong when the results in progress failed to restart?

Anybody else seen this?

BM

Bernd Machenschalk

Sorry, that was my fault. I just updated the archive with a new app_info.xml. Previously the md5 checksum wouldn't have matched that of the archive, but most people probably don't check it anyway.

Thanks for reporting this.

BM

Bernd Machenschalk

Oh, if you still have S5R2 work, the app_info.xml won't help. You would need to add a section for your current S5R2 App, depending on the version you were using. Or else wait until your S5R2 work is finished.

BM

Bernd Machenschalk

Quote:
I've encountered one faulty run: http://einstein.phys.uwm.edu//task/87453466.


Thanks!

This is a FPU stack overflow; one of these "this should never happen" errors (if the compiler is working correctly).

At least the floating-point exception generation & handling is working.

Something that worries me, though, is the message "Obtained 0 stack frames for this thread", which means that this doesn't give us much of a clue where precisely the error happened (and thus why). Maybe the main stack (or stack pointer) was corrupted, too.

Is the machine overclocked, or was it a hot day?
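
The message "Obtained 0 stack frames for this thread" reports how many frames backtrace() managed to recover. A minimal sketch of such a report, assuming glibc's execinfo.h interface; report_stack_frames() is a hypothetical name, and only the message text is taken from the stderr output mentioned above. A count of 0 means no return addresses were recovered at all, so there is nothing to map back to a location in the source.

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno() */
#include <assert.h>
#include <execinfo.h>             /* backtrace(), backtrace_symbols_fd() */
#include <stdio.h>

#define MAX_FRAMES 64

/* Hypothetical helper: walk the calling thread's stack, report the number
   of frames found, and print them (as addresses/symbols) to `out`.
   If the stack or the unwind information backtrace() relies on is damaged,
   it may find few or no frames. Returns the number of frames. */
int report_stack_frames(FILE *out)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    fprintf(out, "Obtained %d stack frames for this thread\n", n);
    fflush(out);                  /* keep stdio output ahead of the raw fd writes */
    backtrace_symbols_fd(frames, n, fileno(out));
    return n;
}
```

Called from ordinary code, this should report at least one frame (the helper itself); seeing 0 inside the FPE handler is what suggests the stack itself was unusable.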

BM

Bernd Machenschalk

My expectation was that in case of getting a signal the exit code would match the signal (here: 8). However it seems that here an FPE results in an exit status 22, which might be confusing.

Again backtrace() seems to be unable to get a useful stack dump. The reason for this might indeed be the 64Bit system. I'll take a look.
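
Whether a parent process sees "8" depends on how it decodes the child's wait status: a process killed by SIGFPE and a process that calls _exit(8) are different cases. The sketch below uses the POSIX wait macros to show the distinction; describe_status(), run_child() and the two child functions are hypothetical, and how the BOINC client turns these cases into the exit status it reports is not covered here.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Turn a raw wait status into text. "Killed by signal 8" and
   "exited with status 8" are distinct cases and are decoded separately. */
void describe_status(int status, char *buf, size_t len)
{
    if (WIFSIGNALED(status))
        snprintf(buf, len, "killed by signal %d", WTERMSIG(status));
    else if (WIFEXITED(status))
        snprintf(buf, len, "exited with status %d", WEXITSTATUS(status));
    else
        snprintf(buf, len, "other (status 0x%x)", (unsigned)status);
}

/* Run fn() in a child process and return the child's raw wait status,
   or -1 if fork() or waitpid() fails. */
int run_child(void (*fn)(void))
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {           /* child */
        fn();
        _exit(0);             /* only reached if fn() returns */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}

/* Two example children: one is terminated by SIGFPE (default action),
   the other exits normally with status 8. */
void die_by_sigfpe(void) { raise(SIGFPE); }
void exit_with_8(void)   { _exit(8); }
```

On Linux, where SIGFPE is 8, describe_status() reports the first child as "killed by signal 8" and the second as "exited with status 8"; a handler like the one that calls `_exit(sig)` produces the second case.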

BM

Bernd Machenschalk

Quote:

Just got a bunch of code 22 errors on my other Linux box. Here's one example. Strange that my first box, with virtually identical equipment but a slightly different kernel, seems to be chugging along just fine.

CPU time is always 0.00.


With no CPU time this looks more like the usual 22 error that's described in the WIKI. Check that you have the 32Bit compatibility libraries installed. You had a couple of download errors on this box; maybe try resetting the project and downloading the files from scratch. Also, I've seen some problems with apparently nonexistent files or directories with 5.10.x clients, which sometimes cure themselves for no apparent reason (and sometimes don't). I'm not using a 5.10 Client on any Linux box.

BM

Bernd Machenschalk

The WIKI I meant was the one ageless pointed to:

BM

Bernd Machenschalk

It looks like there is a slight (i.e. rarely occurring) problem with the new checkpointing code that's in 4.07 and in 4.09, too (see here). I'd like to look into that first.

BM
