GNU/Linux S5R3 "power users" App 4.21 available

Bernd Machenschalk
Joined: 15 Oct 04
Posts: 2,684
Credit: 25,950,161
RAC: 34,820
Topic 13629

This App incorporates SSE vector code but has no CPU feature detection, so it will crash badly on non-SSE machines. It probably won't cure the 'signal 11' problem, since it contains the same BOINC code as 4.20.
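Since the App itself does not check for SSE, it can be checked by hand before installing. The following is only a minimal sketch, assuming an x86 Linux host where /proc/cpuinfo carries a "flags" line; the function names are made up for illustration and are not part of the App or of BOINC.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags of the first CPU in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found (e.g. not an x86 Linux host)


def has_sse(cpuinfo_text):
    """True if the first CPU listed reports the 'sse' flag."""
    return "sse" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # /proc is not available on this system
    print("SSE: ", has_sse(text))
    print("SSE2:", "sse2" in cpu_flags(text))
```

If "SSE" comes out as False, this App should not be run on that machine.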

Only run this if you're sure of what you're doing.

Find it on the revived Power User Apps page.

Happy crunching!

BM


Bernd Machenschalk

Bikeman, thanks for pointing me to the sse/sse2 issue.

I'm not sure how much of the speedup is actually due to the SSE2 code. I'll fix the description on the Power Apps page, but I won't build a new SSE-only App before I have fixed some other problems in the code.

BM


Bernd Machenschalk

Annika is pretty right regarding the policies of "Beta" and "Power User's" Apps.

However, I'm tempted to try building completely different Apps for different CPU types and letting the scheduler decide which one to deliver based on information from the Client, instead of switching code segments in the App based on local CPU feature detection. This way an App that is now a "Power User's App" could some day become an 'official' one. But that's not my main concern right now.
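The scheduler-side choice described here could look roughly like the sketch below. It is only an illustration of the idea, not BOINC scheduler code: the variant names, the feature names and the function pick_app are all hypothetical.

```python
# Hypothetical App variants, most specialised first (names are illustrative).
APP_VARIANTS = [
    ("einstein_S5R3_sse2", {"sse", "sse2"}),
    ("einstein_S5R3_sse", {"sse"}),
    ("einstein_S5R3_generic", set()),
]


def pick_app(reported_features):
    """Return the first variant whose required CPU features the Client reported."""
    features = set(reported_features)
    for name, required in APP_VARIANTS:
        if required <= features:  # every required feature is present
            return name
    raise LookupError("no suitable App variant")
```

With a generic variant that requires nothing, every host gets some App, and a host that reports SSE gets the faster build without any detection code inside the App itself.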

Currently my top priority items for Einstein@home Apps are:

* fix the 'signal 11' problem of the Linux App (4.20/21). The Linux Apps might be fast, but currently Linux has the highest failure rate of all platforms, which is mainly due to these 'signal 11' errors.

* migrate the code to BOINC API v6. Instead of using a separate thread, this will run graphics in a different process, so that problems with graphics don't crash the 'science' App. This should greatly improve the stability of the Windows App and also make the screensaver work on Windows Vista. It will also finally make it possible to run BOINC graphics in an xscreensaver hack, i.e. to work as a real screensaver on Linux.

* fix the fast 'linear' SIN/COS calculation code for all platforms. Currently this only works when compiled with Apple's version of gcc on MacOS Intel. In principle it should be generic and thus give some speedup on all platforms.
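For readers wondering what a fast 'linear' SIN/COS might mean: one common technique is a lookup table with linear interpolation between entries. The sketch below shows that technique only; it is an assumption about the general approach, not the App's actual code, and the table size LUT_RES and the function name are made up.

```python
import math

LUT_RES = 64  # table intervals per full period (assumed value)
SIN_LUT = [math.sin(2.0 * math.pi * i / LUT_RES) for i in range(LUT_RES + 1)]
COS_LUT = [math.cos(2.0 * math.pi * i / LUT_RES) for i in range(LUT_RES + 1)]


def sin_cos_2pi(x):
    """Approximate (sin(2*pi*x), cos(2*pi*x)) by linear table interpolation."""
    frac = x - math.floor(x)            # reduce the argument to one period
    pos = frac * LUT_RES
    i = min(int(pos), LUT_RES - 1)      # guard against frac rounding up to 1.0
    t = pos - i                         # position between entries i and i + 1
    s = SIN_LUT[i] + t * (SIN_LUT[i + 1] - SIN_LUT[i])
    c = COS_LUT[i] + t * (COS_LUT[i + 1] - COS_LUT[i])
    return s, c
```

The trade-off is accuracy for speed: with 64 intervals the interpolation error stays at roughly the 1e-3 level, and a larger table reduces it further.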

Obviously this is still dominated by bug hunting and fixing. If I can improve speed along the way without hindering that, I'll do so and publish what I find working.

Currently the SSE code can only be compiled with gcc, and when linking gcc code to MSC code the debugging information of the gcc objects gets lost. Therefore I find it rather unlikely that there will be a Windows App with an SSE 'hot-loop', say, this year.

BM


Bernd Machenschalk

Quote:
Quote:

It also seems to only happen on machines with the newer BOINC clients. The old 5.8.x clients don't seem to be affected, where the 5.10.x clients are. (Of course, that could just be a matter of timing, which means that my theory could be wrong.)

So, I'm wondering, if something in BOINC, rather than the Einstein app, could be causing the problem.

I heard a lot of talk in various project forums about BOINC 5.10.x screwing up badly when it encounters connection problems, i.e. erroneously killing all Results in progress, and several people started avoiding 5.10.x until the bug is fixed. (I haven't encountered this myself, though, but my connection is usually rock stable.)


There are some bugs in 5.10.x that have prevented me from using it. At least the 'truncate stderr' bug has been fixed by now. I still see the first tasks after a new installation error out in some cases, apparently because the App is started before all the files have been downloaded completely.

Anyway, I'm pretty sure that there is still a cause for the segfault left in the Einstein App (either in our code or in the BOINC library that's linked into the App). Other problems could be caused by the Client, but hardly that one. And roughly 50% of the 'signal 11' errors I've seen in the DB are from 5.8 Clients.

I have access to a cluster of machines that shows this problem rather frequently. However, it's pretty slow by today's standards (PIII), tasks run quite long there, and until now the problem hasn't appeared under a debugger. I'd guess I won't find it before I leave for Xmas.

Annika,
if you still see this problem when running BOINC as an ordinary, logged-in user, please try the following:
- touch a file "EAH_DEBUG_DDD" in the BOINC directory.
- each time a new task is started, the App will then launch the DDD debugger attached to it.
- press the "Cont" button on the "Command toolbar" or type "cont" at the "(gdb)" prompt in the main window.
- if the App catches a signal 11 (shown in the gdb window), type "bt" and post the output (stack backtrace) here. "bt full" gives even more informative but much, much longer output.
- alternatively you can save a core file by typing "gcore" (at the "(gdb)" prompt), then compress and upload it somewhere to make it available to me (it's probably way too big for email).

BM


Bernd Machenschalk

Quote:
Bernd, it may sound nooblike (and probably is) but I can't for the life of me figure out how to get the backtrace output pasted from the debugger. I select copy, try to paste it into the text editor, and nothing happens. I right click and all I get are options to clear the text. I try looking it up in the manual, nothing there. Please help, I'm lost...


That depends on the environment (GNOME, KDE etc.), but the old-style X11 method should work in all cases: mark the text with the left mouse button held down, then paste it into any other window with the middle mouse button (or by pressing both buttons at once on a two-button mouse).

BM


Bernd Machenschalk

Quote:

This is what my latest signal 11 looked like:

Quote:
(gdb) bt full
#0 0xb7e56ac0 in ?? () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
Cannot access memory at address 0xb7fa9d64
(gdb)


OMG.

You're right, that's not helpful. Looks like the stack (or even more memory) is completely trashed.

I got one report that the problem vanished on a particular machine with a BOINC Client as old as 5.3.31. You may want to give it a try.

BM


Bernd Machenschalk

Quote:

My last WUs always died with Signal 11 under BOINC 5.10.21/28.
Today I switched to BOINC 5.10.30 and this WU died with Signal 34.
In the BOINC log I found this:
file projects/einstein.phys.uwm.edu/einstein_S5R3_4.02_i686-pc-linux-gnu not found
file projects/einstein.phys.uwm.edu/einstein_S5R3_4.02_i686-pc-linux-gnu.so not found
After starting einstein-wu within a few seconds:
Starting einstein
Starting einstein using einstein_S5R3 version 4.21
Computation for task einstein finished
Output file einstein for task einstein absent

I don't know why. I've seen multiple Signal 11 also on Spinhenge-WU's. My other projects Simap, QMC, Seti, Rieselsieve, Chess960, LHC running without problems.


Thanks for the report.

The "file ... 4.02 ... not found" messages could safely be ignored if the 4.21 App was working on your machine. However, this doesn't seem to be the case: the App got a "signal 4", which is an "illegal instruction". There shouldn't be anything in the App that a Core2 CPU can't handle. I'd suggest downloading the archive again and checking the md5 checksum before unpacking it again (overwriting the old files). You may also want to let the client get the 4.02 App files, or download them manually, to get rid of the error messages.
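The md5 check can be done with the md5sum tool or, as a minimal sketch, with Python's standard hashlib module as below. The function names are made up for illustration, and the expected checksum has to be taken from wherever the archive was published.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_hex):
    """Compare a file's md5 digest with a published checksum string."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

Only unpack the archive if checksum_matches returns True for the downloaded file; a mismatch suggests a damaged download.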

The references to the other projects are helpful, thanks! Probably the Spinhenge Apps, like the Einstein one, are built with a 'bleeding edge' version of the BOINC library, while the other projects use older versions.

BM


Bernd Machenschalk

Quote:
They are running different versions of the core client. Windows is 5.10.30 (I think) installed as a service and Linux is 5.10.21 installed via rpm as a system daemon.


Does this run BOINC and the App as root? (Try "ps -ef | grep einstein" or similar.)
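For anyone who would rather script this check than read the ps output by eye, here is a minimal sketch. It assumes the usual eight-column "ps -ef" layout (UID PID PPID C STIME TTY TIME CMD); the function name is made up for illustration.

```python
def owners_of(ps_output, needle="einstein"):
    """Map PID -> user for `ps -ef` lines whose command contains needle."""
    owners = {}
    for line in ps_output.splitlines()[1:]:   # skip the header row
        fields = line.split(None, 7)          # UID PID PPID C STIME TTY TIME CMD
        if len(fields) == 8 and needle in fields[7]:
            owners[int(fields[1])] = fields[0]
    return owners
```

The input can be obtained with subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout. If any value in the result is "root", the App is running as root.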

BM


Bernd Machenschalk

Quote:
As far as I know (and Eric Myers can confirm for you) when installing as a rpm, it runs as its own user. The rpm creates a user boinc and the home directory is /var/lib/boinc.


Well, what the rpm does depends on the distributor. It might be more common by now to create a dedicated user (which is good), but I've seen installations where the client (and thus the App) ran as root.

Quote:
Here's the output of ps, I hope it makes sense to you because it doesn't make sense to me.


It does. The App is running as user "boinc".

It also reveals that this is a dual-CPU machine, which might be the reason why I couldn't reproduce the problem. I'll try with a dual-core VM.

Are others seeing this problem (only) on multi-CPU/core machines?

Thanks,

BM


Bernd Machenschalk

Quote:
Quote:

Ubuntu 7.04 Boinc 5.2.13

Wow!

But this is a segmentation fault inside the BOINC software itself, so someone should make a bug report to BOINC's TRAC system, I guess. This issue should not have any relation to Einstein@Home.

CU
Bikeman

Right, this is the BOINC Core Client. Given the pretty old version, though, I wonder if anyone @BOINC cares...

BM


Bernd Machenschalk

The "signal 11" happens in the BOINC library (the part of BOINC that gets linked into the application) whenever the Core Client becomes unresponsive. Newer Clients seem to become unresponsive more often than older ones (e.g. for DNS requests), but in principle it could happen with older Clients, too. We are working on fixing the problem.

BM

