GNU/Linux S5R3 App 4.24 available for Beta test

Bernd Machenschalk
Joined: 15 Oct 04
Posts: 2,684
Credit: 25,950,161
RAC: 34,820
Topic 13639

A new Linux App is available from our Beta Test page.

This App should fix the bug that caused a SEGV (signal 11) when the BOINC Core Client became unresponsive (e.g. due to network access / problems).

This "standard" (non-SSE) App has the "linear SIN/COS" code working (many thanks to Akos and Bikeman) and should thus be somewhat faster than the 4.20.

We (Einstein@home and BOINC) are migrating to a new way of doing the graphics (actually BOINC has, I'm still working on Apps for Einstein@home). In the present state I couldn't get the "old-style" graphics to work, but I guess not many Linux users will look at it anyway. The next App release should have "APIv6 graphics".

Please test and report!

BM

Bernd Machenschalk

GNU/Linux S5R3 App 4.24 available for Beta test

Quote:

I extracted this package to my project directory, but there were only two files in it contrary to the package description in the Linux section of the Beta Test Page. The md5sum came out right, but when I started the BOINC client back up, I got a message about a missing .so file.

Mon 14 Jan 2008 01:45:17 AM CST||file projects/einstein.phys.uwm.edu/einstein_S5R3_4.24_i686-pc-linux-gnu.so not found


Sorry, my fault (typical OOC error). The app_info.xml still listed an einstein_S5R3_4.24_i686-pc-linux-gnu.so file, which is not part of this package. I updated the app_info.xml, the archive, the md5sum and the instructions on the webpage.

Thanks for reporting this.

BM

Bernd Machenschalk

RE: Hi Bernd, not a problem

Quote:
Hi Bernd,
not a problem really, but I noticed something odd. I'm still running Einstein under the debugger. Previously the debugger started as soon as BOINC began crunching an Einstein WU and ran until the WU was done (or the machine rebooted). With the new science app, the debugger starts, runs for a few minutes and gives an "exited normally" message. Then a new debugger starts. It looks like there's more than one process or so... was this to be expected?


I'm not sure I understand you correctly. It sounds like the App exits and is restarted rather often. Do you get a lot of messages like "... exited with zero status but no 'finished' file" from the BOINC Client, or do you see "no heartbeat from core client" in stderr.txt in the slot directory?
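A quick way to answer that question is to count how often the two messages occur. The following is only a minimal sketch: the marker strings are the fragments quoted above (the exact wording may differ between Client versions), and the function names are made up for illustration.

```python
# Message fragments quoted in this thread; exact wording may vary by version.
MARKERS = (
    "exited with zero status but no 'finished' file",
    "no heartbeat from core client",
)


def count_markers(lines):
    """Return a dict mapping each marker to the number of lines containing it."""
    counts = {marker: 0 for marker in MARKERS}
    for line in lines:
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts


def count_markers_in_file(path):
    """Apply count_markers to a plain-text log such as stderr.txt."""
    with open(path, errors="replace") as fh:
        return count_markers(fh)
```

Calling `count_markers_in_file("stderr.txt")` from inside the slot directory would give the heartbeat count; the path is an assumption about where you run it from.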

BM

Bernd Machenschalk

BTW: The version of the BOINC library in this App uses the 'old' method for determining the CPU time. This means that even on older kernels the CPU time should be updated correctly.

BM

Bernd Machenschalk

RE: In fact, the message

Quote:
In fact, the message "no heartbeat from core client" shows up all the time.


For whatever reason the Core Client doesn't respond to the App. Previously this led to a signal 11; now, with the fixed App, it's just a restart.

However, with the App not being able to run for more than 30 seconds, you will hardly get any 'work' done at all. By default it checkpoints every minute, and depending on your machine the time for recovering from a checkpoint might well be more than that.
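To make the arithmetic concrete, here is a toy model (not how the App itself is implemented): each run first spends some time recovering from the last checkpoint, and only whole checkpoint intervals completed after that count as saved progress. The function name and the recovery times are assumptions for illustration.

```python
def saved_progress_per_run(run_s, checkpoint_s, recover_s=0.0):
    """Seconds of work that survive one run in this simplified model.

    run_s        -- wall-clock seconds before the App is restarted
    checkpoint_s -- interval between checkpoints
    recover_s    -- assumed time spent resuming from the last checkpoint
    """
    useful = max(0.0, run_s - recover_s)
    completed_intervals = int(useful // checkpoint_s)
    return completed_intervals * checkpoint_s


# Restarted every 30 s with a 60 s checkpoint interval: nothing is ever saved.
print(saved_progress_per_run(30, 60))

# An undisturbed 18-minute run with an assumed 20 s recovery saves most of its work.
print(saved_progress_per_run(18 * 60, 60, recover_s=20))
```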

In any case I'd remove the debugger file (the sig11 has been fixed anyway) and try to find out why the Core Client gets stuck on your machine. The task you mentioned apparently ran fine for the first 18 minutes. Did anything change then that you are aware of (started a particular program, updated the system, whatever)?

BM

Bernd Machenschalk

It looks like newer Core Clients become unresponsive when they lose their network connection (probably because of DNS timeouts), causing a missing 'heartbeat' on the App side. It's good to know that this is not limited to Linux. For the moment I'd suggest using an older Core Client.

BM

Bernd Machenschalk

RE: RE: A new Linux App

Quote:
Quote:
A new Linux App is available from our Beta Test page.

A couple of points about the app_info.xml file distributed in this package:-

    * If you don't have the 4.02 executable and .so files in your project folder you will get harmless complaints about these files being missing. Is it possible that anyone is still using these old versions? Couldn't these be safely dropped now?
    * The previous versions listed to be handled by 4.24 stop at 4.16. Shouldn't 4.20 also be included?
    * There may also be people running the "power user" 4.21 version who want to give 4.24 a spin. Shouldn't you also prepare for that possibility?


You're absolutely right. I updated the app_info.xml, the package and the md5sum again.
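For anyone who wants to check their own copy, the sketch below lists files that an app_info.xml names but that are not present in the project directory, which is what triggers the "missing file" complaints. The sample XML is a simplified, hypothetical fragment in the general layout of an app_info.xml; the file names in it are placeholders, not the names shipped in the package.

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical, simplified app_info.xml content; all names are placeholders.
SAMPLE_APP_INFO = """\
<app_info>
  <app><name>example_app</name></app>
  <file_info><name>example_app_4.24</name><executable/></file_info>
  <file_info><name>example_app_4.02</name><executable/></file_info>
  <app_version>
    <app_name>example_app</app_name>
    <version_num>424</version_num>
    <file_ref><file_name>example_app_4.24</file_name><main_program/></file_ref>
  </app_version>
</app_info>
"""


def listed_files(app_info_xml):
    """Return the file names declared in <file_info> elements."""
    root = ET.fromstring(app_info_xml)
    return [(fi.findtext("name") or "").strip() for fi in root.iter("file_info")]


def missing_files(app_info_xml, project_dir):
    """Return declared file names that do not exist in project_dir."""
    return [
        name
        for name in listed_files(app_info_xml)
        if not os.path.isfile(os.path.join(project_dir, name))
    ]
```

With a real file you would read its text first and pass the project directory, for example `missing_files(open("app_info.xml").read(), ".")`.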

Thanks,

BM

Bernd Machenschalk

RE: md5sum does not match

Quote:

md5sum does not match now for me (yesterday night it matched on another machine)

Now I'm getting:

56cb787158feb2d19af53e53cc169784


No clue. The one on the webpage is correct on the server side and when downloading it to my local machine.
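If md5sum itself is in doubt, the checksum can be recomputed independently. This is a minimal sketch using Python's standard hashlib; the function names are made up here, and the expected value has to be copied from the Beta Test page by hand.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's MD5 with an expected hex string, ignoring case and whitespace."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

A mismatch on one machine but not on another usually points at the download (an interrupted transfer or a cached older archive) rather than at the tool, though that is only a guess here.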

Quote:
The app works, but I still get the error message "missing application file einstein_S5R3_4.02_ ..."


That surprises me a bit, but it's good to know. Just for the record: which Client version are you using? It's just a warning anyway.

Thanks for the report.

BM

Bernd Machenschalk

RE: The problem is that

Quote:
The problem is that when it happens I can't open the manager to see the logs :(


The logs are plain text files in the BOINC directory (stdoutdae.txt and stderrdae.txt); you don't need to use the manager.
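Any text viewer will do. As a small illustration, this sketch prints the last few lines of such a log without loading the whole file into memory; the helper name is made up, and the path assumes you run it from inside the BOINC directory.

```python
from collections import deque


def last_lines(path, n=20):
    """Return the last n lines of a text file, without trailing newlines."""
    with open(path, errors="replace") as fh:
        # A bounded deque keeps only the most recent n lines while iterating.
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]
```

For example, `print("\n".join(last_lines("stdoutdae.txt")))` shows the most recent Client messages.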

BM

Bernd Machenschalk

RE: This is the first time

Quote:
This is the first time I've come across this particular error message. Hopefully someone may have a clue as to what caused this. The only thing I can think of is possible corruption of that file if it were being written to just as BOINC was being stopped for the transition to the new beta app. I presume the file is created when a task is first started and that it may be updated from time to time during the life of the task.


That's entirely true.

If everything is running fine now I wouldn't worry. It may be, though, that the slots/0 directory has a permission problem that could come up again in one of the next tasks.

BM

PS: as for the app_info.xml file - you're right, I had better strip it of all references to 4.02. Feel free to do so manually until I find the time. I hope to be able to publish a new App soon anyway, as this one didn't show the speedup I expected, but I want to get the "signal 11" issue fixed.

Bernd Machenschalk

RE: I'm not sure what

Quote:
I'm not sure what "signal 2" and "signal 15" are, maybe they come from terminating BOINC via ctrl-c at the command line?


It does.
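For reference, the numbers can be translated into names with Python's standard signal module. On Linux, 2 is SIGINT (the interrupt that ctrl-c sends), 15 is SIGTERM (a termination request) and 11 is SIGSEGV, the "signal 11" discussed in this thread. The helper name below is made up for illustration.

```python
import signal


def signal_name(number):
    """Return the symbolic name for a signal number on this platform."""
    return signal.Signals(number).name


for number in (2, 11, 15):
    print(number, signal_name(number))
```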

BM
