GNU/Linux S5R3 App 4.16 available for Beta test

Bernd Machenschalk
Joined: 15 Oct 04
Posts: 2,684
Credit: 25,950,161
RAC: 34,820
Topic 13624

A new Linux App is available from our Beta Test page.

This App was built with a newer version of the BOINC library, which I hope will fix some of the segfault client errors (exit status 11).

In addition, it will stop trying to immediately sync the checkpoint file after five successive failures (which should help e.g. on XFS). And in contrast to 4.14, it will still keep the checkpoint if syncing failed.
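The checkpoint behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not the App's actual code: the class name `CheckpointWriter`, the temporary-file-and-rename step and the injectable `fsync` parameter are all assumptions made for the example. Only the threshold of five successive failures and the "keep the checkpoint even if syncing failed" rule come from the post.

```python
import os

MAX_SYNC_FAILURES = 5  # threshold from the post; everything else here is assumed


class CheckpointWriter:
    """Hypothetical helper: write a checkpoint, try to fsync it,
    and stop trying to sync after repeated successive failures."""

    def __init__(self, path, fsync=os.fsync):
        self.path = path
        self._fsync = fsync       # injectable so failures can be simulated
        self.failed_syncs = 0     # number of successive fsync failures
        self.sync_enabled = True  # switched off once the threshold is reached

    def write(self, data: bytes) -> bool:
        """Write the checkpoint; return True if it was also synced to disk."""
        tmp = self.path + ".tmp"
        synced = False
        with open(tmp, "wb") as f:
            f.write(data)
            f.flush()
            if self.sync_enabled:
                try:
                    self._fsync(f.fileno())
                    synced = True
                    self.failed_syncs = 0  # a success resets the streak
                except OSError:
                    self.failed_syncs += 1
                    if self.failed_syncs >= MAX_SYNC_FAILURES:
                        self.sync_enabled = False
        # Keep the checkpoint even if syncing failed.
        os.replace(tmp, self.path)
        return synced
```

With a file system where every sync fails, this sketch attempts the sync five times, then writes later checkpoints without syncing, and never discards a checkpoint.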

I intend to make this one the "official" Linux App soon; we should at least get some more information about the computing errors we still get from the 4.02 App.

BM

Bernd Machenschalk

Thank you for the report!

I can see that this is bad for you, but maybe it will help us anyway.

Do you have the ddd debugger installed on the machine? If so, you could create a file "EAH_DEBUG_DDD" in the BOINC directory, and the next time a task is started, it should fire up ddd attached to it. Hitting the "Cont" button will let the App run under the debugger. It should catch the signal and list where it occurred.
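Creating and later removing that marker file could look like the sketch below. The file name "EAH_DEBUG_DDD" is from the post; the function name `set_debug_marker`, the idea that an empty file is enough, and the example directory path are assumptions.

```python
from pathlib import Path


def set_debug_marker(boinc_dir, enabled=True, name="EAH_DEBUG_DDD"):
    """Create (or remove) the marker file in the given BOINC directory.

    Assumes an empty file with the right name is sufficient.
    """
    marker = Path(boinc_dir) / name
    if enabled:
        marker.touch()
    else:
        marker.unlink(missing_ok=True)  # requires Python 3.8+
    return marker
```

For example, `set_debug_marker("/path/to/BOINC")` before the next task starts, and `set_debug_marker("/path/to/BOINC", enabled=False)` once debugging is done (the path is a placeholder for the local BOINC directory).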

Thanks a lot!

I'll try to reproduce this on the same system.

BM

Bernd Machenschalk

Message 3358 in response to message 3357

Quote:
I'll try to reproduce this on the same system.


Yep, seen it:
update_app_progress (cpu_t=1.008062, cp_cpu_t=0) at boinc_api.C:265
Apparently this BOINC library version makes things rather worse than better.
Thanks a lot again for the report!

All others: be careful when using this Beta App. Would be nice to get some reports of systems where it works. I hope to issue a new, fixed version soon.

BM

Bernd Machenschalk

Interesting. I'm pretty sure it hasn't anything to do with the BOINC Core client version. Is this App actually running on any system at all?

BM

Bernd Machenschalk

We got a fix for BOINC from David Anderson. I'm currently rebuilding and updating the App. The 4.16 Beta Test package has been removed until it has been updated.

BM

Bernd Machenschalk

Ok, I updated the 4.16 App with a new BOINC library (as of today).

Please download the new package with the old name and replace the files
(new md5 is 4a13337ab423e80cabacc1e14fdf1866, old was dc0867738e712a71ca1ec458c0eec185).
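To check which of the two packages a downloaded file is, its MD5 sum can be compared against the values above. A minimal sketch, assuming Python is available; the function names are made up for the example, and the package path is left to the caller because the file name is not given in this post.

```python
import hashlib

NEW_MD5 = "4a13337ab423e80cabacc1e14fdf1866"  # updated package (from the post)
OLD_MD5 = "dc0867738e712a71ca1ec458c0eec185"  # withdrawn package (from the post)


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_package(path):
    """Return 'new', 'old' or 'unknown' for a downloaded package file."""
    md5 = md5_of_file(path)
    if md5 == NEW_MD5:
        return "new"
    if md5 == OLD_MD5:
        return "old"
    return "unknown"
```

The same check can be done with the `md5sum` command-line tool where it is installed.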

BM

Bernd Machenschalk

Quote:
The other issue I've had on my ancient machine, namely that it doesn't record CPU time consumed (see this post) is still present. Might be a show-stopper? It is naughty as it screws-up client scheduler because it can't predict when to fetch new WUs or how long is it going to take to finish the current one (and thusly running in EDF mode). Any chance to fix it?


What system (glibc, Kernel) is this?
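For anyone unsure how to look this up, the following sketch prints the kernel release and the C library version using Python's standard `platform` module. The function name `describe_system` is made up for the example, and on unusual systems `platform.libc_ver()` may return empty strings.

```python
import platform


def describe_system():
    """Collect kernel and C library information as plain strings."""
    libc_name, libc_version = platform.libc_ver()
    return {
        "kernel": platform.release(),
        "libc": f"{libc_name} {libc_version}".strip(),
        "machine": platform.machine(),
    }


if __name__ == "__main__":
    for key, value in describe_system().items():
        print(f"{key}: {value}")
```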

I don't want to go into too much detail here, but recent changes to BOINC affected the handling of CPU time. The old way was violating standards and caused some trouble (e.g. the "hang" problem on MacOS), while the new way may not work correctly on ancient systems (that have a non-standard behavior of the pthread library).

I think that with the next generation of Apps & Clients (major version number 6) there is a way around this, but for now we had to make a decision between inconvenience on old and showstoppers on new systems.

BM

Bernd Machenschalk

Quote:
So, if it doesn't crunch, would it read the checkpoint at all? Why should it if it doesn't crunch? Do you know anything about that? It might help with finding out when exactly the problem occurs...


What's your setting of "Leave applications in memory while suspended" (general preferences, possibly own venue)? If the App is to be left in memory, the client will still load the App and suspend it shortly after. Maybe that's what's causing the problem.

BM

Bernd Machenschalk

Quote:
Yes, that feature is turned on (meaning the WUs are kept in memory). Prime Grid had problems with checkpointing for a while, and since this box has quite a lot of memory, it usually isn't a performance problem... so I thought it would be more efficient. I could turn it off if it causes problems.


No, that's not what I meant. I don't think that this setting is causing the problem. It might be that the short time between starting and suspending the App causes trouble for either the App or the OS. For now it's just good to know and may help in further tracking down the problem.

BM

Bernd Machenschalk

Quote:
Quote:
In addition, it will stop trying to immediately sync the checkpoint file after five successive failures (which should help e.g. on XFS). And in contrast to 4.14, it will still keep the checkpoint if syncing failed.

On my XFS systems checkpointing now works (what a relief!). There are no messages about inability to sync, as if it even didn't try to sync at all ...


Are you sure you don't have an EAH_NO_SYNC file left in the BOINC directory?
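A quick way to check is to list any leftover marker files in the BOINC directory. The sketch below assumes that such markers are plain files whose names start with "EAH_" (as EAH_NO_SYNC and EAH_DEBUG_DDD do); the function name and the prefix convention are assumptions, not something the App documents here.

```python
from pathlib import Path


def find_marker_files(boinc_dir, prefix="EAH_"):
    """Return the sorted names of files in boinc_dir that start with prefix."""
    return sorted(
        p.name
        for p in Path(boinc_dir).iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )
```

If "EAH_NO_SYNC" shows up in the result, the absence of sync messages would presumably be explained by that file rather than by the new App.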

BM

Bernd Machenschalk

Yes, just minutes ago.

The message "process got signal 11" is actually not from the App, but from the BOINC Core Client. Either the reason for the segfault is in the Core Client itself, or it is catching the signal meant for the App, in which case it at least prevents any further diagnosis output that might be helpful.

Please try an old Core Client. On my test machines, a 5.4.11 seems to work reliably. At least we should get a better idea of where the segfault comes from.
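To illustrate why such a message comes from the parent process rather than from the App itself, here is a small sketch of a parent that starts a child and reports how it ended. It is not the Core Client's code; `run_and_report` and `CRASH_CMD` are names made up for the example, and it assumes a POSIX system, where Python's `subprocess` reports death by signal N as return code -N.

```python
import subprocess
import sys

# A child that deliberately sends SIGSEGV to itself, standing in for a crashing App.
CRASH_CMD = [
    sys.executable,
    "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)",
]


def run_and_report(cmd):
    """Run cmd as a child process and describe how it terminated."""
    proc = subprocess.run(cmd)
    if proc.returncode < 0:
        # Negative return code: the child was killed by a signal (POSIX only).
        return f"process got signal {-proc.returncode}"
    return f"process exited with status {proc.returncode}"


if __name__ == "__main__":
    print(run_and_report(CRASH_CMD))
```

The child has no chance to print anything once the signal kills it, so the only report is the one the parent writes.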

BM
