Work Unit not finishing

Bernd Machenschalk
Joined: 15 Oct 04
Posts: 2,684
Credit: 25,950,161
RAC: 34,820

> @Bernd or any def people
> here got a longrider.

T(om?)hanks!

> Still interested about the boinc folder?

Thanks for doing that, but I think we have found the problem. The WUs causing this kind of trouble seem to be analyzing the frequency range around 60 Hz (you can currently tell from the name). We are working to get this problem out of the way.

If you don't mind, keep the archive for a while in case we need it. I think right now it is of no use to us, but it may become handy in the future.

Thanks a lot for your help!

BM


Bruce Allen
Joined: 15 Oct 04
Posts: 958
Credit: 170,849,008
RAC: 0

> Is this related to my CPU time resetting back about an hour whenever einstein
> was paused and resumed by BOINC? Is there maybe no checkpoint built into the
> 3rd stage of analysis since it was expected to be so short? I have my
> settings set to switch between projects every 60 minutes and to remove them
> from memory when doing so.

Your analysis of the problem is correct. There is no checkpoint in the third stage of processing because it is supposed to take only a few seconds.

Please see the front page news item about this.

Bruce
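As a rough illustration of the behavior described above, here is a toy Python model. It is not the Einstein@Home application; the function name and all numbers are made up for illustration. It shows why a stage that writes no checkpoints can only finish if it fits into one uninterrupted time slice: every time the app is removed from memory, the stage starts from zero again.

```python
def run_stage(work_needed, slice_length, checkpoints, max_slices=1000):
    """Toy model: return how many time slices a stage needs to finish,
    or None if it has not finished after max_slices.

    work_needed  -- minutes of CPU time the stage requires (hypothetical)
    slice_length -- minutes the app runs before it is removed from memory
    checkpoints  -- True if progress survives removal from memory
    """
    done = 0
    for slices in range(1, max_slices + 1):
        done += slice_length
        if done >= work_needed:
            return slices
        if not checkpoints:
            # No checkpoint was written, so all progress is lost
            # when the app is unloaded and later restarted.
            done = 0
    return None


print(run_stage(90, 60, checkpoints=True))   # 2
print(run_stage(90, 60, checkpoints=False))  # None
print(run_stage(50, 60, checkpoints=False))  # 1
```

With a 60-minute switch interval, a stage that needs 90 minutes finishes in two slices if it checkpoints, and never finishes if it does not. A stage that needs less than one slice finishes either way, which matches the expectation that a stage of a few seconds needs no checkpoint.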

Bruce Allen
Joined: 15 Oct 04
Posts: 958
Credit: 170,849,008
RAC: 0

> And it a pleasure to help in a modest way

This was a big help. The problem has been isolated and we're working on a fix. Please see the front page news item about this.

Bruce

Bruce Allen
Joined: 15 Oct 04
Posts: 958
Credit: 170,849,008
RAC: 0

Message 2022 in response to message 2018

> In case this helps anyone:
>
> The problem appears to be in the data, not in the code. The WU will eventually
> finish, but it may take quite some time (even more than we expect in the max
> CPU time value and exceeding the deadline), and maybe also more memory than we
> expected (possibly causing more problems). We'll try to avoid such WUs in the
> future.

I want to say this a bit differently.

The problem is in our code. For certain data sets, it is not as efficient as it could or should be. So we are in the process of fixing the code to make it efficient in all cases.

Unfortunately we can't identify the 'troublesome' data sets or cases without actually analyzing the data! So we can't easily avoid such WUs. Instead we need to make the code work efficiently with any of our input data sets.

Cheers,
Bruce

Bruce Allen
Joined: 15 Oct 04
Posts: 958
Credit: 170,849,008
RAC: 0

> Another not ending story in "H1_0059.9__0060.0_0.1_T17_Test02" If you can
> delete this workunit the other will thank you.

For what it's worth, the problem occurs when analyzing the band of data containing 60 Hz (the power mains frequency in the USA).

I'll talk with others in our team about cancelling these WUs. I am not doing it right away because looking at the results coming back (they are slow but do complete) may help us to fix this.

Cheers,
Bruce

Bruce Allen
Joined: 15 Oct 04
Posts: 958
Credit: 170,849,008
RAC: 0

> 0060.0_ So are you saying if I look at this part of the WU ID that there could
> be a potential problem with the WU taking to much time to finish ... ???

Actually it's the __0060.0 (TWO underscores) that's the clue. I wouldn't be surprised if the __0059.9 (TWO underscores) workunits also show this behavior.

Bruce
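For readers who want to check their own workunit names against the clue above, here is a minimal Python sketch. It assumes only what this thread states: that the number directly after the double underscore is the telltale one, and that 60.0 (and possibly 59.9) marks an affected workunit. The function names, the tolerance, and the second example name are hypothetical, and nothing here is an official description of the naming scheme.

```python
import re

# Assumption: the number right after the double underscore is the
# frequency (in Hz) that matters, as suggested in the post above.
AFTER_DOUBLE_UNDERSCORE = re.compile(r"__(\d+\.\d+)")


def band_frequency(wu_name):
    """Return the number following the double underscore, or None."""
    match = AFTER_DOUBLE_UNDERSCORE.search(wu_name)
    return float(match.group(1)) if match else None


def looks_suspect(wu_name, suspect=(59.9, 60.0), tolerance=0.05):
    """True if the workunit name points at one of the suspect frequencies."""
    freq = band_frequency(wu_name)
    if freq is None:
        return False
    return any(abs(freq - s) < tolerance for s in suspect)


# Name reported earlier in this thread:
print(looks_suspect("H1_0059.9__0060.0_0.1_T17_Test02"))
# Made-up name, for comparison only:
print(looks_suspect("H1_0059.9__0074.3_0.1_T17_Test02"))
```

The first call reports the name as suspect because of the `__0060.0` part; the `_0059.9` with a single underscore is deliberately ignored.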

Bernd Machenschalk
Joined: 15 Oct 04
Posts: 2,684
Credit: 25,950,161
RAC: 34,820

> Is there a transmission at 74Hz as well? ;)

Well, IMHO everything below 100Hz may cause problems with the current data sets.

We are working on new apps and different data pre-processing to solve this problem.

BM


Bernd Machenschalk
Joined: 15 Oct 04
Posts: 2,684
Credit: 25,950,161
RAC: 34,820

Just to give you an update on this issue:

1. We have identified the problem and are working on it. A new set of apps fixing this should be available in the next few days.

2. We still appreciate the uploaded process directories (slots). They will help us to test these new apps before releasing them.

3. If you see such a "never ending result" apparently staying at 100% for hours, please do the following:

- report it to us, preferably in this thread
- if you can, zip (or tar.gz) the appropriate slots directory (and maybe the projects/einstein directory, too, but that's not so important) and make it available to us
- if you are only running E@H, just be patient - the WU will spend quite some time at the 100% mark, but will eventually finish
- if you are swapping between different projects and have set your preferences to remove the app from memory when suspended, the Result will probably never finish. The reason is that the app does not write any checkpoints during this last stage, always gets suspended before completing it, and thus starts over at the beginning of this phase again and again.
If you have an (experimental) client that allows suspending individual projects (and aborting individual results), suspend all projects except E@H until this Result is finished. You may also abort this individual result if you don't want to affect the other projects, but you will then lose the CPU time spent on it.
If you are using a stock client (4.1x) without this possibility, well, there may be no other way than resetting the E@H project to get this Result out of the way, losing the CPU time you spent on it (and causing another 11MB data file download).
It may help to modify your preferences for a longer swap time and/or to keep the app in memory, but I'm not sure if it helps once the app is in this stage.
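For the "tar.gz the slots directory" step, one possible way to do it with nothing but Python's standard library is sketched below. The function name is hypothetical, and the location of the slots directory depends on your BOINC installation, so the path has to be supplied by you.

```python
import tarfile
from pathlib import Path


def archive_directory(directory, archive_path):
    """Pack a directory (for example one slot directory) into a .tar.gz file.

    directory    -- path of the directory to archive (supplied by the user)
    archive_path -- path of the .tar.gz file to create
    """
    directory = Path(directory)
    if not directory.is_dir():
        raise NotADirectoryError(str(directory))
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # arcname keeps only the directory's own name inside the archive,
        # not the full path on the local disk.
        tar.add(str(directory), arcname=directory.name)
    return Path(archive_path)
```

A call could look like `archive_directory("slots/0", "slot0.tar.gz")`, where `slots/0` is only a guess at the relevant slot directory on a given machine. Archiving while the client is suspended or stopped is probably the safer choice, so that files do not change halfway through.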

Thanks a lot for your help!

BM


Bernd Machenschalk
Joined: 15 Oct 04
Posts: 2,684
Credit: 25,950,161
RAC: 34,820

ric, Blizzard,

I found the uploads from Rebirther (to test our new apps with them), but nothing from ric or others - have they already been deleted?

BM

