S5R1 and beyond

Bernd Machenschalk
Joined: 15 Oct 04
Posts: 2,684
Credit: 25,950,161
RAC: 34,820
Topic 13522

This is a short status update. As you can probably imagine, all of us have been quite busy trying to fix all kinds of problems, and we still are.

- Today we generated the last Workunit of S5R1. All that remains for that run is to crunch the Workunits that are already in the database and for which no canonical result has been found yet.

[edit:] - There are probably only a small number of tasks left in each frequency band, which causes hosts to download a new data file for almost every task. Dial-up users may want to suspend the project for the next few days.

- Many of the problems we had recently, in particular the database problems, seem to have come mostly from the fact that near the end of S5R1 many more short Workunits were left, so results came in at a much higher rate than we expected. With the end of S5R1, things should be back to normal again.

- We are currently testing the setup for a new run that will again look into a smaller frequency range of the current S5R1 dataset with modified parameters (spindown and mismatch). We hope to start distributing these new Workunits in the next few days, so there should not be much of a gap after the S5R1 run. This run will last 2-3 months. It will consist of only one type of Workunit, each a bit more than half as long as the long S5R1 ones.

I hope to have time to post some more info here as soon as it becomes available.

BM

Bernd Machenschalk

S5R1 and beyond

Quote:
to get needed time, take expected time, multiply it by 2 and switch over to next larger time unit


The problem with the original Westheimer's rule is that it's recursive...

No, seriously:

- We started distributing work for a run called S5RI this morning
- Since these Workunits take longer than the short Workunits of S5R1, they will lower the load on our database server, so things should return to a more or less normal state from now on (and already are...)

Actually, the situation got pretty bad because of a number of issues that happened at the very same time:

- hardware problems with the fileserver, causing delayed and thus accumulated reports
- S5R1 was coming to an end, with almost only short workunits left
- faster machines have been added after X-Mas :-)
- Bruce was (and in some sense still is) moving with his family from Milwaukee to Hannover, which means that everything at UWM was on David's shoulders
- currently, due to the storm warnings in Germany, facilities are shutting down and people are being sent home, so we'll see how things go today

BM

Bernd Machenschalk

RE: You might need to

Quote:
You might need to delete your app_info.xml file. If this file is in your projects/einstein directory, it will prevent the new version from being downloaded.
After deletion, restart BOINC and the new WUs start coming :-)


Some platforms require running "anonymous" apps. I'm putting together some new app_info.xml files for them so they can get the new work.

BM

Bernd Machenschalk

RE: J18/01/2007

Quote:
J18/01/2007 14:02:27|Einstein@Home|Message from server: Server can't open database


Yep, still a bit of a rough road.

The latest performance issues were due to all validators running at full load to check the results that have now managed to come in...

BM

Bernd Machenschalk

RE: I know now that You all

Quote:
I know that you are all doing your best to fix the current situation, but so as not to disappoint the crunchers, it would be good if the news were updated regularly. The last info is from Jan. 7. :-( !


I'd appreciate that, too. It seems that the people with access permissions to do so are offline, probably getting some well-deserved sleep.

BM

Bernd Machenschalk

It looks like a wrong version

It looks like a wrong version of the validator had been installed.

The one responsible for this had been found and shot. Now there's no way to fix it anymore.

Seriously: Bruce will be the first awake with the permissions to fix it, so this should be cured tomorrow morning (CET). All results that have been marked "validate error" should be validated again, probably there's nothing wrong with them.

Sorry for the inconvenience, we're all a bit short on sleep.

BM

Bernd Machenschalk

RE: First successful S5RI

Quote:
First successful S5RI results, are validating at 50% higher credit/hour than the s5R1 units they're replacing.

The first few hundred Workunits were accidentally generated with a higher credit (the factor was 1.6, IIRC). We thought it wasn't worth the hassle to manually dig them out of the DB and fix them. Seems you were just lucky. Credit should be back to what you expect from S5R1 with later batches of WUs.

BM

Bernd Machenschalk

RE: I've noticed that the

Quote:
I've noticed that the (long) S5RI WUs seem to have a complicated command line. Am I wrong in thinking that that is just a kludge to force the WUs to do more work than is necessary in order to extend the time it takes to do a WU?


Yes, you are.

I don't know why the command line looks more complicated to you than the ones of S5R1. We are using a newer framework for our workunit generator, which may result in more options being given on the command line instead of being hidden in the config file or in program defaults, but in principle the program shouldn't do anything different.

Quote:
I'm imagining that that was done only to lessen the stress on the servers caused by the short return times of the short(er) WUs. Or, are we now re-crunching interesting WUs using a higher degree of sensitivity?


Not really with a higher sensitivity, which would be something like a closer look. Rather, we're looking at a certain part from a different angle, or with a different focus, but from more or less the same distance. We found that the spindown values we were looking for in S5R1 might not have been optimal for this frequency range (150-720Hz, I think), so we've changed them for this short run.

Originally the Workunits resulting from this setup were a bit longer than the long S5R1 WUs, so we decided to cut them in half so as not to exclude slower computers.

BM

Bernd Machenschalk

RE: Upon closer inspection,

Quote:

Upon closer inspection, ie. I put my reading glasses on, my work units are labelled eg. 'h1_0374.0_S5R1__1503_S5RIa_0' - or spoken 'aych one underline zero three seven four point zero underline ess five arr ONE underline underline one five zero three underline ess five arr EYE ay underline zero' :-)

So that'd make the 'S5RI' units a subset of 'S5R1' .....

The first part of a Workunit name is just the name of the data file it refers to. As we are using the same data files, they are still labeled S5R1, even if the Workunit belongs to S5RI. And yes, in terms of the frequencies we're looking at, S5RI is a subset of S5R1.
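For anyone who wants to sort their tasks by data file, here is a minimal Python sketch of that naming convention. It assumes that the data file part is everything before the first double underscore, which holds for the example name quoted above. The function name split_task_name is hypothetical and not part of any Einstein@Home or BOINC tool, and the sketch does not try to interpret the remaining fields.

```python
def split_task_name(name: str) -> tuple[str, str]:
    """Split a task name into (data file name, remainder).

    Assumption (hypothetical): the data file name is everything before the
    first '__' and never contains a double underscore itself.
    """
    datafile, sep, rest = name.partition("__")
    if not sep:
        raise ValueError(f"no '__' separator in {name!r}")
    return datafile, rest


name = "h1_0374.0_S5R1__1503_S5RIa_0"
datafile, rest = split_task_name(name)
print(datafile)  # h1_0374.0_S5R1  (data file, still labelled S5R1)
print(rest)      # 1503_S5RIa_0    (contains the S5RI run label)
```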

BM

Bernd Machenschalk

RE: It looks like a wrong

Message 3092 in response to message 3087

Quote:

It looks like a wrong version of the validator had been installed.

Seriously: Bruce will be the first awake with the permissions to fix it, so this should be cured tomorrow morning (CET). All results that have been marked "validate error" should be validated again, probably there's nothing wrong with them.


Seems I was wrong - David has replaced the validator with the proper version.

BM

Bernd Machenschalk

RE: So will it never get

Quote:
So will it never get back online? Too bad, I really liked that page a lot. I got the feeling that we were getting somewhere. So is it only temporary or permanent?


We'll definitely put it back online once the database problems have been solved.

BM
