Hopefully, just an OOPS!! moment ....

Anonymous
Topic 13636

Quote:

Has anyone noticed the shenanigans going on at the top of the Top Computers list? Take a look at this supercomputer. A RAC of 382K (currently anyway) achieved on its opening day is not too shabby :). Also, 10K+ tasks in its tasks list - WOW!! :).

And it didn't happen once only!! Somebody stuffed up a few times by the look of things. The top of the list used to be #1 - Peanut and #2 - Akosf. Those two have (temporarily) been rather displaced :).

I wonder if anyone will "fess up" as to exactly what happened :).


I contacted the participant in question. It turned out that the hostid was accidentally shared across a cluster of ~50 machines.

BM

Bernd Machenschalk
Joined: 15 Oct 04
Posts: 2,684
Credit: 25,950,161
RAC: 34,820


Quote:
That shouldn't be possible, should it? How on Earth do you share one hostID among a cluster, other than to constantly copy the whole BOINC directory to other computers in the cluster, wait for them to crunch the tasks, copy the whole directory back to the original computer, etc.? As that's hardly accidental. ;-)


You can launch a 50-node job thinking that the BOINC directory is local to each machine when in fact it is shared via NFS. This would require some tinkering with the lockfiles / NFS cache settings, but it could work.

Actually the user copied a BOINC directory to 50 machines and then apparently tweaked the client_state.xml file but forgot to change the hostid.
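For anyone cloning a BOINC directory onto several machines, a quick check along these lines could catch a forgotten hostid before the clients start reporting. This is only a rough sketch: it assumes each copied client_state.xml contains `<hostid>…</hostid>` elements, it ignores which project each hostid belongs to, and the function name and the machine labels are made up for illustration.

```python
import re
from collections import defaultdict

# Assumption: the hostid appears as <hostid>NUMBER</hostid> in client_state.xml.
HOSTID_RE = re.compile(r"<hostid>\s*(\d+)\s*</hostid>")


def find_shared_hostids(states):
    """Return {hostid: [machines]} for every hostid seen on more than one machine.

    `states` maps a machine label (hypothetical, e.g. "node01") to the text
    of that machine's client_state.xml.
    """
    seen = defaultdict(set)
    for machine, text in states.items():
        for hostid in HOSTID_RE.findall(text):
            seen[int(hostid)].add(machine)
    # Keep only hostids that show up on two or more machines.
    return {h: sorted(m) for h, m in seen.items() if len(m) > 1}


if __name__ == "__main__":
    # Made-up file contents, just to show the call.
    example = {
        "node01": "<client_state><project><hostid>123</hostid></project></client_state>",
        "node02": "<client_state><project><hostid>123</hostid></project></client_state>",
        "node03": "<client_state><project><hostid>456</hostid></project></client_state>",
    }
    print(find_shared_hostids(example))
```

A regular expression is used instead of a full XML parser so that the check still runs on a file that was hand-edited into something not quite well-formed.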

BM



RE: RE: Actually the user

Quote:
Quote:

Actually the user copied a BOINC directory to 50 machines and then apparently tweaked the client_state.xml file but forgot to change the hostid.

BM


Um.. if all those machines return all the same tasks, for the one hostID, shouldn't the Einstein server then throw up a flag, saying "Hey, that hostID already returned those tasks!" ??

I'm not even going into the fact that the original hostID says it has 8 CPUs, was re-registered yesterday yet has close to 9,000 tasks, all running between 1 and 8.5 hours. Perhaps Jovian days are used. Or Plutonian. ;-)

You sure he isn't using the [trac]wiki:SuperHost[/trac] idea?


I'm sure he was trying his own way of implementing a poor-man's version of it.

I don't know precisely how this happened, nor exactly how he messed up the client_state.xml file, and I certainly don't have time to dig deeper into it.
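The numbers quoted above (8 CPUs, close to 9,000 tasks, at least 1 hour each, registered a day earlier) can be sanity-checked with a rough lower bound. The sketch below assumes, purely for illustration, that all of those tasks were actually completed within 24 hours and that every machine has 8 CPUs; neither is confirmed in this thread, and the function name is invented.

```python
import math


def min_hosts_needed(tasks, min_hours_per_task, cpus_per_host, wall_hours):
    """Lower bound on the number of hosts needed to finish `tasks` in `wall_hours`.

    Assumes every task takes at least `min_hours_per_task` CPU-hours and each
    host runs one task per CPU, with no idle time.
    """
    cpu_hours_required = tasks * min_hours_per_task
    cpu_hours_per_host = cpus_per_host * wall_hours
    return math.ceil(cpu_hours_required / cpu_hours_per_host)


if __name__ == "__main__":
    # 9,000 tasks x 1 h = 9,000 CPU-hours; one 8-CPU host offers 192 CPU-hours a day.
    print(min_hosts_needed(9000, 1.0, 8, 24.0))
```

Under those assumptions the bound comes out at 47 hosts, which is at least in the same range as the ~50 machines mentioned earlier.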

BM

