ATLAS issues

Anonymous
Topic 13732

We have had some problems on ATLAS since the cooling unit failed on Sunday. Einstein@home is currently not running on ATLAS and will be brought back up gradually over the next few days.

BM

Bernd Machenschalk
Joined: 15 Oct 04
Posts: 2,684
Credit: 25,950,161
RAC: 34,820

ATLAS issues

Quote:

I think I can remember the story about dying servers, and I think it affected the older Merlin cluster (> 5 years old), which is made from Dual Athlon MP systems (K7 architecture) in desktop cases. The newer Morgane Cluster is made from K8 Opterons in 19" cases IIRC and should not yet show that many signs of aging.

CU
Bikeman


After the last power outage at AEI Potsdam a few weeks ago, Merlin was not reactivated. It is effectively dead now. I think fewer than 50 of its original 180 nodes were still running.

Morgane is running well; about half a dozen (of 615) nodes are down due to hardware failures, that's all.

[edit]It looks like most of the failed nodes have been repaired; only one still seems to be down. Learn more about the AEI clusters at gw.aei.mpg.de.[/edit]

BM


Bernd Machenschalk
Joined: 15 Oct 04
Posts: 2,684
Credit: 25,950,161
RAC: 34,820


For the curious: Einstein@home has been stopped on ATLAS again. Apparently there is a bug in the current version of Condor (the job scheduler on LSC clusters) that prevents ordinary jobs from being started on a node while Einstein@home is running there.

BTW: Ganglia gives you an overview of the status of the LSC clusters (including Nemo, Morgane and ATLAS). "nice" usually means Einstein@home.

BM

