ATLAS issues

Anonymous

23 Dec 2008 18:42:44 UTC

Topic 13732


We have had some problems on ATLAS since the cooling unit failed on Sunday. Einstein@home is currently not running on ATLAS and will be brought back up gradually over the next few days.

Bernd Machenschalk

Joined: 15 Oct 04

Posts: 2,684

Credit: 25,950,161

RAC: 34,820

ATLAS issues

29 Dec 2008 0:06:48 UTC

Message 3663


Quote:

I think I can remember the story about dying servers, and I think it affected the older Merlin cluster (> 5 years old), which is made from Dual Athlon MP systems (K7 architecture) in desktop cases. The newer Morgane Cluster is made from K8 Opterons in 19" cases IIRC and should not yet show that many signs of aging.

CU
Bikeman

After the last power outage at AEI Potsdam a few weeks ago, Merlin was not reactivated. It is effectively dead now. I think fewer than 50 of its original 180 nodes were still running.

Morgane is running well; about half a dozen of its 615 nodes are down with hardware failures, that's all.

[edit]It looks like most of the failed nodes have been repaired; only one still seems to be down. Learn more about the AEI clusters at gw.aei.mpg.de.[/edit]

Bernd Machenschalk

Joined: 15 Oct 04

Posts: 2,684

Credit: 25,950,161

RAC: 34,820

For the curious:

6 Jan 2009 19:52:17 UTC

Message 3664


For the curious: Einstein@home has been stopped on ATLAS again. Apparently there is a bug in the current version of Condor (the job scheduler on LSC clusters) that prevents ordinary jobs from being started on a node while Einstein@home is running there.

BTW: at Ganglia you can get an overview of the status of the LSC clusters (including Nemo, Morgane and ATLAS). "nice" usually means Einstein@home.
