"Project is down" for 19 hours now

Anonymous
Topic 13755

We still have serious problems with our database and are working on them. For the moment we have disabled the scheduler (which was getting mostly DB connection errors anyway) to let the rest of the daemons catch up.

The ABP1 validator and assimilator are running. However, they will no longer run on a machine that the server status page has direct access to, so I removed their status indicators from that page.

Also, because of the DB connection problems, the server status page isn't being updated and still shows an old status from 10:37 UTC.

BM

Bernd Machenschalk
Bernd Machenschalk
Joined: 15 Oct 04
Posts: 2,684
Credit: 25,950,161
RAC: 34,820

"Project is down" for 19 hours now

OK, the daemons have worked through their backlogs and the DB looks responsive again. I have started the scheduler; let's see how things go.

BM


Bernd Machenschalk

RE: I haven't seen any

Quote:
I haven't seen any description yet of the nature of the additional load that the database is experiencing.


Most likely a number of things added together. We haven't spent much time investigating this yet, but we certainly will.

Under heavy load our old DB server finally crashed, and we couldn't bring it back online for more than one last dump (which itself took several hours). So we put together a new, larger server (32 GB RAM) from the hardware we could find, and it is running now. While the additional memory certainly helped, the performance of the new DB is still poor. We are looking into this and have already solved some issues. For now the project has been stopped again so we can run some offline analysis, indexing and other optimization on the database.

We'll continue to fix problems as they occur. So far we have had all kinds of things, ranging from hardware failures to bugs in software not written by us (the last one fixed just minutes ago), all within just a few days.

I strongly intend to have the project up and running later today.

BM


Bernd Machenschalk


Status:

Good:
The database is running more smoothly than ever, the daemons (transitioner, assimilator, validator) are working through their backlogs, and the scheduler seems to be sending out work.

Bad:
Everything related to ABP1 is currently offline because, for some reason, the connection to Hannover isn't stable. For some other reason, the S5R5 workunit generator is running too slowly to keep up with work requests, so the project might run out of work some time over the weekend unless someone fixes these issues.

But after some 12 hours of hard work, Oliver and I are too tired to deal with this properly; we would probably do more harm than good. We need to get some sleep.

BM


Bernd Machenschalk


Update:

* We found a workaround for the problems with the S5R5 WUG (WU Generator), so we probably won't run out of work in the near future.

* We also found a workaround to run the ABP1 validator and assimilator; the WUG is still causing some headaches.

* We needed a couple of workarounds to bring the project back online, which will probably require a scheduled downtime for a few hours some time next month to remove them. But for now we're getting back on track.

BM


Bernd Machenschalk

RE: RE: Update: * We

Quote:
Quote:

Update:

* We needed a couple of workarounds to bring the project back online, which will probably require a scheduled downtime for a few hours some time next month to remove them. But for now we're getting back on track.

BM


Does that remaining work include the connection to, say, boincstats.com and statsnstones.tswb.org?
I've noticed that they haven't been updating Einstein data for a while now. So when can we expect that to happen?
Or am I wrong, and it isn't an Einstein issue but something related to the statistics sites mentioned?

regards
Vagn


I just turned the dumps back on.

BM

