No work available (daily quota exceeded)

Anonymous
Topic 12582

> 1. Today I returned from a business trip, saw that I had not calculated 5 WUs (the
> office is closed for the weekend) and reset the project...
> After that:
> Requesting 194233 seconds of work
> Sending request to scheduler: Scheduler RPC to
> http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
> Message from server: No work available (daily quota exceeded)
> I don't have any WUs in the BOINC client, but in "Results for computer" I have 10 NEW
> WUs?
>
> Idea?!
>
> 2. About WUs after a reset or detach of the project?
>
> I think it would be sensible for WUs in the queue to be freed as "not sent" (and sent to
> other computers)?
>
> Sorry for my english ;o(

I'm not sure what's happened in your case. I can see a request
for 173645.179355 seconds of work, followed by
2005-02-18 10:41:11 [normal ] [HOST#7664] Sent 3 results

then an additional result coming from a request five hours later.

2005-02-18 15:44:36 [normal ] [HOST#19083] Sent 1 results

Please could you copy and paste your "message log" here?

Bruce

Bruce Allen
Joined: 15 Oct 04
Posts: 958
Credit: 170,849,008
RAC: 0

> In the questions and problems area,
>
> other users reported this problem just some days ago.
>
> nobody found the time to put some statements :--((((
>
>
> look at this thread

The last posting in that thread was Feb 8th. I've done a lot of work on the BOINC scheduler since then to try and ensure that all users get at least *some* work, but not so much that they can't meet the deadlines. In particular the BOINC scheduler now sends back messages to the BOINC client to try and indicate exactly *why* it is not sending work. Hopefully, when your machine is not sent work, this will help you to understand why. If not, please post a message (a transcript/log copied from your BOINC client would be very helpful).

Bruce

> 2005-02-19 15:20:48 [Einstein@Home] Message from server: No work available
> (daily quota exceeded)

Server records show that your machine has TEN workunits to process. Is this not true?

Bruce

> just now. No reason, just says no work, sometimes fails at a ping.
>
> I don't know if this means there's a problem with the scheduler...
>
> Einstein@Home - 2005-02-19 08:59:54 - Resuming computation for result
> H1_0130.4__0130.7_0.1_T07_Test02_6 using einstein version 4.79
> --- - 2005-02-19 08:59:54 - Insufficient work; requesting more
> Einstein@Home - 2005-02-19 08:59:54 - Requesting 14209 seconds of work
> Einstein@Home - 2005-02-19 08:59:54 - Sending request to scheduler:
> http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
> Einstein@Home - 2005-02-19 08:59:58 - Scheduler RPC to
> http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
> Einstein@Home - 2005-02-19 08:59:58 - Project prefs: using separate prefs for
> home
> Einstein@Home - 2005-02-19 08:59:58 - No work from project
>
> Einstein@Home - 2005-02-19 09:03:33 - Sending request to scheduler:
> http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
> Einstein@Home - 2005-02-19 09:03:37 - Scheduler RPC to
> http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi failed
> Einstein@Home - 2005-02-19 09:03:37 - No schedulers responded
> Einstein@Home - 2005-02-19 09:03:37 - Deferring communication with project for
> 1 minutes and 0 seconds
>
> Einstein@Home - 2005-02-19 09:04:37 - Sending request to scheduler:
> http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
> Einstein@Home - 2005-02-19 09:04:41 - Scheduler RPC to
> http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
>
> addendum
>
> I must say it's sometime hard to get work, the response from the scheduler is
> no work available, no reason, the last completed wu I had to go 100% before
> I got 1 wu back
> then it asks for "Requesting 14209 seconds " My queue is normally 2 wu's

The scheduler records indicate that you currently have one WU. I don't know why you are not getting more. It could be that your machine is slow, or not available enough of the time.

Bruce

> I've observed on my three machines, as well as many other as I browse through
> the results, that some work units are listed as downloaded but are not
> actually on the machines indicated.
>
> EXAMPLE:
>
> 1133186 337376 14 Feb 2005 17:49:21 UTC 21 Feb 2005 17:49:21 UTC In Progress
> Unknown New --- --- ---
>
> Log entries for this machine do not list this result as downloaded.

Michael, good to know. Sounds like this is another BOINC bug to track. Please take a look through the sched_request.xml and sched_reply.xml files on your machine and see if there is any record of the WU in there.

Cheers,
Bruce

Ric,

It looks like your machine, host 18760, is trashing all its WUs (10 per day). This is why you are bumping up against the daily result quota. This may be because you are using the buggy core client 4.56. I suggest that you stop the project on that machine, abort all existing WUs, clear out the Einstein-related stuff from that machine, install a recent (or stable) client, and reset the project.

Cheers,
Bruce

> > > addendum
> > >
> > > I must say it's sometime hard to get work, the response from the scheduler is
> > > no work available, no reason, the last completed wu I had to go 100% before
> > > I got 1 wu back
> > > then it asks for "Requesting 14209 seconds " My queue is normally 2 wu's
> >
> > The scheduler records indicate that you currently have one WU. I don't know
> > why you are not getting more. It could be that your machine is slow, or not
> > available enough of the time.
> >
> > Bruce
>
> I have a P4, 2GHz and right now it's the only thing running on it ( Seti (25%)
> & LHC (37.5%)are down.) The only unusual situation is that, I can't upload
> 2 seti results so they are in the queue for uploading. Normal process time for
> your wu's on my machine is 11.5 hours .( no changes were made in the options
> for this to happen, the only one was the application upgrade to 4.79)
>
> I have one wu ( got that one this morning) working since this morning 9:05hr
> done,2:40hr left
> still won't give me another wu.
>
> Situation just now !
> --- - 2005-02-19 17:46:35 - May run out of work in 2.50 days; requesting more
> Einstein@Home - 2005-02-19 17:46:35 - Requesting 45384 seconds of work
> Einstein@Home - 2005-02-19 17:46:35 - Sending request to scheduler:
> http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
> Einstein@Home - 2005-02-19 17:46:38 - Scheduler RPC to
> http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
> Einstein@Home - 2005-02-19 17:46:38 - No work from project
> Einstein@Home - 2005-02-19 17:46:38 - Deferring communication with project for
> 4 hours, 16 minutes, and 52 seconds
>
>
> It's no big deal for me, but maybe an indication of something wrong
> somewhere.

Here's the scheduler log:

2005-02-19 22:46:37 [normal ] Processing request from [USER#4668] [HOST#5918] [IP 66.48.170.129] [RPC#85] core client version 4.19
2005-02-19 22:46:37 [normal ] [HOST#5918] got request for 45383.549629 seconds of work; available disk 2.000000 GB
2005-02-19 22:46:37 [debug ] [HOST#5918]: has file H1_0130.4
2005-02-19 22:46:37 [debug ] in_send_results_for_file(H1_0130.4, 0) prev_result.id=1207345
2005-02-19 22:46:37 [debug ] est cpu dur 39375.000000; running_frac 0.116258; rsf 0.375000; est 903159.807114
2005-02-19 22:46:37 [debug ] [WU#353174 H1_0130.4__0130.9_0.1_T28_Test02] needs 903159 seconds on [HOST#5918]; delay_bound is 604800 (request.estimated_delay is 77060.463747)
2005-02-19 22:46:37 [normal ] [HOST#5918] Sent 0 results
2005-02-19 22:46:37 [normal ] sending delay request 15412.092749

The scheduler estimates that one WU would take 39375 seconds of CPU time. However, since your machine is only active about 11.6% of the time (running_frac 0.116258), and the E@H project has only 37.5% of the resource share (rsf 0.375), the estimated wall-clock time to complete it is about 903,000 seconds. That exceeds the 604,800-second (7-day) delay_bound, so it would not get done before the deadline.
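
For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes the estimate is simply the CPU time divided by running_frac and by rsf, which matches the figure in the log above to within rounding but is not taken from the scheduler source. The function names are made up for illustration, and the sketch ignores request.estimated_delay.

```python
def estimated_wallclock_seconds(est_cpu_seconds, running_frac, resource_share_frac):
    """Scale CPU time up to wall-clock time.

    Assumption: the host only crunches for running_frac of the time, and this
    project only gets resource_share_frac of that crunching time.
    """
    return est_cpu_seconds / (running_frac * resource_share_frac)


def fits_deadline(est_cpu_seconds, running_frac, resource_share_frac, delay_bound):
    """Return True if the estimated wall-clock time is within delay_bound seconds."""
    est = estimated_wallclock_seconds(est_cpu_seconds, running_frac, resource_share_frac)
    return est <= delay_bound


if __name__ == "__main__":
    # Values copied from the scheduler log for HOST#5918.
    est = estimated_wallclock_seconds(39375.0, 0.116258, 0.375)
    print(f"estimated wall-clock time: {est:.0f} s")
    print("fits 604800 s deadline:", fits_deadline(39375.0, 0.116258, 0.375, 604800))
```

With the logged values the estimate comes out well above 604800 seconds, so under this assumption no result is sent. Raising either fraction (running BOINC more of the time, or giving E@H a larger resource share) brings the estimate down.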

The REAL problem here is that you *SHOULD* get a message explaining why you are not getting work. I want to understand why not. Could you please look in the BOINC directory, at the file called sched_reply.xml. Does this contain a line starting with an XML "message" tag? If so, could you please cut and paste that message here for me to see? Please wait until you get another "No work from project" response, then look at sched_reply.xml.
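
If opening the file by hand is awkward, a small script can pull out the relevant lines. This is only a sketch: it assumes sched_reply.xml sits in the directory you run it from and that the message appears on a line beginning with a "<message" tag, as described above. The function name is hypothetical.

```python
from pathlib import Path


def extract_message_lines(reply_text):
    """Return every line of the reply text that starts with a <message tag."""
    return [
        line.strip()
        for line in reply_text.splitlines()
        if line.strip().startswith("<message")
    ]


if __name__ == "__main__":
    # Assumed location: run this from inside the BOINC directory.
    reply_file = Path("sched_reply.xml")
    if reply_file.exists():
        text = reply_file.read_text(encoding="utf-8", errors="replace")
        lines = extract_message_lines(text)
        if lines:
            for line in lines:
                print(line)
        else:
            print("No line starting with a <message tag was found.")
    else:
        print("sched_reply.xml not found in the current directory.")
```

Run it right after the client reports "No work from project", since the file is presumably overwritten by the next scheduler reply.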

Thanks!

Bruce

> Thanks for posting Bruce,
>
>
> as mentioned, I normally don't have trouble with this host.
>
> I followed the advice to migrate.
>
>
> > It looks like your machine host 18760 is trashing all WU (10 per day).
>
>
> That's another one, a Pentium 3.2 on W2K in HT mode.
>
> I will take a look at why, but I'm guessing the host runs at too high a speed.

Tell me which host is the one not getting enough work. Be sure to read the 'no work' message completely. It may extend off to the right -- you may have to extend the messages box to see it!

Bruce

Ric,

> > The scheduler estimates that one WU would take 39375 secs of CPU time.
> However since your machine is only active 11% of the time, and the E@H
> project...
>
> My clients indicate the following values. The problem is, they have been up
> and running Boinc 24/7 (other than short maintenance cycles totaling no more
> than 3 hours) with E@H as the only project since I signed up on January 22.
> Why such low percentages?
>
> Computer ID 7660
> % of time client is on 63.8722 %
> % of time host is connected 63.8722 %
> % of time user is active 63.87 %
>
> Computer ID 7668
> % of time client is on 65.2571 %
> % of time host is connected 65.2571 %
> % of time user is active 65.2491 %
>
> Computer ID 7672
> % of time client is on 64.2925 %
> % of time host is connected 64.2925 %
> % of time user is active 64.2797 %

My note was to a DIFFERENT person: the owner of host 5918, not one of your machines!

> Are there any resolutions to the missing WU problems mentioned earlier in this
> thread?

Yes, I think so. There were two problems:

(1) Some workunit command_line + tag combinations in sched_reply were more than 256 chars long. This seems to have caused chaos with 4.19, which is supposed to handle these OK but apparently didn't.

(2) There is a bug in 4.19 that appears when used with a proxy server. Replies sent from the scheduler to the host machine never reach the host machine.

Problem (1) was fixed by me on the server side late Sunday night (OK, truth is, 4am Monday morning).

Problem (2) is fixed in the new core client release.

> And finally, I would like to thank you and your staff for the interest
> you’ve paid to the issues and questions posed by the E@H participants. It's
> refreshing to have direct and timely contact with the project developers.
> KUDOS to all of you!

Thank you. In turn let me thank you and some of our other participants for YOUR time and patience. Unfortunately we just don't have the resources to do extensive testing 'in the lab' so the feedback that we get is the only way to track down and resolve the problems that come up.

Cheers,
Bruce
