Still a problem with the "enforce_delay_bound" option. Message from server: No work sent (won't finish in time)

Anonymous
Topic 12742

Here is the view of the first scheduler interaction, from the point of view of the server:

2005-02-26 13:29:19 [normal ] OS version Microsoft Windows 98 , (04.10.1998.00)
2005-02-26 13:29:19 [normal ] Request [HOST#5349] Database [HOST#5349] Request [RPC#27] Database [RPC#26]
2005-02-26 13:29:19 [normal ] Processing request from [USER#2042] [HOST#5349] [IP 213.112.125.121] [RPC#27] core client version 4.19
2005-02-26 13:29:19 [debug ] [HOST#5349] Resetting nresults_today
2005-02-26 13:29:19 [normal ] [HOST#5349] got request for 1196.786825 seconds of work; available disk 0.908115 GB
2005-02-26 13:29:19 [debug ] [HOST#5349]: has file H1_0953.4
2005-02-26 13:29:19 [debug ] in_send_results_for_file(H1_0953.4, 0) prev_result.id=1264095
2005-02-26 13:29:19 [debug ] est cpu dur 100426.666667; running_frac 0.119929; rsf 1.000000; est 837382.729324

The estimate is that one WU would take 100426 seconds (about 28 hours) of CPU time on your machine. But since the code is estimated to run only about 12% of the time (running_frac 0.119929), the work was estimated to take 837382 seconds of wall-clock time, which exceeds the one-week deadline (604800 seconds).
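To make the arithmetic concrete, here is a minimal Python sketch of the check described above. It is not the actual scheduler code; the function names are made up for illustration, and the formula (CPU estimate divided by running_frac times rsf) is an assumption inferred from the numbers in the debug line.

```python
# Illustrative sketch only; function names are hypothetical, not BOINC's.

def estimated_wallclock(est_cpu_dur, running_frac, rsf=1.0):
    """Wall-clock seconds needed if the host computes only
    running_frac * rsf of the time (assumed formula)."""
    return est_cpu_dur / (running_frac * rsf)


def fits_deadline(est_cpu_dur, running_frac, delay_bound, rsf=1.0):
    """True if the estimated wall-clock time is within delay_bound."""
    return estimated_wallclock(est_cpu_dur, running_frac, rsf) <= delay_bound


# Values copied from the log lines in this post.
est = estimated_wallclock(100426.666667, 0.119929)
print(f"estimated wall-clock: {est:.0f} s, deadline: 604800 s")
print("fits deadline:", fits_deadline(100426.666667, 0.119929, 604800))
```

The result differs from the logged 837382.729324 by a second or two because the log prints running_frac rounded to six decimals.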

2005-02-26 13:29:19 [debug ] [WU#403228 H1_0953.4__0953.7_0.1_T06_Test02] needs 837382 seconds on [HOST#5349]; delay_bound is 604800 (request.estimated_delay is 0.156596)

Here's an odd thing: your machine is estimated to still have 0.15 seconds of remaining work for E@H. If it had NO remaining work for E@H, then it would have gotten more work.

2005-02-26 13:29:19 [normal ] [HOST#5349] Sent 0 results
2005-02-26 13:29:19 [debug ] [HOST#5349] MSG(high) No work sent
2005-02-26 13:29:19 [debug ] [HOST#5349] MSG(high) (won't finish in time) Computer on 34.6% of time, BOINC on 34.6% of that, Einstein gets 100.0% of that
2005-02-26 13:29:19 [normal ] sending delay request 3600.000000

Here's the later scheduler logic (when there was no remaining E@H work on your machine):
2005-02-26 13:45:18 [normal ] OS version Microsoft Windows 98 , (04.10.1998.00)
2005-02-26 13:45:18 [normal ] Request [HOST#5349] Database [HOST#5349] Request [RPC#28] Database [RPC#27]
2005-02-26 13:45:18 [normal ] Processing request from [USER#2042] [HOST#5349] [IP 213.112.125.121] [RPC#28] core client version 4.19
2005-02-26 13:45:18 [normal ] [HOST#5349] [RESULT#1264095 H1_0953.4__0953.5_0.1_T03_Test02_0] got result
2005-02-26 13:45:18 [debug ] cpu 142079.230000 cpcs 0.000804, cc 114.191635
2005-02-26 13:45:18 [debug ] [RESULT#1264095 H1_0953.4__0953.5_0.1_T03_Test02_0]: setting outcome SUCCESS
2005-02-26 13:45:18 [normal ] [HOST#5349] got request for 1197.675012 seconds of work; available disk 0.908115 GB
2005-02-26 13:45:18 [debug ] [HOST#5349]: has file H1_0953.4
2005-02-26 13:45:18 [debug ] in_send_results_for_file(H1_0953.4, 0) prev_result.id=1264095
2005-02-26 13:45:18 [debug ] Sorted list of URLs follows [host timezone: UTC+3600]
2005-02-26 13:45:18 [debug ] zone=+3600 url=http://einstein.aei.mpg.de
2005-02-26 13:45:18 [debug ] zone=-21600 url=http://einstein.phys.uwm.edu
2005-02-26 13:45:18 [debug ] [HOST#5349] Sending app_version einstein windows_intelx86 479
2005-02-26 13:45:18 [debug ] [HOST#5349] Already has file H1_0953.4
2005-02-26 13:45:18 [debug ] [HOST#5349] reducing disk needed for WU by 14736000 bytes (length of H1_0953.4)
2005-02-26 13:45:18 [debug ] est cpu dur 100426.666667; running_frac 0.120096; rsf 1.000000; est 836218.454917
2005-02-26 13:45:18 [normal ] [HOST#5349] Sending [RESULT#1429554 H1_0953.4__0953.7_0.1_T06_Test02_3] (fills 836218.45 seconds)
2005-02-26 13:45:18 [normal ] [HOST#5349] Sent 1 results

Bruce Allen
Joined: 15 Oct 04
Posts: 958
Credit: 170,849,008
RAC: 0

> The estimate for the crunch time of the WU is wrong, but it is to my advantage,
> so it isn't the problem here. That only leaves the estimate of how much
> CPU time BOINC will get in a day. If I understand it correctly, the estimate of
> the available CPU time (running_frac) is calculated by multiplying "the
> percentage of the day the computer is on" by "the percentage of CPU time
> available to BOINC when it's on".
>
> In my case it is:
> Computer on 34.6% of time
> BOINC on 34.6% of that
>
> Now this looks suspicious to me. Both values are 34.6%. Since we know
> something is wrong, my guess would be that one of these values is a copy of
> the other.

I'll ask David Anderson (who wrote this bit of the scheduler code) about this.
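For reference, a quick sanity check in Python that the two percentages in the server message are consistent with the running_frac in the log. This assumes running_frac is simply the product of the two fractions, as described in the quoted post; the function name is made up for illustration.

```python
# Illustrative sketch; running_fraction is a hypothetical helper name.

def running_fraction(on_frac, active_frac):
    """Assumed relation: fraction of wall-clock time spent computing."""
    return on_frac * active_frac


on_frac = 0.346      # "Computer on 34.6% of time"
active_frac = 0.346  # "BOINC on 34.6% of that"
logged = 0.119929    # running_frac from the first scheduler log

product = running_fraction(on_frac, active_frac)
print(f"product={product:.6f} logged={logged:.6f}")
```

The product agrees with the logged value to within the rounding of the 34.6% figures, so the two values really are being multiplied together.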

> That the computer is on 34.6% of the time seems fairly accurate. My
> guess would be something lower than 50%. That BOINC would only have access to
> the CPU 34.6% of the time seems very low to me. This computer is mainly used to
> surf the web, read the BOINC message boards and listen to internet radio
> through Winamp (18% CPU use). So I would expect a value around 80%.

Are your preferences set so that BOINC runs all the time? Or just when your computer is idle? In the latter case, how long does it have to be idle before BOINC restarts the work?

> The reason my machine has 0.15 seconds of remaining work for E@H is that I
> have set the "contact server every" value to 0.02. When running
> application version 4.75, the client downloaded a new WU 5 min before the old
> one finished (value 0.01). With application version 4.79 it seems that the
> calculation of remaining time is not so accurate for the last minutes. I think
> I will try 0.05 next time.

OK, I think that makes sense.

Bruce

Bruce Allen

> In 4.1x clients active fraction is the percent of total time the client can
> process data. In 4.2x clients the active fraction is the percent of on time
> that the client can process. So the server report is accurate for the newer
> client but not correct for the current client. This was confusing me too;
> David posted the definitive answer on the bug tracking board.

John, could you please post the URL for this definitive answer?

I'll need to modify the scheduler to fix this. The current scheduler calculations of active time/on time are not done differently for different versions of the BOINC core client.

Cheers,
Bruce

Bruce Allen

OK, I just talked with David Anderson about this.

He was under the mis-impression that for clients

Bruce Allen

> > > In 4.1x clients active fraction is the percent of total time the client
> > > can process data. In 4.2x clients the active fraction is the percent of
> > > on time that the client can process. So the server report is accurate
> > > for the newer client but not correct for the current client. This was
> > > confusing me too; David posted the definitive answer on the bug
> > > tracking board.
> >
> > John, could you please post the URL for this definitive answer?
> >
> > I'll need to modify the scheduler to fix this. The current scheduler
> > calculations of active time/on time are not done differently for different
> > versions of the BOINC core client.
> >
> > Cheers,
> > Bruce
> >
> URL= http://bbugs.axpr.net/bug.php?op=show&bugid=62
>
> It was the last of the comments that I was referring to.

I've fixed the E@H scheduler. It now sets active_frac to 1 for core clients 4.19 and earlier.
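Roughly, the change behaves like the Python sketch below. This is only an illustration of the rule stated above, not the real scheduler code: the function names are made up, and the 34.6% on fraction is taken from the earlier server message on the assumption that it stays the same.

```python
# Illustrative sketch; names are hypothetical and not taken from BOINC.

def parse_version(text):
    """Turn a core client version such as '4.19' into (4, 19)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def scheduler_active_frac(client_version, reported_active_frac):
    """Use active_frac = 1 for core clients 4.19 and earlier;
    otherwise trust the value reported by the client."""
    if parse_version(client_version) <= (4, 19):
        return 1.0
    return reported_active_frac


# Host from this thread: core client 4.19, on 34.6% of the time (assumed).
on_frac = 0.346
active = scheduler_active_frac("4.19", 0.346)
est = 100426.666667 / (on_frac * active)
print(f"estimate with fix: {est:.0f} s, deadline: 604800 s")
```

Under these assumptions the estimate drops below the one-week delay_bound, so the "won't finish in time" test would no longer reject the WU for this host.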

Bruce

Bruce Allen

> > I've fixed the E@H scheduler. It now sets active_frac to 1 for core clients
> > 4.19 and earlier.
> >
> > Bruce
>
> It would have been better to set on_frac to 1. That way it would adjust
> properly for clients that run when idle only or otherwise spend significant
> amounts of time not crunching but on. However either way is better than it
> was.

John, I talked to David Anderson specifically about this and his conclusion was to set active_frac to 1. Have you read the relevant client and server code? If you have, and you are sure, I can take it back up with David. I don't understand this part of the code.

Cheers,
Bruce
