David Anderson and I have modified the BOINC scheduler so that it resends WU to hosts that have lost them. This only works if you are running a recent client (version 4.45 or later, I think).
Currently, any WU that the database says should be on your machine, but that your client does NOT report as being there, is resent. Each resend is accompanied by a message of the form:
Resent lost result w1_0399.5__0399.6_0.1_T09_S4hA_0
Currently, any 'missing' results are resent, even if they are close to their deadline.
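A minimal Python sketch of the check described above: compare the results the database assigns to a host with the results the client reports, and resend the difference. The `Result` type, `find_lost_results`, `resend_messages` and the example result names (other than the one quoted above) are hypothetical stand-ins, not BOINC's actual scheduler code, which is written in C++ and works against the project database.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    host_id: int  # host the database says this result was sent to


def find_lost_results(db_results, host_id, reported_names):
    """Return results assigned to host_id that the client did not report.

    db_results: iterable of Result rows (hypothetical stand-in for the DB).
    reported_names: set of result names the client says it has.
    """
    return [r for r in db_results
            if r.host_id == host_id and r.name not in reported_names]


def resend_messages(lost):
    # One informational message per resent result, in the form shown above.
    return [f"Resent lost result {r.name}" for r in lost]


if __name__ == "__main__":
    db = [Result("w1_0399.5__0399.6_0.1_T09_S4hA_0", 42),
          Result("example_result_1", 42),   # hypothetical name
          Result("example_result_2", 7)]    # belongs to another host
    client_has = {"example_result_1"}
    for msg in resend_messages(find_lost_results(db, 42, client_has)):
        print(msg)
```

Running the script prints a single "Resent lost result w1_0399.5__0399.6_0.1_T09_S4hA_0" line, since that is the only result assigned to host 42 that the client did not report.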
Please report good and/or bad experiences with this feature in this thread.
Bruce

Ghost WU and resending lost results
Walt,
Good catch -- I'm going to have to change your status to 'Developer'!!
Now that you point this out, it's obvious that this is how our code works, but it isn't what I intended. I'll have to fix it; otherwise results will never time out for misconfigured hosts that never get the work, since each resend pushes the report deadline back.
Is there any reason I shouldn't fix this?
[EDIT 10 minutes later]
Walt, I've fixed this. Now when results are resent the 'sent_time' and 'report_deadline' in the database are left unchanged.
[EDIT 5 minutes later]
I wonder if I should update 'sent_time' but NOT 'report_deadline'. That way the result will still time out correctly, but the database will make it obvious that the result has been resent one or more times. Thoughts??
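To see why resetting the deadline on every resend is a problem, here is a toy Python simulation. It assumes a 7-day deadline window and a host that contacts the scheduler once a day but never actually receives the work. The function names and the dict layout are hypothetical; only the two field names mirror the database columns mentioned above.

```python
DEADLINE_WINDOW_H = 7 * 24  # assumed deadline window: 7 days, in hours


def resend_reset_both(result, now):
    # Original behaviour: a resend restarts the clock on both fields.
    result["sent_time"] = now
    result["report_deadline"] = now + DEADLINE_WINDOW_H


def resend_keep_deadline(result, now):
    # Proposed behaviour: record the resend, keep the original deadline.
    result["sent_time"] = now


def times_out(resend, contact_every_h=24, horizon_h=30 * 24):
    """Return True if the result times out within horizon_h hours."""
    result = {"sent_time": 0, "report_deadline": DEADLINE_WINDOW_H}
    now = 0
    while now < horizon_h:
        now += contact_every_h
        if now > result["report_deadline"]:
            return True
        # The host contacts the scheduler but never gets the work, so
        # the result is reported missing and resent again.
        resend(result, now)
    return False


print(times_out(resend_reset_both))     # False: the deadline keeps moving
print(times_out(resend_keep_deadline))  # True: times out after the original deadline
```

With the first policy the deadline always stays 144 hours ahead of the last contact, so the result never times out; with the second it times out on the first contact after hour 168.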
RE: RE: Perhaps I'm
Thank you for this post and the previous one. I hadn't realized that, when hosts are merged, the new 'child' host would be sent any work that had gone to the 'parent' hosts and was not on the child host.
I intend to watch this thread and 'tweak' the behavior of this resend mechanism over the coming days. [For example, if a result that would be resent is already close to its deadline, I could mark it as an error and generate a new result instead, which would go to some other host.] But I would like to keep the mechanism as simple as possible for the moment, so for now I just plan to 'watch and wait'.
If you have suggestions about changes or refinements to this mechanism, please post them here.
Bruce
I've made an additional
I've made an additional change as Walt and I discussed.
For results that are resent, the REPORT DEADLINE is left unchanged. However, I update the SENT TIME whenever the result is resent. Thus if
(REPORT_DEADLINE - SENT_TIME) is less than 7 days,
it means that the work was resent one or more times.
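A small Python sketch of that rule, assuming every result is originally sent with exactly a 7-day (168-hour) window. The `Result` class and the `send`, `resend` and `was_resent` helpers are hypothetical; in BOINC the two fields are columns in the result table.

```python
from dataclasses import dataclass

DEADLINE_WINDOW_H = 7 * 24  # assumed original window: 7 days, in hours


@dataclass
class Result:
    name: str
    sent_time: float        # hours since an arbitrary epoch
    report_deadline: float  # hours since the same epoch


def send(name, now):
    # First send: the deadline is exactly one window after the send time.
    return Result(name, now, now + DEADLINE_WINDOW_H)


def resend(result, now):
    # REPORT_DEADLINE is left unchanged; only SENT_TIME moves forward.
    result.sent_time = now


def was_resent(result):
    # A result sent only once has exactly DEADLINE_WINDOW_H between the
    # two fields; any resend at a later time shrinks that gap.
    return result.report_deadline - result.sent_time < DEADLINE_WINDOW_H


r = send("example_result", 0)  # hypothetical result name
print(was_resent(r))           # False
resend(r, 24)
print(was_resent(r))           # True
```

The check relies on every result sharing the same original window; with variable deadlines, a resend counter stored alongside the result would be a more robust marker.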
RE: I haven't had any
I see the point, but I'm not sure about this. After all, a user can always ABORT a problematic workunit to get rid of it.
RE: RE: RE: I haven't
Agreed.
RE: I just got a pile of
I suggest that you abort the workunits that can't be finished in time, then do 'update project' to report the aborted WU to the server. That way new WU can be issued, and your computer won't spend a long time doing work that's overdue.
Any idea how this work got lost??
Cheers,
Bruce
RE: The deduction of a
This would have a bad consequence. A host that had a proxy problem and never received a workunit, but kept contacting the scheduler, would prevent that workunit from ever finishing.
I don't know how to make this determination.
However, I have just made the following changes. IF
- Work is within 25% of its deadline (42 hours for Einstein@Home), OR
- Work is no longer needed (a canonical result already exists), OR
- Workunit has its error flag set (something is wrong), THEN
the scheduler no longer resends the workunit, but instead marks it as timed out in the database. The scheduler then sends an informational message to the client reporting that this WU has 'expired'.
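The new policy can be sketched as a single decision function. This is a hypothetical Python rendering, not the scheduler's code. It reads "within 25% of deadline" as "less than 25% of the original 7-day window (42 hours) remains before the report deadline", which is an interpretation; the `LostResult` fields are illustrative names for the three conditions listed above.

```python
from dataclasses import dataclass

DEADLINE_WINDOW_H = 7 * 24  # assumption: 168 h, so 25% is the 42 h quoted above
EXPIRE_FRACTION = 0.25


@dataclass
class LostResult:
    name: str
    report_deadline: float  # hours since an arbitrary epoch
    canonical_exists: bool  # workunit already has a canonical result
    wu_error: bool          # workunit error flag is set


def resend_or_expire(r, now):
    """Return 'resend' or 'expire' for a result the client no longer has."""
    too_close = r.report_deadline - now < EXPIRE_FRACTION * DEADLINE_WINDOW_H
    if too_close or r.canonical_exists or r.wu_error:
        # Mark as timed out in the DB and tell the client it has expired.
        return "expire"
    return "resend"


r = LostResult("example_result", 168, False, False)  # hypothetical result
print(resend_or_expire(r, 0))    # resend: 168 h remain
print(resend_or_expire(r, 130))  # expire: only 38 h remain
```

Checking all three conditions before resending means a lost result is only ever resent when the host still has a realistic chance of returning useful work before the deadline.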
I'll test this over the next few hours, and see if it has undesirable side effects.
Bruce
Based on the feedback in this
Based on the feedback in this forum, I've made some additional modifications to the scheduler policy on resending lost workunits. Details may be found here:
deadline_proposal.txt. These changes extend the deadlines (by up to an additional week in total) for machines that did not get the work when it was originally sent.
Bruce