Unrecoverable Error...

Anonymous
Topic 12620

> These are the messages I got:
>
> 1. Einstein@Home - 2005-02-20 17:42:15 - Unrecoverable error for result
> H1_0806.9__0807.3_0.1_T03_Test02_0 (CreateProcess() failed - The process
> cannot access the file because it is being used by another process. (0x20))
>
> 2. Einstein@Home - 2005-02-20 17:42:15 - CreateProcess() failed - The process
> cannot access the file because it is being used by another process. (0x20)
>
> 3. Einstein@Home - 2005-02-20 17:42:15 - Deferring communication with project
> for 1 minutes and 0 seconds
>
> 4. Einstein@Home - 2005-02-20 17:42:15 - Computation for result
> H1_0806.9__0807.3_0.1_T03_Test02 finished

Lex, this is a known problem with BOINC. The next version of the BOINC core client should incorporate a fix for it, although the problem itself is *not* well understood.

> There is only one project for E@H running. Was the project deleted? It's not
> under the WORK tab or the TRANSFERS tab. It looks like it's gone. Since the
> time of the attempted file transfer, E@H has been repeatedly asking for more
> work and not getting it:
>
> 5. Einstein@Home - 2005-02-20 17:45:34 - Message from server: No work
> available (daily quota exceeded)

Unfortunately, your system generated errors for all the WUs that it downloaded. Please wait a day and you'll get some more work. Hopefully this error won't recur.

Is there anything odd about your system? Are you using anti-virus software? If so, what type?

Cheers,
Bruce

Bruce Allen
Joined: 15 Oct 04
Posts: 958
Credit: 170,849,008
RAC: 0


> Is the file indexing service running? I think that's the default in Win2k.
> You can either stop it or change the properties of the BOINC folder so that
> it is not indexed; that's part of the advanced properties. When you change
> the option, select "applies to this folder, subfolders and files".
>
> Undelete programs also hold on to "deleted" files while they get renamed and
> moved to the "undelete" bin.
>
> If this happens regularly, get FileMon
> (http://www.sysinternals.com/ntw2k/source/filemon.shtml) from
> Sysinternals. Set it to trace file activity for the einstein project and
> run directories by setting the filter to: "einstein*;slots*" (without the
> quotes) - you get all the file activity for setting up and running each WU.
> And change the options to set "advanced output" and "show milliseconds".

Walter, would a running file indexing service explain these CreateProcess() failures? Our guess was that some virus-scanning program had locked the executable or otherwise made it (at least temporarily) unusable. Our solution was simply to retry CreateProcess() a few times, with a short random sleep in between. If you can provide a theory or explanation for the CreateProcess() failures, that would be very helpful.
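
The retry idea described above can be sketched roughly as follows. This is only an illustration in Python, not the BOINC client code: `run_with_retry`, `attempts`, `max_sleep` and `launch` are hypothetical names, and `subprocess.Popen` stands in for the CreateProcess() call.

```python
import random
import subprocess
import time


def run_with_retry(cmd, attempts=5, max_sleep=0.5, launch=subprocess.Popen):
    """Try to start a process, retrying after a short random sleep on OSError.

    attempts, max_sleep and launch are illustrative parameters (assumptions),
    not settings taken from BOINC.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return launch(cmd)
        except OSError as err:  # e.g. the file is in use by another process
            last_error = err
            if attempt < attempts - 1:
                # Random delay so the retries do not line up with whatever
                # is temporarily holding the executable.
                time.sleep(random.uniform(0.0, max_sleep))
    raise last_error
```

If every attempt fails, the last error is re-raised, so the caller still sees the original failure instead of a silent timeout.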

Bruce

Bruce Allen
Joined: 15 Oct 04
Posts: 958
Credit: 170,849,008
RAC: 0

> > Walter, would the 'file indexing service running' explain these
> > 'CreateProcess()' failures? Our guess was that this was some virus
> > scanning program that had locked the executable or otherwise made it (at
> > least temporarily) unusable. Our solution was just to retry
> > CreateProcess() a few times with a short random sleep in between. If you
> > can provide some theory or explanation for the CreateProcess() failures
> > that would be very helpful.
> >
> > Bruce
>
> It might. That's why I suggested running FileMon - the trace shows who does
> what to each file. It's not the indexing service by itself; it appears to be
> the indexing service along with something else that also intercepts filesystem
> calls.
>
> I looked into a similar problem with client_state.xml, where writing a new one
> didn't work because the old one was still there. Even though a delete file
> call was made, the file wasn't actually deleted until a few milliseconds
> later. More detail is in one of the BOINC forums for this problem: Couldn't
> Write State file: -109 (http://setiweb.ssl.berkeley.edu//node/).
>
> In the case of client_state.xml, the following happens - from the program's
> view:
>
> -write client state to client_state_next.xml
> -delete client_state_prev.xml
> -rename client_state.xml to client_state_prev.xml
> -rename client_state_next.xml to client_state.xml
>
> With the indexing service active, trace showed:
>
> -program wrote client_state_next.xml
> -program deleted client_state_prev.xml
> -system intercepted the call and returned success to the program, but the
> file was not deleted at this time.
> -program renamed client_state.xml to client_state_prev.xml. This failed with
> a "new name exists" error or something similar.
> -system finished deleting the client_state_prev.xml file.
>
> With the indexing service inactive, the trace showed what was expected - the
> call to "delete file" completed before the rename occurred.
>
> And by "system", I don't mean Windows doing system level calls on BOINC's
> behalf, I mean that the call is intercepted by another process - system
> process ID 4 - and new filesystem operations are performed on that file before
> it completes the "delete". On my system the intercepted calls are still
> performed properly, but I don't have virus scanners intercepting everything
> either. It's apparent from the differences in the two traces (mine and the one
> with the problem) that the calls are intercepted twice - once by the indexing
> service and most likely the second time by the virus scanner.
>
> Suggestion for getting traces and interpreting them:
>
> Get FileMon from the System Internals site, install it and set filtering as in
> my past message. When you get the problem, save the FileMon trace and make a
> note of the timestamps for the problem.
>
> Disable the indexing service and the virus scanner. Or run the trace on
> another system that doesn't have any of those services running, and no
> undelete, auto backup, ad blockers or anything like that. Run FileMon again
> to see what "normal" file operations look like.
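
The four-step state-file rotation quoted above can be sketched as below. This is a minimal illustration in Python, not the BOINC implementation: `rotate_state`, `_rename_with_retry`, `retries` and `delay` are hypothetical names, and the brief retry around the rename is an assumption about how a delayed delete might be tolerated. On Windows, `os.rename` fails if the destination still exists, which matches the "new name exists" failure described in the trace.

```python
import os
import time


def _rename_with_retry(src, dst, retries, delay):
    """Rename src to dst, retrying briefly if the OS reports an error."""
    for attempt in range(retries):
        try:
            os.rename(src, dst)
            return
        except OSError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)


def rotate_state(directory, new_contents, retries=5, delay=0.05):
    """Write the next state file, then rotate prev/current/next."""
    cur = os.path.join(directory, "client_state.xml")
    prev = os.path.join(directory, "client_state_prev.xml")
    nxt = os.path.join(directory, "client_state_next.xml")

    # 1. Write the new state to client_state_next.xml.
    with open(nxt, "w") as f:
        f.write(new_contents)

    # 2. Delete client_state_prev.xml if it exists.
    if os.path.exists(prev):
        os.remove(prev)

    # 3. Rename client_state.xml to client_state_prev.xml. The retry covers
    #    the case where the delete above has not actually completed yet.
    if os.path.exists(cur):
        _rename_with_retry(cur, prev, retries, delay)

    # 4. Rename client_state_next.xml to client_state.xml.
    _rename_with_retry(nxt, cur, retries, delay)
```

Calling `rotate_state` repeatedly on the same directory leaves the latest contents in client_state.xml and the previous contents in client_state_prev.xml.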

Walter, thank you for the suggestions. I am hoping that we can reproduce these problems or alternatively have a user who sees these errors do some detective work as you describe.

Cheers,
Bruce
