Exit code -1073741819 (0xC0000005): This is the famous "General Access Violation". There are numerous possible causes for this error, from hardware problems to graphics drivers and more. Ideally, when this happens, the "Windows Runtime Debugger" should start up and write a stack dump to stderr out that helps to diagnose the problem further. If the "Access Violation" is listed in the "*** Dump of the Graphics thread ***", it was almost certainly a problem with the graphics driver. The most common Access Violation listed in the "*** Dump of the Worker thread ***" shows "houghmap.c:" near the end of the first line of the Callstack. This is a problem that we might be able to do something about, and we are currently hunting it. Apparently it only happens on certain machines; it might be related to the data these hosts are processing, but it may also be a property of the hardware or other software on the system.
Exit code 10: This means that the App could not resume from a previously written checkpoint. Again, the output listed in stderr out of the result should give a hint why. Most of the errors of this type that we get are apparently due to a broken hard disk sector or even a corrupted filesystem (e.g. some have the checkpoint file pointing to what looks like a portion of the client_state.xml). Again, there is one error of this type that we are trying to understand better in order to do something about it: an empty checkpoint file, in which case there will be an "EOF encountered" listed at the bottom of stderr out.
Exit code 99: This means that the App terminated because an internal check failed. Again, there should be something at the end of stderr out that allows the problem to be diagnosed further. If stderr out lists "file SFTfileIO.c" at the bottom, the check that failed was a sanity check of the data read from the input files. Resetting the project, and thus downloading a fresh set of data files, might help. Again, there is one type of error that we are working to understand better in order to prevent it from happening again. In these cases the following lines are shown at the bottom of stderr out:
[CRITICAL]: Required frequency-bins [-8, 8] not covered by SFT-interval [...]
XLAL Error - LocalXLALComputeFaFb (LocalComputeFstat.c:536): Input domain error
Exit code -1073741502 (0xC0000142): This means that a DLL failed to load properly. This error appears to happen more frequently on Windows Vista, but we also get it from machines running Windows XP. We would greatly appreciate any idea which DLL this might be; so far I haven't got a clue (I could try to delay-load a specific DLL, but for this I would need a "suspect").
Exit codes -1,0,1: These look as if a program other than the BOINC Client (such as a malware scanner) terminated the App in the middle of crunching. The stderr out doesn't show anything helpful in these cases. Again, I (and probably a lot of participants) would be thankful for a hint as to why these are happening.
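The negative numbers and the hexadecimal values in the list above are the same 32-bit exit status, read once as a signed and once as an unsigned integer. A minimal Python sketch of that conversion follows. The helper names and the lookup table are hypothetical and only paraphrase the descriptions given above; they are not part of BOINC or of the App.

```python
# Hypothetical lookup table; the descriptions paraphrase the list above.
KNOWN_EXIT_CODES = {
    0xC0000005: "access violation",
    0xC0000142: "a DLL failed to load properly",
    10: "could not resume from a checkpoint",
    99: "an internal check failed",
    0xFFFFFFFF: "possibly terminated by another program",  # exit code -1
    0: "possibly terminated by another program",
    1: "possibly terminated by another program",
}


def to_unsigned32(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return exit_code & 0xFFFFFFFF


def describe_exit_code(exit_code: int) -> str:
    """Format an exit code as 'decimal (hex): hint'."""
    unsigned = to_unsigned32(exit_code)
    hint = KNOWN_EXIT_CODES.get(unsigned, "not covered in the list above")
    return f"{exit_code} (0x{unsigned:08X}): {hint}"


if __name__ == "__main__":
    for code in (-1073741819, -1073741502, 10, 99, -1):
        print(describe_exit_code(code))
```

Masking with 0xFFFFFFFF leaves small positive codes such as 10 and 99 unchanged, so the same function can be used for every entry in the list.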
A word on BOINC Client versions: The "Windows Runtime Debugger" is only available with newer Clients (5.6 and up, I think). The newest BOINC Client, 5.10, has the bad habit of reporting only the "head" of the stderr output, which means that in many cases the useful diagnostic output is cut off. Earlier Clients (such as the 5.8 series) reported the last lines of stderr out, which ensured that the useful information at the bottom of the output didn't get lost. So if you want to help us track down and fix these problems, I recommend using a 5.8 BOINC Core Client.
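To illustrate why the direction of truncation matters, here is a small Python sketch that cuts the same output once from the front and once from the back. The size limit and the filler lines are made up for the example; the actual limit used by the Client is not stated here, and the functions are not the Client's code.

```python
def head(text: str, limit: int) -> str:
    """Keep only the first `limit` characters."""
    return text[:limit]


def tail(text: str, limit: int) -> str:
    """Keep only the last `limit` characters."""
    return text[-limit:] if limit > 0 else ""


# Made-up stderr: many routine lines, with the real error at the very end.
sample_stderr = "\n".join(f"progress line {i}" for i in range(100)) + (
    "\nXLAL Error - LocalXLALComputeFaFb (LocalComputeFstat.c:536): "
    "Input domain error\n"
)

if __name__ == "__main__":
    limit = 200  # hypothetical limit, chosen only for this example
    print("head keeps the error:", "Input domain error" in head(sample_stderr, limit))
    print("tail keeps the error:", "Input domain error" in tail(sample_stderr, limit))
```

Since the App writes its diagnostic message last, only the tail variant retains it once the output exceeds the limit.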
BM

Client Errors of S5R2/S5R3 Apps
This is definitely disk corruption, which affects even the file the stderr output is kept in.
BM
RE: I recently was sent 6
Though they all failed with exit status 99, at least two of the four tasks failed with completely different symptoms: one with an error in reading the data files (though your client should check their integrity (md5sum) before starting the App), and the other with what looks like a programming error (NULL pointer) that apparently nobody else has stumbled over yet. I couldn't get anything useful out of the stderr output of the other two, as the actual message has been truncated.
My guess is that your memory went faulty right at the moment when the first crash happened. I'd suggest running a memory checker.
BM
RE: Could you make an
Looking at the latest error rates from recent Apps (4.15), I doubt even more than before that this is actually a bug in the program; it looks rather like a problem of a few machines. Some of them might be overclocked, some may experience transient heat or other problems. I don't know, but the overall rate for this error has fallen below one percent.
There is something left in the Linux Apps that causes segfaults; I'd mostly expect it to be a bug left in the current BOINC library.
The DLL load problem on Windows still affects us noticeably. Our 4.15 App narrowed it down to probably being KERNEL32.DLL, and we're currently investigating the reasons named in a Microsoft Knowledge Base article, but there's not much more I can do in the App code to track this down.
Judging from individual computing errors, the most "unreliable" platform is MacOS PPC, apparently due to an occasional(!) "invalid instruction" (signal 4). This looks more like a problem in the build process than in the App code, but it needs to be fixed anyway.
BM
RE: We are not alone, other
1. I'm not sure whether that change was made in the current 5.10 Core Client branch or in the 6.x one, which is also already under development.
2. This may fix the problem for Windows Vista shutdown. However, we see the problem with all versions of Windows, with roughly the same distribution we have in the hosts database, i.e. most frequently on XP SP2. Its main cause is likely a full desktop heap, and as Rom's note correctly states, "Only a reboot can fix the desktop heap." This shouldn't be done (automatically) by the client.
BM
RE: So, what does the core
Every (non-console) Windows App needs a fragment of the "Desktop Heap", or KERNEL32.DLL and USER32.DLL will fail to load. In particular, this applies to the BOINC Apps started by the client. If the Desktop Heap is full, the App can't be started, which results in a Client Error that indicates a DLL load problem.
BM
Thanks for the report.
It actually means that no signal handler (even in the 4.14 App) is catching this, which is a bad sign.
This isn't even a 64-bit machine, right?
BM
RE: This probably needs to
It has nothing to do with the BOINC Core Client. I found 1170 of these errors from 5.4.x Clients in the DB, and 1841 from 5.8.x Clients. Believe me?
BM
What got moved in here was a discussion that started over in the MacOS Beta App thread, but didn't really belong there.
BM
First, all the tasks you mentioned that have been reported by your host so far were run with the standard App version 4.02.
I'm not completely sure that this covers all cases, but every time I have seen this error ("Input domain error") myself, it was on a machine with some (possibly transient) hardware problems.
Is the machine overclocked? Are you monitoring the CPU temperature? Is the machine standing somewhere where it occasionally gets hot (or so dusty that the CPU fan may suffer)?
BM
RE: Also, E@H seemd to
I won't know precisely what kind of crash that was until the tasks have been reported, but a bad network driver could mess up the FPU stack, which might be an explanation for, e.g., "Input domain errors".
BM