We get some errors with exit code 0x40010004 from hosts running Windows Vista. Has anyone run this App successfully and reliably on Vista, or is it failing on all such machines? Any clues as to what precisely might be the reason for this error?
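For anyone trying to read exit codes like this one: values of this shape follow the Windows NTSTATUS layout (2 severity bits, then customer and reserved bits, 12 facility bits, 16 code bits). Below is a minimal Python sketch for taking such a value apart. The helper name `decode_exit_code` and the small lookup table are illustrative additions, with the symbolic names taken from the Windows SDK headers as far as I know. It only decodes the number; it says nothing about why the App was terminated on Vista.

```python
from typing import NamedTuple

# Severity is encoded in the top two bits of an NTSTATUS value.
SEVERITY_NAMES = ("success", "informational", "warning", "error")

# A few well-known values (illustrative, far from complete).
KNOWN_STATUS = {
    0x40010004: "DBG_TERMINATE_PROCESS",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000142: "STATUS_DLL_INIT_FAILED",
}


class Status(NamedTuple):
    severity: str
    facility: int
    code: int
    name: str


def decode_exit_code(value: int) -> Status:
    """Split a 32-bit Windows exit code into its NTSTATUS fields."""
    value &= 0xFFFFFFFF  # accept codes that were logged as negative numbers
    return Status(
        severity=SEVERITY_NAMES[value >> 30],
        facility=(value >> 16) & 0x0FFF,
        code=value & 0xFFFF,
        name=KNOWN_STATUS.get(value, "unknown"),
    )


if __name__ == "__main__":
    for raw in (0x40010004, 0xC0000142):
        print(hex(raw), decode_exit_code(raw))
```

Read this way, 0x40010004 is an "informational" status from the debugger facility, while 0xc0000142 is a genuine error status.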
Sorry, I don't know the exact version myself. It needs to be some version that installs the dbghelp.dll in the BOINC directory. 5.8.x is definitely ok, 5.2 sounds too old, so the change should have occurred somewhere in between. Does anyone know the version for sure?
You get a proper trace of an internal error from BOINC_LAL_ErrHand(), which reports "now calling boinc_finish()", but apparently it's boinc_finish() itself (which does little more than call exit()) that fails with an access violation. Something has gone really, really wrong on this machine (faulty memory or similar).
When you get to the point of deploying the new validator and the new set of apps, are you intending to run a (perhaps short) beta test phase first, as you did with the 4.24 Windows app?
If new Apps are needed, I'll definitely publish them for a public Beta test first.
Currently it looks like upgrading some server-side components (validator and workunit generator) may solve the problem and be the best choice, but we're still looking into this.
Quote:
If you are, might I make a suggestion about the app_info.xml file that would accompany each test app? As you warn quite clearly on the beta test page, changing the app aborts any work in progress with a client error. However you can easily avoid this with a small modification to the app_info.xml file. If you are already fully aware of this and do not want to allow a change of app in the middle of a result, that is fine - no change is needed.
My thinking is that the beta test period could be kept shorter and the number of potential beta testers could be increased if people were allowed to "re-brand" the results in their caches so that they didn't have to abort or wait for their caches to drain or in any way disrupt their normal crunching patterns in order to participate in the test. I'm sure that people have done this in the past by editing their state files. I think it's much safer to do it through the app_info.xml mechanism.
Actually I'll not advise people to manually hack the client_state.xml files, they are too fragile.
However, in the future the app_info.xml files in the Beta Test packages will include entries for previous (maybe both official and beta) App versions. Installing the Beta Test package in the middle of a result will then not lead to a Client Error; the result in progress will simply be finished with the old App version, and new work will be assigned to the new App.
Furthermore, if you really want to switch the App version halfway through a result, see the sticky post on this subject. I cannot guarantee that it will work at all, as e.g. the syntax of the checkpoint file might change between versions.
Taking the case of the transition from 4.17 to 4.24 as an example. Here there were desirable bugfixes and apparently no change in output syntax. It would be prudent therefore for any 4.17 "branded" results in a person's cache to be crunched by 4.24, rather than the old buggy app. This can be achieved very simply using a bit more intelligence built into app_info.xml. No dodgy editing of the state file is required at all.
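To make the app_info.xml idea concrete, here is a minimal Python sketch that writes such a file with one `app_version` entry per version number. The element names follow the BOINC anonymous-platform format as I understand it; the app name, the executable file name and the helper `build_app_info` are hypothetical placeholders, and the version numbers 417 and 424 assume the usual major*100+minor encoding. In the usage shown, both entries point at the same (new) executable, which is the "re-branding" described above. Whether the checkpoint files are compatible between versions is a separate question.

```python
import xml.etree.ElementTree as ET


def build_app_info(app_name, executables, versions):
    """Return app_info.xml text.

    executables: list of executable file names shipped in the package.
    versions:    list of (version_num, executable_name) pairs; several
                 version numbers may refer to the same executable.
    """
    root = ET.Element("app_info")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name

    for exe in executables:
        file_info = ET.SubElement(root, "file_info")
        ET.SubElement(file_info, "name").text = exe
        ET.SubElement(file_info, "executable")

    for version_num, exe in versions:
        if exe not in executables:
            raise ValueError(f"{exe} is not listed as an executable")
        app_version = ET.SubElement(root, "app_version")
        ET.SubElement(app_version, "app_name").text = app_name
        ET.SubElement(app_version, "version_num").text = str(version_num)
        file_ref = ET.SubElement(app_version, "file_ref")
        ET.SubElement(file_ref, "file_name").text = exe
        ET.SubElement(file_ref, "main_program")

    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    new_exe = "example_app_4.24_windows.exe"  # hypothetical file name
    print(build_app_info("example_app", [new_exe],
                         [(417, new_exe), (424, new_exe)]))
```

To let old results finish with the old App instead, list both executables and point version 417 at the old one.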
Currently it looks like upgrading some server-side components (validator and workunit generator) may solve the problem and be the best choice, but we're still looking into this.
Wouldn't it be worthwhile to correct the uninitialized data problem in the Linux and Mac apps? As those were detected by compiler runtime checks, to me it sounds as if they were relevant.
On Linux and Mac we haven't seen a single result that has been affected by this bug, i.e. it didn't have an effect on the final outcome of the calculation. With this 4.24 Windows App we have found another problem in the same module (which might have been introduced by the fix to the earlier problem). We're working on this. So we'll definitely release a new generation of Apps anyway with some bugfixes.
However for the cross-platform validation problem (only) it might be that we'll need to deal with this only on the server side.
How about the 0xc0000142 crash issues? I don't know if you got my email, as you haven't replied... I wish I knew more of what to help with, but that error is a vexing one...
Yep, got it. Sorry for not replying immediately, had two rather chaotic days. Wrote to Rom about it as you suggested.
Quote:
Edit: BTW, SIGABRT still seems to come up for Linux. See this result.
Yep. But not too many (190 in past week), most from the same 4 machines. Not my highest priority right now.
Very strange: It restarts, finds the checkpoint file (!), tries to open it but somehow can't (!), and exits with an error message that the checkpoint file isn't there at all ...
Yep. Keeps me confused ever since I made the error messages a little more verbose. We actually get a lot of these errors, I'll write to Rom about that. Maybe boinc_fopen() does some funny things...
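One way to narrow this kind of failure down is to report "file is missing" and "file exists but cannot be opened" as two distinct outcomes. The Python sketch below illustrates the idea only; it is not what the App or boinc_fopen() actually does, and the helper name `open_checkpoint`, the retry count and the delay are assumptions chosen for illustration.

```python
import errno
import os
import time


def open_checkpoint(path, retries=3, delay=0.1):
    """Try to open a checkpoint file and say precisely what went wrong.

    Returns (file_object_or_None, status_string).
    """
    if not os.path.exists(path):
        return None, "missing"

    last_error = None
    for _ in range(retries):
        try:
            return open(path, "rb"), "ok"
        except OSError as exc:
            # The file exists but is locked, unreadable, or similar;
            # wait briefly in case another process still holds it.
            last_error = exc
            time.sleep(delay)

    reason = errno.errorcode.get(last_error.errno, "unknown error")
    return None, f"exists but cannot be opened: {reason}"


if __name__ == "__main__":
    handle, status = open_checkpoint("checkpoint.cpt")  # hypothetical file name
    print(status)
    if handle is not None:
        handle.close()
```

With the two cases kept apart, the log would show whether the file really vanished or whether something (a virus scanner, a stale lock, permissions) is blocking access to it.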
Just to keep you updated of our plans, mainly regarding the cross-platform differences:
- Early next week (probably Monday) we'll issue a new validator that should make things easier for transition and probably fix some invalid results by itself
- After the new validator is in place, we'll issue a new set of Apps for public Beta Test (for all platforms) that incorporate the fixes accomplished so far. I'll keep on tracking problems and fixing bugs I find until the very last moment. The new Apps will also incorporate a new feature that we might need.
- If it turns out that we need this feature (using pre-calculated files instead of doing the calculations in the Apps to avoid platform differences there), we will issue new workunits (actually a new workunit generator) that will make use of this feature after the new Apps have been made "official".
- Once we have the validation working properly, I'll work on speeding up the computation in the Apps. Btw., the code I currently plan to use for parts of the calculation no longer uses either modf() or ftol(); it uses bit operations to achieve something similar.
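As an illustration of the last point, here is a small Python sketch of how the integer and fractional parts of an IEEE 754 double can be separated with bit operations alone, instead of calling modf(). It is not the code from the Apps (which is not shown in this thread); the function names `trunc_bits` and `split_bits` are made up, and the sketch assumes the standard 64-bit double layout (1 sign bit, 11 exponent bits, 52 mantissa bits).

```python
import struct


def trunc_bits(x: float) -> float:
    """Truncate x toward zero by clearing the fractional mantissa bits."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    exponent = ((bits >> 52) & 0x7FF) - 1023  # unbiased exponent

    if exponent < 0:
        # |x| < 1: the integer part is zero, keep only the sign bit.
        bits &= 1 << 63
    elif exponent < 52:
        # The lowest (52 - exponent) mantissa bits hold the fraction.
        fraction_mask = (1 << (52 - exponent)) - 1
        bits &= ~fraction_mask & 0xFFFFFFFFFFFFFFFF
    # exponent >= 52: the value is already integral (or inf/NaN).

    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def split_bits(x: float):
    """Return (fractional part, integer part), like math.modf for finite x."""
    integer_part = trunc_bits(x)
    return x - integer_part, integer_part


if __name__ == "__main__":
    for value in (3.75, -2.5, 0.25, 1e20):
        print(value, split_bits(value))
```

Because the result depends only on the bit pattern of the input, an approach like this should behave identically on every platform that uses IEEE 754 doubles, which is the property that matters for cross-platform validation.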
Now that the validator issue has been resolved, are we almost to the point of beta testing a new batch of apps?
Yes we are. I'm currently waiting for some internal tests to finish and some feedback from other developers from the other side of the earth (see http://www.amaldi7.com/). Apps are in the pipeline.
RE: Taking the case of the
I understand.
I guess I have to think about this a little more.
BM