Looks like I was at least partly wrong in my interpretation of the results; the client isn't catching the signal. Still, using an older client shouldn't do any harm and might give some interesting insights.
Annika, most of your tasks end up with exit code -185 and the message "Can't get shared memory segment name: can't get shared mem segment name". This is actually a message from the client and means that it couldn't set up a shared memory segment to communicate with the App. I don't know how to fix this, though.
Once you have fixed this, would you be willing to help us debug the "signal 11" problem in the App? The method of starting DDD is the same one I mentioned in my second post here; you would just need to type "cont" (or push the command toolbar button) once a new App is started and, if it catches a signal 11, report the output (stack dump) here.
I don't think it's related to the Client or the App. Have you recently started using some new program that may make heavy use of shared memory resources? Have you recently upgraded (part of) your system in a way that could affect shared memory management?
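One way to look at the system side of this is to read the kernel's shared memory limits. The following is only a minimal sketch, assuming a Linux system and assuming that the relevant limits are the System V settings under /proc/sys/kernel; the thread does not say which shared memory mechanism the client actually uses. The function name `read_shm_limits` is made up for this example.

```python
from pathlib import Path

# Standard Linux locations of the System V shared memory limits.
# Whether these are the limits the client runs into is an assumption.
SHM_LIMIT_FILES = {
    "shmmax": "/proc/sys/kernel/shmmax",  # largest single segment, in bytes
    "shmall": "/proc/sys/kernel/shmall",  # total shared memory, in pages
    "shmmni": "/proc/sys/kernel/shmmni",  # maximum number of segments
}


def read_shm_limits(files=SHM_LIMIT_FILES):
    """Return each limit as an int, or None if it cannot be read."""
    limits = {}
    for name, path in files.items():
        try:
            limits[name] = int(Path(path).read_text().strip())
        except (OSError, ValueError):
            # File missing (not Linux), unreadable, or not a number.
            limits[name] = None
    return limits


if __name__ == "__main__":
    for name, value in read_shm_limits().items():
        print(f"{name}: {value if value is not None else 'not available'}")
```

If the segment count is close to shmmni, another program may be holding many segments; `ipcs -m` lists the System V segments currently allocated.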
Annika,
1. By "upgrade" I meant updating software packages rather than installing new hardware.
2. The "shared memory" that e.g. the error messages in stderr_out refer to is not memory physically shared between hardware devices (such as the CPU and your graphics adapter), but a piece of memory shared between processes, so it is a software matter rather than a hardware one.
3. Your machine reports a number of very different errors:
- exit status -185: This means that the client couldn't start the App at all. The reason is given in stderr_out; in your case it usually refers to shared memory.
- exit status 139: This is the "signal 11" error.
- exit status 134: "process got signal 6" (SIGABRT / Abort). I never noticed that before. Could this be from shutting down the system?
4. Putting the file "EAH_DEBUG_DDD" into the BOINC directory should start the debugger automatically at the very beginning of the task and tell it to attach to the process. If successful, it will interrupt the task on its own, so there's no need to manually attach a debugger to the running process. You will, however, have to type "cont" at the gdb prompt or press the "Cont" button on the command toolbar to get the task going again. (BTW, in case of a -185 error the App isn't started at all, so you won't see the debugger.)
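The exit statuses in point 3 follow the usual shell convention that a process killed by signal N is reported as 128 + N, so 139 is signal 11 and 134 is signal 6. A small sketch of that arithmetic is below; the function name `describe_exit_status` and the `KNOWN_CODES` table are made up for this example, and the text for -185 is simply the explanation given above.

```python
import signal

# Meaning given in this thread for the one non-signal code discussed here.
KNOWN_CODES = {-185: "client could not start the App"}


def describe_exit_status(status):
    """Describe an exit status using the 128 + signal number convention."""
    if status in KNOWN_CODES:
        return KNOWN_CODES[status]
    if 128 < status < 256:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Number does not correspond to a signal on this platform.
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exit code {status}"


if __name__ == "__main__":
    for status in (139, 134, -185):
        print(status, "->", describe_exit_status(status))
```

Signal names are looked up on the machine running the script, so they reflect that platform's numbering.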
I still don't know why our signal handler doesn't catch the signal on certain machines / systems; currently the only way to get some information about the cause of the signals is running the App with a debugger attached.
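Enabling the debugger as described in point 4 amounts to placing a file named EAH_DEBUG_DDD in the BOINC directory. A minimal sketch follows, assuming that an empty file is enough (the post only says the file has to be there) and that you know where your BOINC directory is; the function name `enable_ddd` is made up for this example.

```python
from pathlib import Path

DEBUG_MARKER = "EAH_DEBUG_DDD"  # file name taken from the post above


def enable_ddd(boinc_dir):
    """Create the marker file in boinc_dir and return its path.

    Assumption: an empty file is sufficient to trigger the debugger.
    """
    marker = Path(boinc_dir) / DEBUG_MARKER
    marker.touch(exist_ok=True)
    return marker
```

Call it with the path of your own BOINC directory, which differs between installations; the debugger should then come up when the next task starts.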
Quote:
Oops.. this box of mine had a whole series of signal 11s, but that chip had a bad overclock so I think you can discard them.
Not sure. If it's always the same location it might still be useful to know; there might be a program bug as well. Do you want to give it a try with ddd?
I just found and fixed the (hopefully only) cause of the "signal 11" problem that has been bugging Einstein@home on Linux for so long. I'll build another App and publish it for Beta Test soon.
Quote:
@Bernd--How much trouble would it be to recompile this for a SPARC version of Linux? I've just set up a Sun Ultra 5 with the SPARC version of Debian, and I'm curious about how it would handle some Einstein action. ;)
I don't think the number of machines is worth the effort. I tried to compile E@H for Linux/PPC a while ago and found that some parts of the code apparently assume that Linux = Linux/i386. It would involve changes to the code I don't have time for right now.
Quote:
I just found and fixed the (hopefully only) cause of the "signal 11" problem that has been bugging Einstein@home on Linux for so long. I'll build another App and publish it for Beta Test soon.
It's there.
BM