The App writes two temp files during the two stages of the analysis (0-49.5%, 49.5-99%). It writes checkpoints as often as you specified in the preferences, but it checks the temp files when it is resumed. I think what happened is that the second file got corrupted during the crash, so the App decided to repeat the calculation.
BM

how often does einstein checkpoint workunits?
> yes that would explain it. perhaps the temp files should be written more
> often, especially for those of us with slower processors.
They are written more or less continuously (though not more often than the checkpoints). The "problem", if any, is that they are not closed until they have been fully written. This isn't a problem as long as the processes on that machine are properly shut down. When, however, the machine crashes severely or someone pulls the plug and the OS hasn't time to properly terminate the running processes, the file _might_ get damaged, just like other files of other running applications.
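A common way to make such files robust against a hard crash is to write to a temporary name first and then atomically rename over the real file, so a reader only ever sees either the old or the new complete copy. This is a generic sketch of that technique, not the actual E@H code; the function name and layout are illustrative:

```c
#include <stdio.h>

/* Illustrative sketch: crash-safe checkpoint writing.
   Write to "<path>.tmp", flush and close it, then rename() it
   over the real file. rename() replaces the target atomically
   on POSIX filesystems, so a crash leaves either the old or the
   new checkpoint intact, never a half-written one. */
int write_checkpoint(const char *path, const void *data, size_t len)
{
    char tmp[1024];
    snprintf(tmp, sizeof tmp, "%s.tmp", path);

    FILE *f = fopen(tmp, "wb");
    if (!f)
        return -1;
    if (fwrite(data, 1, len, f) != len) {
        fclose(f);
        remove(tmp);
        return -1;
    }
    /* Push stdio's buffer to the OS before closing. */
    if (fflush(f) != 0 || fclose(f) != 0) {
        remove(tmp);
        return -1;
    }
    return rename(tmp, path);   /* atomic replace */
}
```

Note this protects against half-written files, not against the OS losing its own write cache; for that an additional fsync() of the file (and its directory) would be needed.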
BM
> > > yes that would explain it. perhaps the temp files should be written more
> > > often, especially for those of us with slower processors.
> >
> > They are written more or less continuously (though not more often than the
> > checkpoints). The "problem", if any, is that they are not closed until they
> > have been fully written. This isn't a problem as long as the processes on
> > that machine are properly shut down. When, however, the machine crashes
> > severely or someone pulls the plug and the OS hasn't time to properly
> > terminate the running processes, the file _might_ get damaged, just like
> > other files of other running applications.
> >
> > BM
>
> I looked for source code, but couldn't find a link to it. Is it available?
> Would help in looking into some of these problems.
>
> Does the program flush the buffered data to disk? It won't help anything if
> the PC crashes while the temporary files are being written, but it sure does
> help if the PC crashes later. Like:
>
> fflush( outstream );
Walt, the code internally allocates a 2MB buffer and writes to the buffer. Checkpointing consists of flushing that buffer to disk. The frequency can be set by the user in their preferences. The E@h default is 60 secs. Users attached to other projects should beware that they inherit whatever checkpoint default was used for *those* projects.
Bruce
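The scheme Bruce describes can be sketched roughly as follows. This is an illustrative mock-up under stated assumptions (the buffer size, variable names, and the timer logic are mine, not the actual E@H source): output accumulates in a user-space buffer, and "checkpointing" means flushing that buffer to disk whenever the user-configured period has elapsed.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE (2 * 1024 * 1024)   /* 2 MB buffer, as described above */

static char   buf[BUF_SIZE];
static size_t buf_used = 0;
static time_t last_flush = 0;
static int    checkpoint_period = 60; /* seconds; set via user preference */

/* Append to the in-memory buffer; nothing hits the disk yet. */
static void buffered_write(const void *data, size_t len, FILE *out)
{
    if (buf_used + len > BUF_SIZE) {  /* buffer full: spill early */
        fwrite(buf, 1, buf_used, out);
        buf_used = 0;
    }
    memcpy(buf + buf_used, data, len);
    buf_used += len;
}

/* Called from the main loop: flush only when the period has elapsed. */
static void maybe_checkpoint(FILE *out)
{
    time_t now = time(NULL);
    if (now - last_flush >= checkpoint_period) {
        fwrite(buf, 1, buf_used, out);
        buf_used = 0;
        fflush(out);                  /* push stdio's buffer to the OS */
        last_flush = now;
    }
}
```

The point of the large buffer is that between checkpoints the app does no disk I/O at all, which is why a shorter checkpoint period trades crash resilience against more frequent disk activity.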
1. NTFS5 should be journaling, shouldn't it?
2. The checkpoint files are separate from the client_state files. They are also written according to your settings, so once a minute by default.
3. The temp files can grow large - several MB. I don't think it's a good idea for most users to keep and write two copies of them. Most would prefer to treat BOINC like any other program - prevent the machine from crashing, as a crash may trash work. At least with BOINC it's not your own work you lose, just a bit of CPU time.
BM
Quite a lot of people (literally thousands by now) are running E@H without problems, so it's unlikely that BOINC/E@H causes the crashes by itself. It may, however, trigger other problems on your system that had remained unnoticed before. Frequent culprits are the graphics driver (E@H makes much heavier use of OpenGL than most other programs) and, since it is mainly CPU-bound, overheating / cooling problems.
BM
> Does einstein save all the checkpoint files when a normal exit is performed?
Sure. That's what checkpointing is for.
BM