climateprediction.net (CPDN) home page
Thread 'Automatic Rollback'

old_user103147

Joined: 21 Oct 05
Posts: 2
Credit: 157,082
RAC: 0
Message 21428 - Posted: 19 Mar 2006, 20:04:07 UTC

The model already does check-pointing, so there's no reason why it couldn't implement automatic rollback when the model crashes. I just had a sulfur model crash at 78%, losing months of work. If I lose any more work, I will have no choice but to leave the project.
astroWX
Volunteer moderator

Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 21429 - Posted: 19 Mar 2006, 22:28:42 UTC
Last modified: 19 Mar 2006, 22:38:32 UTC

The Model already does roll-backs for internal errors. If it hits a brick wall, it rewinds a Model day and retries. If it still doesn't get through, it rewinds a Model month and, finally, a Model year before crashing.
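
The escalation described above can be sketched roughly as follows. This is an illustration only, not the Model's actual code: the `step` callable, the `ModelCrash` exception, and the rewind distances in days (including the 360-day Model year) are assumptions made for the example.

```python
from typing import Callable

# Hypothetical rewind distances in model days (a 360-day model year is assumed).
REWIND_STEPS = [("day", 1), ("month", 30), ("year", 360)]


class ModelCrash(Exception):
    """Raised when every rollback level has been tried without success."""


def run_with_rollback(step: Callable[[int], bool], failed_day: int) -> str:
    """Retry from progressively earlier restart points.

    `step(start_day)` is a hypothetical callable that re-runs the model from
    `start_day` and returns True if it gets past `failed_day`.
    Returns the label of the rewind level that succeeded.
    """
    for label, days in REWIND_STEPS:
        start_day = max(0, failed_day - days)  # never rewind before day 0
        if step(start_day):
            return label
    raise ModelCrash(f"could not get past model day {failed_day}")
```

For example, `run_with_rollback(lambda start: start <= 70, 100)` fails the one-day rewind (start day 99) and succeeds on the one-month rewind (start day 70).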

External causes are another matter entirely and can range from hardware problems, to operator error, to program mix. For example, a Norton AntiVirus scan can cause a conflict if it happens to lock a file just as CPDN wants that file. (For that reason we recommend that the CPDN folder be excluded from scans.) Can it be programmed? Perhaps. But it seems a complicated problem to trap & evaluate all the possibilities; meanwhile, Carl and Tolu have more than they can handle keeping all the Projects afloat.

What were the symptoms of your crash?

Edit: Regular backups of the entire CPDN folder are also recommended, to prevent that which you experienced. (Such a backup saved one of my Spinup Runs ~195 years into a 200 Model-year Run...)
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
old_user103147

Joined: 21 Oct 05
Posts: 2
Credit: 157,082
RAC: 0
Message 21515 - Posted: 23 Mar 2006, 1:30:32 UTC - in response to Message 21429.  

Anti-Virus exclusion is already done. And if manual back-up is necessary, it should be done automatically.
old_user94880

Joined: 27 Aug 05
Posts: 156
Credit: 112,423
RAC: 0
Message 21516 - Posted: 23 Mar 2006, 2:03:37 UTC

Backup is not necessary. I never have backed up and never will; I just keep my systems up to speed and have no problems.
Andrew Hingston
Volunteer moderator

Joined: 17 Aug 04
Posts: 753
Credit: 9,804,700
RAC: 0
Message 21521 - Posted: 23 Mar 2006, 9:19:16 UTC - in response to Message 21515.  

old_user103147 wrote: "if manual back-up is necessary, it should be done automatically."

There's a danger of talking at cross purposes here. Automatic rollback is already implemented (the original point), though in a post on the seasonal/attribution forum Tolu refers to rollback to the last, then the penultimate, and then the first checkpoint, and I think that may be the pattern for the current experiments. It seems to be the case with the BBC model, because a poster on those boards was a tad cheesed off after he appeared to have gone back to his first checkpoint following a system crash (not BOINC related).
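
To make that ordering concrete, here is a minimal sketch of choosing which checkpoints to try. It is only an illustration of the "last, penultimate, first" pattern as described above, not the project's code; the function name and the use of a plain list of checkpoint labels are assumptions.

```python
from typing import List, Sequence


def checkpoint_candidates(checkpoints: Sequence[str]) -> List[str]:
    """Return checkpoints in the order to try: last, penultimate, first.

    `checkpoints` is assumed to be ordered oldest to newest. Duplicates are
    dropped, so a run with only one or two checkpoints yields a shorter list.
    """
    if not checkpoints:
        return []
    order = [checkpoints[-1]]
    if len(checkpoints) >= 2:
        order.append(checkpoints[-2])
    order.append(checkpoints[0])
    # dict.fromkeys keeps the first occurrence of each entry, in order.
    return list(dict.fromkeys(order))
```

With four checkpoints labelled `"c1"` to `"c4"`, the order to try would be `"c4"`, `"c3"`, then `"c1"`, which matches the experience of ending up back at the first checkpoint when the two most recent ones cannot be used.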

But rollback does not always work, and given the complexities of the model it's not surprising. Automating a full backup of the BOINC folder is problematic, partly because it can't be done on the fly, but also because BOINC is multi-project: an automated backup has a use in climateprediction, and disadvantages for everyone else.

Personally, I don't find exiting BOINC occasionally and doing a drag-and-drop copy a great chore. ;-)
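
For anyone who would rather script the copy than drag and drop, a minimal sketch might look like the following. It assumes BOINC has already been exited (the script does not check), and the function name and timestamped folder naming are made up for the example; nothing here is supplied by the project.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def backup_folder(src: Path, dest_root: Path) -> Path:
    """Copy `src` into a new timestamped folder under `dest_root`.

    Assumes BOINC has already been exited, so no file is mid-write.
    Returns the path of the new backup folder.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    target = dest_root / f"{src.name}-{stamp}"
    # copytree creates `target` (and missing parents) and raises
    # FileExistsError if `target` is already there.
    shutil.copytree(src, target)
    return target
```

Calling `backup_folder(Path("BOINC"), Path("backups"))` would produce a folder such as `backups/BOINC-<timestamp>`; both paths are placeholders to be replaced with the real locations.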

