What's that mean "exceeded CPU time limit 12147899.159664"

alexpon

Message 28507 - Posted: 7 May 2007, 13:20:47 UTC

2007-5-7 16:07:35|climateprediction.net|Aborting task hadcm3lbm_btu5_05321152_1: exceeded CPU time limit 12147899.159664
2007-5-7 16:07:35|climateprediction.net|Deferring communication 1 min 0 sec, because Unrecoverable error for result hadcm3lbm_btu5_05321152_1 (Maximum CPU time exceeded)
2007-5-7 16:07:40|climateprediction.net|Computation for task hadcm3lbm_btu5_05321152_1 finished
mo.v
Volunteer moderator
Message 28512 - Posted: 7 May 2007, 14:36:03 UTC
Last modified: 7 May 2007, 14:49:17 UTC

Hi Alexpon

A day or two ago I wrote a post about this error for inclusion in the cpdn READMEs. It isn't in the READMEs yet, but here's the draft. If you have a backup, you can use this method to save your model. If you haven't got a backup, the model can't be recovered and you'll need to get a new one.
--------------------------------------------------------------------------

Every work unit (climate model) comes with an estimated number of Floating Point Operations (fpops) and a maximum number (bound) of fpops that the model will be allowed to use. Boinc combines this bound with the results of its benchmarks to work out how long the model may actively run on a particular computer. This is similar to imposing a time limit that depends not on calendar dates but on CPU time. On a slow computer a model will be allowed to run for longer, but on a fast computer the limit is lower. Under normal conditions a model will never hit the limit (fpops bound). The limit is imposed to ensure that a looping model forgotten by its owner cannot run indefinitely. A model that hits the bound/limit will be aborted with a message like 'exceeded CPU time limit' or 'Maximum CPU time exceeded'.
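To see why a faster computer gets a lower limit, here is a small sketch. It assumes that the CPU time limit is simply the fpops bound divided by the benchmarked speed, and that the number in the 'exceeded CPU time limit' message is in CPU seconds. The bound and the two benchmark speeds are invented for illustration and are not taken from any real model.

```python
SECONDS_PER_DAY = 86400.0


def cpu_time_limit_seconds(fpops_bound: float, benchmark_flops: float) -> float:
    """Assumed relationship: limit (CPU seconds) = fpops bound / benchmarked speed."""
    return fpops_bound / benchmark_flops


def seconds_to_days(seconds: float) -> float:
    """Convert CPU seconds to days of continuous crunching."""
    return seconds / SECONDS_PER_DAY


# Hypothetical values, chosen only to make the arithmetic easy to follow.
example_bound = 1.0e16   # fpops bound of an imaginary model
slow_computer = 1.0e9    # benchmarked floating point operations per second
fast_computer = 4.0e9    # a computer that benchmarks four times faster

limit_slow = cpu_time_limit_seconds(example_bound, slow_computer)
limit_fast = cpu_time_limit_seconds(example_bound, fast_computer)

print(f"slow computer limit: {seconds_to_days(limit_slow):.1f} days of CPU time")
print(f"fast computer limit: {seconds_to_days(limit_fast):.1f} days of CPU time")
```

Read this way, the limit of 12147899.159664 quoted at the top of this thread would be about 140.6 days of CPU time.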

When a model is transferred from a slow to a much faster computer, or a computer's CPU is changed for a faster one, boinc runs new benchmarks. It then recalculates the limit as if the model had spent its whole life on the faster computer, and does not take the move into account. The model may therefore hit its new, reduced limit before it completes. The longer the model crunched on the slow computer and the more it has been sped up, the more likely this problem becomes. A crash with this error is more likely if the model runs at least twice as fast on the new computer and was more than half completed when transferred. For a model already ¾ completed and sped up 3x or 4x, this error is very probable. However, the fpops limit includes a safety margin, which allows many transferred models to complete without problems.
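The rule of thumb above can be sketched with a little arithmetic. This is a simplified model, not boinc's own code: it assumes the CPU time already used on the slow computer is carried over with the model, that the new limit is worked out purely from the faster benchmark, and that the benchmark ratio matches the real speed-up. The function names and the safety margin value are made up for the example; the real margin depends on the work unit.

```python
def margin_needed(fraction_done_on_slow: float, speedup: float) -> float:
    """Total CPU time at completion, measured in units of the time the
    whole model would take on the fast computer alone.

    The part done on the slow computer costs `speedup` times more CPU
    time than it would have on the fast one; the rest costs normal time.
    """
    return fraction_done_on_slow * speedup + (1.0 - fraction_done_on_slow)


def hits_limit(fraction_done_on_slow: float, speedup: float,
               safety_margin: float) -> bool:
    """True if the transferred model would run past the recalculated limit.

    `safety_margin` is the limit divided by the normal run time on the
    fast computer (a hypothetical figure here).
    """
    return margin_needed(fraction_done_on_slow, speedup) > safety_margin


# Cases mentioned in the text, with an assumed safety margin of 2.0.
for fraction, speedup in [(0.5, 2.0), (0.75, 3.0), (0.75, 4.0)]:
    need = margin_needed(fraction, speedup)
    print(f"{fraction:.2f} done, {speedup:.0f}x faster: needs margin {need:.2f},"
          f" hits limit: {hits_limit(fraction, speedup, 2.0)}")
```

In this picture, doubling the fpops bound doubles the safety margin, so a model that needed a margin of 3.25 against an available 2.0 would then have 4.0 to work with.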

PRECAUTIONS

Before moving the contents of the boinc folder to the new computer, the complete contents of this folder should be backed up. If you forgot to do this, make a backup as soon after the move as possible. Here is a selection of backup and restore methods.

FIX

The fpops bound value (number) has to be changed in a boinc file. One can either do this as soon as boinc has run its benchmarks on the faster computer, or simply run the model in the hope that the problem will not occur. If one chooses the second option, it is essential to make regular backups. If the model does crash with this error, the fpops value can be changed immediately after restoring a backup and letting boinc run benchmarks again.

This fix is not intended for workunits from other projects that are too short to be worth backing up and restoring if they crash. If this error occurs repeatedly with short workunits, it should be reported on that project's forum. The project's programmers can then correct the code for this faulty batch of workunits.

HOW TO FIX IT (surprisingly easy!)

*If your model has already crashed with this error, restore your backup.

*Start boinc.

*Open boinc manager and look at the Messages tab to check that boinc has run its new benchmarks.

*Write down the long name and number of the model. It will begin with something like 'hadsm3ln......'.

*Suspend the model if it is running, using the boinc manager Activity menu.

*Exit from boinc.

*In Windows Explorer, go to C:\Program Files\BOINC (or Climate Change Experiment).

*Double-click on BOINC.

*Find the client_state.xml file. Right-click on it.

*Select Edit.

*The client_state.xml file will open up in Notepad, which allows it to be edited.

*Find the <workunit> block containing the name of the model.

*Double the value (number) for <rsc_fpops_bound>.

*Click Save.

*Close Notepad.

*Restart boinc and the model.

Please note that it is not necessary to edit the client_state_previous.xml file.
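For anyone who would rather not edit the file by hand, the editing step can be sketched as a short script. Treat it as an illustration only: it assumes the model's entry looks like a <workunit> ... </workunit> block containing <name>...</name> and <rsc_fpops_bound>...</rsc_fpops_bound> lines, and the function names are invented here. Exit boinc before running anything like this, and keep your own backup as well.

```python
import re
import shutil
from pathlib import Path

WORKUNIT_RE = re.compile(r"<workunit>.*?</workunit>", re.DOTALL)
BOUND_RE = re.compile(r"(<rsc_fpops_bound>)([^<]+)(</rsc_fpops_bound>)")


def double_fpops_bound(state_text: str, workunit_name: str) -> str:
    """Return state_text with <rsc_fpops_bound> doubled in the block whose
    <name> matches workunit_name. Other work units are left untouched."""
    name_tag = f"<name>{workunit_name}</name>"
    changed = False

    def fix_block(match: re.Match) -> str:
        nonlocal changed
        block = match.group(0)
        if name_tag not in block:
            return block  # a different work unit

        def double(m: re.Match) -> str:
            # Keep the tags, replace only the number between them.
            return f"{m.group(1)}{float(m.group(2)) * 2:f}{m.group(3)}"

        new_block, count = BOUND_RE.subn(double, block, count=1)
        changed = changed or count > 0
        return new_block

    new_text = WORKUNIT_RE.sub(fix_block, state_text)
    if not changed:
        raise ValueError(f"no rsc_fpops_bound found for {workunit_name!r}")
    return new_text


def patch_state_file(path: str, workunit_name: str) -> None:
    """Copy the file to <name>.bak, then write the doubled bound in place."""
    state_path = Path(path)
    shutil.copy2(state_path, state_path.with_name(state_path.name + ".bak"))
    state_path.write_text(double_fpops_bound(state_path.read_text(), workunit_name))


# Small made-up fragment to show the effect; not a real client_state.xml.
sample = (
    "<workunit>\n"
    "    <name>example_model_1</name>\n"
    "    <rsc_fpops_bound>1000.000000</rsc_fpops_bound>\n"
    "</workunit>\n"
)
print(double_fpops_bound(sample, "example_model_1"))
```

To use it on a real file you would call patch_state_file with the path to client_state.xml and the model name written down earlier, then restart boinc and the model as described above.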


(Thanks also to Thyme Lawn)

©2024 climateprediction.net