Can't get past 2075


Questions and Answers : Windows : Can't get past 2075

old_user169642

Joined: 28 Feb 06
Posts: 2
Credit: 42,756
RAC: 0
Message 29736 - Posted: 27 Jul 2007, 15:20:45 UTC

My model has been running fine... 96% complete. And now it seems to keep exiting and restarting when it trickles up the 2075 results. Darn it.

Looking at the messages, the sequence of events seems to be as follows:

1. Suspends work fetch because computer is overcommitted. (It does this when I switch it on in the morning too.)
2. Result hadcm3_blah exited with zero status but no finished file.
3. Request reschedule CPUs: process exited.
4. Restart result hadcm3_blah.
5. Send trickle-up.
6. Result hadcm3_blah exited with zero status but no finished file.
7. Request reschedule CPUs: process exited.
8. Restart result hadcm3_blah.
9. Send trickle-up.
10. Scheduler request succeeded.
11. Send trickle-up.
12. Scheduler request succeeded.

And whammo! I'm back to December 2074. It's been doing this regularly for quite some time now. Is there anything I can do to fix it? And no, I don't keep backups...




DJStarfox

Joined: 27 Jan 07
Posts: 300
Credit: 3,288,263
RAC: 26,370
Message 29738 - Posted: 27 Jul 2007, 19:05:37 UTC - in response to Message 29736.  

Your model may be rewinding if it produces an invalid value in its calculations, such as negative atmospheric pressure. It will only rewind so many times before it decides to give up. A model that is 96% complete will still be useful to the scientists, so don't worry if it doesn't make it to 100%.
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2183
Credit: 64,822,615
RAC: 5,275
Message 29743 - Posted: 27 Jul 2007, 23:27:57 UTC
Last modified: 27 Jul 2007, 23:45:47 UTC

It looks like that model was downloaded last year, and that version of the executable had a bug: it did not terminate the run after three rewinds. If the model encounters a computation problem, it is supposed to rewind a day, then a month, then a year, and then abort if it still can't get past the problem spot. The early version of the model would not abort after the year rewind failed. I would manually abort that task if I were you; otherwise it might wind up looping forever.
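The rewind escalation described above can be sketched roughly as follows. This is a minimal Python illustration only, not the actual CPDN/HadCM3 code: the `step_ok` callback is hypothetical and stands in for "re-run the model after this rewind and report whether it got past the problem spot".

```python
from typing import Callable

# Escalation order as described in the post: a day, then a month, then a year.
REWIND_STEPS = ["day", "month", "year"]


def run_past_problem(step_ok: Callable[[str], bool]) -> str:
    """Return the rewind that cleared the problem spot, or "abort".

    step_ok is a hypothetical callback: given a rewind size, it re-runs
    the model from that point and reports whether it got past the
    problem spot.
    """
    for rewind in REWIND_STEPS:
        if step_ok(rewind):
            return rewind
    # A correct build gives up here after the year rewind fails;
    # the faulty early executable kept looping instead of aborting.
    return "abort"
```

For example, `run_past_problem(lambda r: r == "month")` returns `"month"`, while a spot that no rewind can clear, `run_past_problem(lambda r: False)`, returns `"abort"` after trying all three rewinds.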

As DJ said, your model will be useful to the researchers.
old_user169642

Send message
Joined: 28 Feb 06
Posts: 2
Credit: 42,756
RAC: 0
Message 29769 - Posted: 30 Jul 2007, 8:15:20 UTC - in response to Message 29743.  

Thanks both! I will abort. It's disappointing not to get all the way there, but if it's a problem with the model rather than my computer, that's not so bad. Finding models that don't work is all part of it, after all!
mo.v
Volunteer moderator

Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 29770 - Posted: 30 Jul 2007, 15:00:24 UTC
Last modified: 30 Jul 2007, 15:03:16 UTC

Hi Tom

A few members with similarly looping models have managed to get them past the sticking point by restoring a backup of the model (made at a model date before the sticking point, obviously) on a computer with the other make of CPU: moving it from an AMD computer to an Intel one, or vice versa.

This can work because of the slightly different ways that Intel and AMD handle the calculations.

We have several reports of this method working and the models then completing successfully. We have so far had no reports of the method being tried but failing.

Even if you've already aborted the model, you could try this if (a) you have a fairly recent backup and (b) you have access to a computer with the other type of CPU.

Whether you can attempt this or not, you'll be glad to know that although it's best to get our models to 2080 if we possibly can, all models that reach 2050 or beyond are added to the cpdn website front page statistics.
Cpdn news


©2024 cpdn.org