Can't get past 2075
Author | Message |
---|---|
Joined: 28 Feb 06 Posts: 2 Credit: 42,756 RAC: 0 |
My model has been running fine... 96% complete. And now it seems to keep exiting and restarting when it trickles up the 2075 results. Darn it. Looking at the messages, the sequence of events seems to be as follows: 1. Suspends work-fetch because computer is overcommitted (it does this when I switch it on in the morning too). 2. Result hadcm3_blah exited with zero status but no finished file. 3. Request reschedule cpus: process exited. 4. Restart result hadcm3_blah. 5. Send trickle-up. 6. Result hadcm3_blah exited with zero status but no finished file. 7. Request reschedule cpus: process exited. 8. Restart result hadcm3_blah. 9. Send trickle-up. 10. Scheduler request succeeded. 11. Send trickle-up. 12. Scheduler request succeeded. And whammo! I'm back to December 2074. It's done this regularly now for quite some time. Is there anything I can do to fix it? And no, I don't keep backups... |
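For anyone who wants to check whether their own task is stuck in the same cycle, here is a rough Python sketch that counts how often a result exits without a finished file in a list of message lines. The marker text is taken from the wording quoted in the post above, so treat it as an assumption; the exact text in your client's messages may differ, and the function names are made up for illustration.

```python
from collections import Counter

# Assumption: marker wording follows the messages quoted in the post above.
EXIT_MARKER = "exited with zero status but no finished file"


def count_unfinished_exits(messages):
    """Count, per result name, how many times the exit marker appears."""
    counts = Counter()
    for line in messages:
        if EXIT_MARKER in line:
            # Treat the word just before the marker as the result name.
            before = line.split(EXIT_MARKER)[0].split()
            name = before[-1] if before else "unknown"
            counts[name] += 1
    return counts


def looks_like_loop(messages, threshold=2):
    """True if any result exited without finishing at least `threshold` times."""
    return any(n >= threshold for n in count_unfinished_exits(messages).values())


if __name__ == "__main__":
    sample = [
        "Result hadcm3_blah exited with zero status but no finished file",
        "Request reschedule cpus: process exited",
        "Restart result hadcm3_blah",
        "Send trickle-up",
        "Result hadcm3_blah exited with zero status but no finished file",
    ]
    print(dict(count_unfinished_exits(sample)))
    print(looks_like_loop(sample))
```

A repeated exit on its own does not prove the model is rewinding; it only flags the pattern described in the post so you know to look at the model date.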
Joined: 27 Jan 07 Posts: 300 Credit: 3,288,263 RAC: 26,370 |
Your model may be rewinding if it gets an invalid performance measurement, such as negative atmospheric pressure. It will only rewind so many times before it decides to give up. Having a model 96% complete will be useful to the scientists, so not to worry if it doesn't make it to 100%. |
Joined: 7 Aug 04 Posts: 2183 Credit: 64,822,615 RAC: 5,275 |
It looks like that model was downloaded last year and the executable had a problem with not terminating the run after three rewinds. If it encounters a computation problem, it is supposed to rewind a day, then a month, then a year, then abort if it can't get past the problem spot. The early version of the model would not abort after the year rewind failed. I would manually abort that task if I were you, otherwise it might wind up looping forever. As DJ said, your model will be useful to the researchers. |
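To make the rewind sequence above concrete, here is a minimal Python sketch of the escalation (a day, then a month, then a year, then abort), plus the faulty behaviour where the run never aborts. This is only an illustration of the logic as described in this thread, not the model's actual code. The function name, the `run_from` callback, the 30-day month and 360-day year, and the `max_attempts` cap are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

# Assumption: rewind sizes in model days (30-day month, 360-day year).
REWIND_STEPS_DAYS = [1, 30, 360]


def run_with_rewinds(
    run_from: Callable[[int], bool],
    failed_day: int,
    abort_after_last: bool = True,
    max_attempts: int = 10,
) -> Tuple[str, List[int]]:
    """Try each rewind in turn; run_from(day) returns True if the model
    gets past the problem spot when restarted from that model day."""
    attempts: List[int] = []
    for step in REWIND_STEPS_DAYS:
        restart = max(0, failed_day - step)
        attempts.append(restart)
        if run_from(restart):
            return "completed", attempts
    if abort_after_last:
        # Intended behaviour: give up once the year rewind has failed.
        return "aborted", attempts
    # Faulty behaviour as described: keep repeating the year rewind.
    # max_attempts only exists so this sketch terminates.
    while len(attempts) < max_attempts:
        restart = max(0, failed_day - REWIND_STEPS_DAYS[-1])
        attempts.append(restart)
        if run_from(restart):
            return "completed", attempts
    return "still looping", attempts


if __name__ == "__main__":
    always_fails = lambda day: False
    print(run_with_rewinds(always_fails, failed_day=1000))
    print(run_with_rewinds(always_fails, failed_day=1000, abort_after_last=False))
```

With `abort_after_last=False` the sketch keeps landing on the same restart day, which matches the "back to December 2074" symptom reported at the top of the thread.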
Joined: 28 Feb 06 Posts: 2 Credit: 42,756 RAC: 0 |
Thanks both! I will abort. Disappointing not to get all the way there, but if it's a problem with the model, rather than my computer, that's not so bad. Finding models that don't work is all part of it, after all! |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Hi Tom, A few members have managed to get a similarly looping model past the sticking point by restoring a backup of the model (made at a model date before the sticking point, obviously) on a computer with the other make of CPU: from an AMD computer to an Intel one, or vice versa. This can work because of the slightly different ways that Intel and AMD handle the calculations. We have several reports of this method working and the models then completing successfully. We have so far had no reports of the method being tried but failing. Even if you've already aborted the model, you could try this if a) you have a fairly recent backup and b) you have access to a computer with the other type of CPU. Whether you can attempt this or not, you'll be glad to know that although it's best to get our models to 2080 if we possibly can, all models that reach 2050 or beyond are added to the cpdn website front page statistics. |