Error right at the end of big hadcm3n-task

Waldmeister

Joined: 13 Jun 11
Posts: 34
Credit: 1,415,036
RAC: 1,383
Message 43938 - Posted: 13 Mar 2012, 16:03:08 UTC

Error right at the end of big hadcm3n-task.


Could you have a look at task resultid=13654142, wuid=7714519? This is one of those long hadcm3n tasks.

Something really annoying and frightening happened:
The task ended in an error right at or near the end, at just above 99.7%.

The task seemed to be running well. In recent weeks I sent trickle after trickle,
the graphics showed no blue globe and everything seemed to work, but now, right at the end,
it ended in an error.

Using a backup (see below) I even ran the task twice from the 99.5% mark on, both times producing the same error.
I am also a little confused, since another of those tasks that I ran lately (resultid=13654159) finished successfully.


The situation before the error:

a) I last sent trickles two days ago (up to trickle #36) at the 90% mark.
I have a backup of the BoincData at that mark (90%).
b) I also have an extra backup at 99.5%.
c) Everything seemed to go well, see above.
d) In BOINC Manager at that time there were some finished and some intentionally
unstarted tasks of the "constellation" project, so no other task was running at the time.
The CPDN task was the only task running.

Now, at the error mark, the situation is as follows:

e) All four remaining trickles since the 90% mark are there. (!)
f) The BOINC slot has been emptied out.
g) The big data directory with its Data-in and Data-out directories is not there anymore.
h) No big zip file is there, in contrast to the trickle files. (!)
i) Noteworthy contents of file client_state.xml:

i1) The "file_info" section for the latest (fourth) zip file shows "status -161":

<file_info>
<name>hadcm3n_yfy3_1900_40_007517044_3_4.zip</name>
<nbytes>0.000000</nbytes>
<max_nbytes>188743680.000000</max_nbytes>
<generated_locally/>
<status>-161</status>
<upload_when_present/>
<url>http://rapid-watch.badc.rl.ac.uk/cpdn_cgi/file_upload_handler</url>
</file_info>
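
For anyone wanting to look for the same pattern in their own client_state.xml, here is a minimal Python sketch. It assumes the `<file_info>` layout shown above; the `<client_state>` wrapper is added only so that the fragment parses on its own, and the function name `suspicious_files` is made up for illustration. It only lists entries with a negative `<status>` and does not try to interpret the code.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the post above. The <client_state> wrapper is an
# assumption added here so the fragment is a well-formed document.
FRAGMENT = """
<client_state>
<file_info>
<name>hadcm3n_yfy3_1900_40_007517044_3_4.zip</name>
<nbytes>0.000000</nbytes>
<max_nbytes>188743680.000000</max_nbytes>
<generated_locally/>
<status>-161</status>
<upload_when_present/>
<url>http://rapid-watch.badc.rl.ac.uk/cpdn_cgi/file_upload_handler</url>
</file_info>
</client_state>
"""


def suspicious_files(xml_text):
    """Return (name, nbytes, status) for every file_info with a negative status."""
    root = ET.fromstring(xml_text)
    found = []
    for info in root.iter("file_info"):
        name = info.findtext("name", default="?")
        nbytes = float(info.findtext("nbytes", default="0"))
        status = int(info.findtext("status", default="0"))
        if status < 0:
            found.append((name, nbytes, status))
    return found


for name, nbytes, status in suspicious_files(FRAGMENT):
    print(f"{name}: nbytes={nbytes:.0f}, status={status}")
```

To run it against a real file, read a copy of client_state.xml into a string and pass that to `suspicious_files` instead of `FRAGMENT`. Working on a copy avoids touching the file the BOINC client is using.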

i2) The "result" section shows "state 3", "exit_status 193" and "Signal 11 received":

<result>
<name>hadcm3n_yfy3_1900_40_007517044_3</name>
<final_cpu_time>1605401.000000</final_cpu_time>
<final_elapsed_time>1653172.960177</final_elapsed_time>
<exit_status>193</exit_status>
<state>3</state>
<platform>windows_intelx86</platform>
<version_num>607</version_num>
<stderr_out>
<![CDATA[
<message>
- exit code 193 (0xc1)
</message>
<stderr_txt>
. Suspended CPDN Monitor - Suspend request from BOINC...
...
(numerous copies of that message)
...
Suspended CPDN Monitor - Suspend request from BOINC...
Signal 11 received, exiting... Called boinc_finish
</stderr_txt>
]]>
</stderr_out>
<ready_to_report/>
<completed_time>1331626544.096983</completed_time>
<wu_name>hadcm3n_yfy3_1900_40_007517044</wu_name>
<report_deadline>1329875843.000000</report_deadline>
<received_time>1321986572.179943</received_time>
<file_ref>
<file_name>hadcm3n_yfy3_1900_40_007517044_3_1.zip</file_name>
<open_name>cpdnout1.zip</open_name>
</file_ref>
<file_ref>
<file_name>hadcm3n_yfy3_1900_40_007517044_3_2.zip</file_name>
<open_name>cpdnout2.zip</open_name>
</file_ref>
<file_ref>
<file_name>hadcm3n_yfy3_1900_40_007517044_3_3.zip</file_name>
<open_name>cpdnout3.zip</open_name>
</file_ref>
<file_ref>
<file_name>hadcm3n_yfy3_1900_40_007517044_3_4.zip</file_name>
<open_name>cpdnout4.zip</open_name>
</file_ref>
</result>
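
The numbers in this `<result>` section are easier to read once converted. The following is a small Python sketch, assuming that the `*_time` and `report_deadline` fields are Unix timestamps in seconds and that the two `final_*_time` fields are durations in seconds; the helper names `utc` and `days` are made up for illustration.

```python
from datetime import datetime, timezone

# Values copied from the <result> section above.
result = {
    "exit_status": 193,
    "final_cpu_time": 1605401.0,
    "final_elapsed_time": 1653172.960177,
    "received_time": 1321986572.179943,
    "report_deadline": 1329875843.0,
    "completed_time": 1331626544.096983,
}


def utc(ts):
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def days(seconds):
    """Convert a duration in seconds to days."""
    return seconds / 86400.0


# 193 decimal is the 0xc1 shown in the <message> block.
print("exit status:", result["exit_status"], hex(result["exit_status"]))

for key in ("received_time", "report_deadline", "completed_time"):
    print(f"{key}: {utc(result[key]):%Y-%m-%d %H:%M:%S} UTC")

print(f"elapsed: {days(result['final_elapsed_time']):.1f} days")
print(f"cpu:     {days(result['final_cpu_time']):.1f} days")
print("ended after deadline:", result["completed_time"] > result["report_deadline"])
```

Under those assumptions the task was received on 22 Nov 2011, had a report deadline of 22 Feb 2012 and ended on 13 Mar 2012 after about 19 days of elapsed run time, so it ended after its deadline. Whether that matters for these long models is not something the values themselves can tell.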

i3) The "active_task_set" section is empty:

<active_task_set>
</active_task_set>




Now, can somebody please check my information above and give me a hint as to what to change in order to finish this task properly, using some of my backups? It would be very sad to lose this task at the very end. Thanks in advance!
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 43943 - Posted: 13 Mar 2012, 21:36:34 UTC - in response to Message 43938.  

Sorry about that, but it does happen. There are several posts about a similar occurrence, both on this board and on our php board.
I've had some myself.

Some of the models seem to be rather unstable at the various 25% points of the model. Happening right at the end can be upsetting, but there's not a lot that can be done.
Best advice is to unselect these long models from your prefs, and just run the regional models.


Backups: Here
Waldmeister

Joined: 13 Jun 11
Posts: 34
Credit: 1,415,036
RAC: 1,383
Message 43946 - Posted: 13 Mar 2012, 22:11:27 UTC - in response to Message 43943.  

Sorry about that, but it does happen. There are several posts about a similar occurrence, both on this board and on our php board.


Yeah, I read some posts, but wasn't sure whether they were similar to my problem.

Some of the models seem to be rather unstable at the various 25% points of the model. Happening right at the end can be upsetting, but there's not a lot that can be done.


The really upsetting thing is, as I understand it, that these calculations are a precondition for further calculations in later (virtual) years.
Now, if all 5 tasks of this WU are corrupt and no successful task is available, this model dies. By the way, I got the feeling that it was a rather warm task. Lots of deep red spots.

The oddity is, and you may correct me, that if I upload those last four trickles, I may get full points on a corrupt task. Weird. Lol.

Best advice is to unselect these long models from your prefs, and just run the regional models.


I wanted to do that anyway after this task. One successful task of that sort already satisfies me. The only 'must do' thing left for me (for the moment) is to run one of those rather rare regional SAF models. Everything after that is optional.

All in all, maybe I will find a way to get this task to finish normally; I must do some thinking over the next week or so.

Greetings.