Task 15100800

Name: hadam3p_eu_8cw9_2001_1_008133927_0
Workunit: 8289041
Created: 12 Aug 2012, 5:29:36 UTC
Sent: 12 Aug 2012, 5:46:12 UTC
Report deadline: 25 Jul 2013, 11:06:12 UTC
Received: 4 Sep 2012, 6:50:31 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 0 (0x00000000)
Computer ID: 1160009
Run time: 3 days 3 hours 8 min 30 sec
CPU time: 2 days 23 hours 49 min 52 sec
Validate state: Invalid
Credit: 1,591.48
Device peak FLOPS: 2.44 GFLOPS
Application version: UK Met Office HadAM3P-HadRM3P Europe v6.09 windows_intelx86
Stderr
<core_client_version>6.12.34</core_client_version>
<![CDATA[
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5096, selfPID=6328, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1856, selfPID=6324, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4672, selfPID=4444, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8796, selfPID=2024, iMonCtr=1
Model crash detected, will try to restart...
10:29:07 (5372): No heartbeat from core client for 30 sec - exiting
10:29:08 (5372): No heartbeat from core client for 30 sec - exiting
10:29:09 (5372): No heartbeat from core client for 30 sec - exiting
10:29:10 (5372): No heartbeat from core client for 30 sec - exiting
10:29:11 (5372): No heartbeat from core client for 30 sec - exiting
10:29:12 (5372): No heartbeat from core client for 30 sec - exiting
10:29:13 (5372): No heartbeat from core client for 30 sec - exiting
10:29:15 (5372): No heartbeat from core client for 30 sec - exiting
10:29:16 (5372): No heartbeat from core client for 30 sec - exiting
10:29:17 (5372): No heartbeat from core client for 30 sec - exiting
10:29:18 (5372): No heartbeat from core client for 30 sec - exiting
10:29:19 (5372): No heartbeat from core client for 30 sec - exiting
10:29:20 (5372): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8780, iMonCtr=2
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6512, selfPID=5636, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5888, selfPID=3568, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5768, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5996, selfPID=5808, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1980, selfPID=5532, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
18:03:14 (2640): No heartbeat from core client for 30 sec - exiting
18:03:15 (2640): No heartbeat from core client for 30 sec - exiting
18:03:16 (2640): No heartbeat from core client for 30 sec - exiting
18:03:17 (2640): No heartbeat from core client for 30 sec - exiting
18:03:18 (2640): No heartbeat from core client for 30 sec - exiting
18:03:19 (2640): No heartbeat from core client for 30 sec - exiting
18:03:20 (2640): No heartbeat from core client for 30 sec - exiting
18:03:21 (2640): No heartbeat from core client for 30 sec - exiting
18:03:22 (2640): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
18:52:31 (5328): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:52:33 (5328): No heartbeat from core client for 30 sec - exiting
18:52:35 (5328): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7104, selfPID=5552, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7116, selfPID=5216, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6396, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6404, selfPID=4588, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5168, selfPID=2064, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadam3p_eu_8cw9_2001_1_008133927/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadam3p_eu_8cw9_2001_1_008133927/dataout/region_restart.day after 11 attempts

Model crashed: READHIST: End of file in READ from history file for namelist NLIHISTO                                                                                                                                                                                           tmp/xaakm.pipe_dummy                                                            2048    

Model crashed: READHIST: End of file in READ from history file for namelist NLIHISTO                                                                                                                                                                                           tmp/xaakg.pipe_dummy                                                            2048    
Leaving CPDN_Main::Monitor...
Called boinc_finish

</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>hadam3p_eu_8cw9_2001_1_008133927_0_9.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_8cw9_2001_1_008133927_0_10.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_8cw9_2001_1_008133927_0_11.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_8cw9_2001_1_008133927_0_12.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>

</message>
]]>
Latest Trickles Received
Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS)
02 Sep 2012 04:29:48 | 1160009 | 15100800 | hadam3p_eu_8cw9_2001_1_008133927_0 | 92,256 | 241,862 | 2.6216
01 Sep 2012 03:54:25 | 1160009 | 15100800 | hadam3p_eu_8cw9_2001_1_008133927_0 | 80,736 | 214,206 | 2.6532
30 Aug 2012 07:50:44 | 1160009 | 15100800 | hadam3p_eu_8cw9_2001_1_008133927_0 | 69,216 | 186,436 | 2.6935
23 Aug 2012 05:40:45 | 1160009 | 15100800 | hadam3p_eu_8cw9_2001_1_008133927_0 | 57,696 | 159,954 | 2.7724
21 Aug 2012 08:29:50 | 1160009 | 15100800 | hadam3p_eu_8cw9_2001_1_008133927_0 | 46,176 | 127,216 | 2.7550
19 Aug 2012 04:23:23 | 1160009 | 15100800 | hadam3p_eu_8cw9_2001_1_008133927_0 | 34,656 | 94,360 | 2.7228
18 Aug 2012 07:33:54 | 1160009 | 15100800 | hadam3p_eu_8cw9_2001_1_008133927_0 | 23,136 | 62,401 | 2.6971
17 Aug 2012 08:39:49 | 1160009 | 15100800 | hadam3p_eu_8cw9_2001_1_008133927_0 | 11,616 | 36,738 | 3.1627
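
The Average (sec/TS) figures in the trickle rows above appear to be the cumulative CPU time divided by the timestep count, rounded to four decimals. That relationship is inferred from the numbers, not stated on the page. A minimal Python sketch that rechecks it, with the values copied from the rows and a hypothetical helper name `sec_per_timestep`:

```python
# Values copied from the trickle rows: (timestep, cpu_time_sec, reported_average).
TRICKLES = [
    (92256, 241862, 2.6216),
    (80736, 214206, 2.6532),
    (69216, 186436, 2.6935),
    (57696, 159954, 2.7724),
    (46176, 127216, 2.7550),
    (34656, 94360, 2.7228),
    (23136, 62401, 2.6971),
    (11616, 36738, 3.1627),
]


def sec_per_timestep(cpu_time_sec, timestep):
    """Hypothetical helper: cumulative CPU seconds per model timestep.

    Assumes the page's average is cpu_time / timestep rounded to 4 decimals.
    """
    return round(cpu_time_sec / timestep, 4)


def mismatches(rows, tolerance=5e-5):
    """Return the rows whose reported average differs from the recomputed one."""
    return [
        (ts, cpu, reported)
        for ts, cpu, reported in rows
        if abs(sec_per_timestep(cpu, ts) - reported) > tolerance
    ]


if __name__ == "__main__":
    for ts, cpu, reported in TRICKLES:
        print(ts, cpu, reported, sec_per_timestep(cpu, ts))
    print("rows that disagree:", mismatches(TRICKLES))
```

Under that assumption, an empty list from `mismatches` means every reported average is consistent with its CPU time and timestep.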

