Task 16380213

Name hadam3p_eu_a6w1_2013_1_008569967_0
Workunit 8716479
Created 19 Mar 2014, 11:52:00 UTC
Sent 19 Mar 2014, 15:59:25 UTC
Report deadline 1 Mar 2015, 21:19:25 UTC
Received 14 Apr 2014, 9:09:21 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 0 (0x00000000)
Computer ID 1257976
Run time 2 days 18 hours 44 min 45 sec
CPU time 2 days 4 hours 51 min 28 sec
Validate state Invalid
Credit 1,194.02
Device peak FLOPS 2.40 GFLOPS
Application version UK Met Office HadAM3P-HadRM3P Europe v6.09 (windows_intelx86)
Stderr
<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5944, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4276, iMonCtr=2
13:12:04 (1308): No heartbeat from core client for 30 sec - exiting
13:12:05 (1308): No heartbeat from core client for 30 sec - exiting
13:12:06 (1308): No heartbeat from core client for 30 sec - exiting
13:12:07 (1308): No heartbeat from core client for 30 sec - exiting
13:12:08 (1308): No heartbeat from core client for 30 sec - exiting
13:12:09 (1308): No heartbeat from core client for 30 sec - exiting
13:12:10 (1308): No heartbeat from core client for 30 sec - exiting
13:12:11 (1308): No heartbeat from core client for 30 sec - exiting
13:12:12 (1308): No heartbeat from core client for 30 sec - exiting
13:12:14 (1308): No heartbeat from core client for 30 sec - exiting
13:12:15 (1308): No heartbeat from core client for 30 sec - exiting
13:12:16 (1308): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3864, selfPID=3628, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3960, selfPID=3528, iMonCtr=1
Model crash detected, will try to restart...
CGlobal Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4552, iMonCtr=2
15:53:30 (3564): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3700, selfPID=3428, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3600, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3760, selfPID=3404, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1768, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3052, selfPID=3432, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3028, selfPID=3392, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3452, selfPID=3340, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2212, selfPID=3292, iMonCtr=1
Model crash detected, will try to restart...
13:45:07 (3616): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3264, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3092, selfPID=3700, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3688, selfPID=3484, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4092, selfPID=3620, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4220, selfPID=3472, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=508, selfPID=3516, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
17:42:23 (3476): No heartbeat from core client for 30 sec - exiting
17:42:24 (3476): No heartbeat from core client for 30 sec - exiting
17:42:25 (3476): No heartbeat from core client for 30 sec - exiting
17:42:26 (3476): No heartbeat from core client for 30 sec - exiting
17:42:27 (3476): No heartbeat from core client for 30 sec - exiting
17:42:28 (3476): No heartbeat from core client for 30 sec - exiting
17:42:29 (3476): No heartbeat from core client for 30 sec - exiting
17:42:30 (3476): No heartbeat from core client for 30 sec - exiting
17:42:32 (3476): No heartbeat from core client for 30 sec - exiting
17:42:33 (3476): No heartbeat from core client for 30 sec - exiting
17:42:34 (3476): No heartbeat from core client for 30 sec - exiting
17:42:35 (3476): No heartbeat from core client for 30 sec - exiting
17:42:36 (3476): No heartbeat from core client for 30 sec - exiting
17:42:37 (3476): No heartbeat from core client for 30 sec - exiting
17:42:38 (3476): No heartbeat from core client for 30 sec - exiting
17:42:39 (3476): No heartbeat from core client for 30 sec - exiting
17:42:40 (3476): No heartbeat from core client for 30 sec - exiting
17:42:41 (3476): No heartbeat from core client for 30 sec - exiting
17:42:42 (3476): No heartbeat from core client for 30 sec - exiting
17:42:44 (3476): No heartbeat from core client for 30 sec - exiting
17:42:45 (3476): No heartbeat from core client for 30 sec - exiting
17:42:46 (3476): No heartbeat from core client for 30 sec - exiting
17:42:47 (3476): No heartbeat from core client for 30 sec - exiting
17:42:48 (3476): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:28:56 (3804): No heartbeat from core client for 30 sec - exiting
18:28:57 (3804): No heartbeat from core client for 30 sec - exiting
18:28:58 (3804): No heartbeat from core client for 30 sec - exiting
18:28:59 (3804): No heartbeat from core client for 30 sec - exiting
18:29:00 (3804): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3408, selfPID=3292, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
21:15:57 (3432): No heartbeat from core client for 30 sec - exiting
21:15:58 (3432): No heartbeat from core client for 30 sec - exiting
21:15:59 (3432): No heartbeat from core client for 30 sec - exiting
21:16:00 (3432): No heartbeat from core client for 30 sec - exiting
21:16:01 (3432): No heartbeat from core client for 30 sec - exiting
21:16:02 (3432): No heartbeat from core client for 30 sec - exiting
21:16:03 (3432): No heartbeat from core client for 30 sec - exiting
21:16:04 (3432): No heartbeat from core client for 30 sec - exiting
21:16:05 (3432): No heartbeat from core client for 30 sec - exiting
21:16:06 (3432): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4676, selfPID=3860, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4120, selfPID=3444, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4012, selfPID=3516, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1332, selfPID=3384, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4324, selfPID=3720, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...

Model crashed: READHIST: End of file in READ from history file for namelist NLIHISTO                                                                                                                                                                                           tmp/xaakm.pipe_dummy                                                            2048    
Leaving CPDN_Main::Monitor...
Called boinc_finish

</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>hadam3p_eu_a6w1_2013_1_008569967_0_7.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_a6w1_2013_1_008569967_0_8.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_a6w1_2013_1_008569967_0_9.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_a6w1_2013_1_008569967_0_10.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_a6w1_2013_1_008569967_0_11.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_a6w1_2013_1_008569967_0_12.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>

</message>
]]>
Latest Trickles Received
Time Sent (UTC)       Host ID  Result ID  Result Name                         Timestep  CPU Time (sec)  Average (sec/TS)
09 Apr 2014 18:17:39  1257976  16380213   hadam3p_eu_a6w1_2013_1_008569967_0  69,216    173,344         2.5044
06 Apr 2014 12:09:55  1257976  16380213   hadam3p_eu_a6w1_2013_1_008569967_0  57,696    143,930         2.4946
04 Apr 2014 10:05:13  1257976  16380213   hadam3p_eu_a6w1_2013_1_008569967_0  46,176    114,288         2.4751
26 Mar 2014 12:15:39  1257976  16380213   hadam3p_eu_a6w1_2013_1_008569967_0  34,656    85,386          2.4638
23 Mar 2014 16:10:00  1257976  16380213   hadam3p_eu_a6w1_2013_1_008569967_0  23,141    57,251          2.4740
23 Mar 2014 14:44:47  1257976  16380213   hadam3p_eu_a6w1_2013_1_008569967_0  23,136    57,062          2.4664
20 Mar 2014 10:46:15  1257976  16380213   hadam3p_eu_a6w1_2013_1_008569967_0  11,616    28,406          2.4454
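
The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count. A minimal Python sketch, assuming that definition, recomputes it for each trickle. The function names are hypothetical, and the values are copied from the trickle rows above.

```python
# Rows are (timestep, cpu_time_sec, listed_average), copied from the trickle table.
TRICKLES = [
    (69216, 173344, 2.5044),
    (57696, 143930, 2.4946),
    (46176, 114288, 2.4751),
    (34656, 85386, 2.4638),
    (23141, 57251, 2.4740),
    (23136, 57062, 2.4664),
    (11616, 28406, 2.4454),
]


def average_sec_per_timestep(cpu_time_sec, timestep):
    """Assumed definition: cumulative CPU seconds divided by cumulative timesteps."""
    return cpu_time_sec / timestep


def check_trickles(rows, tolerance=1e-4):
    """Return the rows whose listed average differs from the recomputed value."""
    return [
        row for row in rows
        if abs(average_sec_per_timestep(row[1], row[0]) - row[2]) > tolerance
    ]


if __name__ == "__main__":
    for timestep, cpu, listed in TRICKLES:
        recomputed = average_sec_per_timestep(cpu, timestep)
        print(f"{timestep:>6} {cpu:>7} {recomputed:.4f} (listed {listed:.4f})")
    print("mismatches:", check_trickles(TRICKLES))
```

Under that assumption, all seven rows agree with the listed averages to four decimal places.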

