Task 15843832

Name hadcm3n_3fyi_1940_40_008264760_3
Workunit 8419884
Created 15 Jun 2013, 13:27:55 UTC
Sent 15 Jun 2013, 13:46:18 UTC
Report deadline 14 Sep 2013, 21:13:29 UTC
Received 15 Aug 2013, 5:49:51 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 0 (0x00000000)
Computer ID 1278257
Run time 3 days 20 hours 50 min 16 sec
CPU time 3 days 16 hours 14 min 27 sec
Validate state Invalid
Credit 2,799.36
Device peak FLOPS 3.32 GFLOPS
Application version UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr
<core_client_version>7.0.64</core_client_version>
<![CDATA[
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6516, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7348, iMonCtr=1
Model crash detected, will try to restart...
18:05:28 (6824): No heartbeat from core client for 30 sec - exiting
18:05:29 (6824): No heartbeat from core client for 30 sec - exiting
18:05:30 (6824): No heartbeat from core client for 30 sec - exiting
18:05:31 (6824): No heartbeat from core client for 30 sec - exiting
18:05:32 (6824): No heartbeat from core client for 30 sec - exiting
18:05:33 (6824): No heartbeat from core client for 30 sec - exiting
18:05:34 (6824): No heartbeat from core client for 30 sec - exiting
18:05:35 (6824): No heartbeat from core client for 30 sec - exiting
18:05:36 (6824): No heartbeat from core client for 30 sec - exiting
18:05:37 (6824): No heartbeat from core client for 30 sec - exiting
18:05:38 (6824): No heartbeat from core client for 30 sec - exiting
18:05:39 (6824): No heartbeat from core client for 30 sec - exiting
18:05:40 (6824): No heartbeat from core client for 30 sec - exiting
18:05:41 (6824): No heartbeat from core client for 30 sec - exiting
18:05:42 (6824): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3028, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6748, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5392, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
10:00:30 (5420): No heartbeat from core client for 30 sec - exiting
10:00:31 (5420): No heartbeat from core client for 30 sec - exiting
10:00:32 (5420): No heartbeat from core client for 30 sec - exiting
10:00:33 (5420): No heartbeat from core client for 30 sec - exiting
10:00:34 (5420): No heartbeat from core client for 30 sec - exiting
10:00:35 (5420): No heartbeat from core client for 30 sec - exiting
10:00:36 (5420): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6276, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Atmos Hold Restart file rename failed on atmos_restart.hold
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3144, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7160, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4704, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4704, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4704, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4704, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4704, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4704, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Called boinc_finish

</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>hadcm3n_3fyi_1940_40_008264760_3_1.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadcm3n_3fyi_1940_40_008264760_3_2.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadcm3n_3fyi_1940_40_008264760_3_3.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadcm3n_3fyi_1940_40_008264760_3_4.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>

</message>
]]>
Latest Trickles Received
Time Sent (UTC)       Host ID  Result ID  Result Name                       Timestep  CPU Time (sec)  Average (sec/TS)
15 Aug 2013 05:54:32  1278257  15843832   hadcm3n_3fyi_1940_40_008264760_3  233,280   310,457         1.3308
25 Jul 2013 09:25:12  1278257  15843832   hadcm3n_3fyi_1940_40_008264760_3  207,360   274,399         1.3233
23 Jul 2013 21:49:57  1278257  15843832   hadcm3n_3fyi_1940_40_008264760_3  181,440   238,502         1.3145
23 Jul 2013 19:40:22  1278257  15843832   hadcm3n_3fyi_1940_40_008264760_3  155,520   206,442         1.3274
23 Jul 2013 19:05:05  1278257  15843832   hadcm3n_3fyi_1940_40_008264760_3  129,600   174,308         1.3450
27 Jun 2013 15:01:37  1278257  15843832   hadcm3n_3fyi_1940_40_008264760_3  103,680   139,957         1.3499
25 Jun 2013 13:38:39  1278257  15843832   hadcm3n_3fyi_1940_40_008264760_3  77,760    105,849         1.3612
24 Jun 2013 09:25:12  1278257  15843832   hadcm3n_3fyi_1940_40_008264760_3  51,840    71,648          1.3821
21 Jun 2013 13:26:05  1278257  15843832   hadcm3n_3fyi_1940_40_008264760_3  25,920    35,644          1.3752
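
The Average (sec/TS) column in the trickle table above appears to be the cumulative CPU time divided by the cumulative timestep count. The following is a minimal sketch that checks this reading against the reported rows. It assumes that interpretation of the column; the names `TRICKLES`, `average_sec_per_timestep`, and `check_trickles` are illustrative and not part of BOINC or the CPDN software.

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_average).
TRICKLES = [
    (233_280, 310_457, 1.3308),
    (207_360, 274_399, 1.3233),
    (181_440, 238_502, 1.3145),
    (155_520, 206_442, 1.3274),
    (129_600, 174_308, 1.3450),
    (103_680, 139_957, 1.3499),
    (77_760, 105_849, 1.3612),
    (51_840, 71_648, 1.3821),
    (25_920, 35_644, 1.3752),
]


def average_sec_per_timestep(cpu_time_sec, timestep):
    """Cumulative CPU seconds per model timestep (assumed column definition)."""
    return cpu_time_sec / timestep


def check_trickles(rows, tolerance=1e-4):
    """Return the rows whose recomputed average differs from the reported one."""
    mismatches = []
    for timestep, cpu_time_sec, reported in rows:
        computed = average_sec_per_timestep(cpu_time_sec, timestep)
        if abs(computed - reported) > tolerance:
            mismatches.append((timestep, computed, reported))
    return mismatches


if __name__ == "__main__":
    for timestep, cpu_time_sec, reported in TRICKLES:
        computed = average_sec_per_timestep(cpu_time_sec, timestep)
        print(f"{timestep:>7}  computed={computed:.4f}  reported={reported:.4f}")
    print("mismatches:", check_trickles(TRICKLES))
```

Under this assumption, an empty mismatch list means every reported average agrees with CPU time divided by timestep to four decimal places. The timestep values in the table also increase by a constant 25,920 between consecutive trickles.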