Task 15843832

Name	hadcm3n_3fyi_1940_40_008264760_3
Workunit	8419884
Created	15 Jun 2013, 13:27:55 UTC
Sent	15 Jun 2013, 13:46:18 UTC
Report deadline	14 Sep 2013, 21:13:29 UTC
Received	15 Aug 2013, 5:49:51 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	0 (0x00000000)
Computer ID	1278257
Run time	3 days 20 hours 50 min 16 sec
CPU time	3 days 16 hours 14 min 27 sec
Validate state	Invalid
Credit	2,799.36
Device peak FLOPS	3.32 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.64</core_client_version>
<![CDATA[
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6516, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7348, iMonCtr=1
Model crash detected, will try to restart...
18:05:28 (6824): No heartbeat from core client for 30 sec - exiting
18:05:29 (6824): No heartbeat from core client for 30 sec - exiting
18:05:30 (6824): No heartbeat from core client for 30 sec - exiting
18:05:31 (6824): No heartbeat from core client for 30 sec - exiting
18:05:32 (6824): No heartbeat from core client for 30 sec - exiting
18:05:33 (6824): No heartbeat from core client for 30 sec - exiting
18:05:34 (6824): No heartbeat from core client for 30 sec - exiting
18:05:35 (6824): No heartbeat from core client for 30 sec - exiting
18:05:36 (6824): No heartbeat from core client for 30 sec - exiting
18:05:37 (6824): No heartbeat from core client for 30 sec - exiting
18:05:38 (6824): No heartbeat from core client for 30 sec - exiting
18:05:39 (6824): No heartbeat from core client for 30 sec - exiting
18:05:40 (6824): No heartbeat from core client for 30 sec - exiting
18:05:41 (6824): No heartbeat from core client for 30 sec - exiting
18:05:42 (6824): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3028, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6748, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5392, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
10:00:30 (5420): No heartbeat from core client for 30 sec - exiting
10:00:31 (5420): No heartbeat from core client for 30 sec - exiting
10:00:32 (5420): No heartbeat from core client for 30 sec - exiting
10:00:33 (5420): No heartbeat from core client for 30 sec - exiting
10:00:34 (5420): No heartbeat from core client for 30 sec - exiting
10:00:35 (5420): No heartbeat from core client for 30 sec - exiting
10:00:36 (5420): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6276, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Atmos Hold Restart file rename failed on atmos_restart.hold
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3144, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7160, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4704, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4704, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4704, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4704, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4704, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4704, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Called boinc_finish
</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>hadcm3n_3fyi_1940_40_008264760_3_1.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadcm3n_3fyi_1940_40_008264760_3_2.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadcm3n_3fyi_1940_40_008264760_3_3.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadcm3n_3fyi_1940_40_008264760_3_4.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
</message>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
15 Aug 2013 05:54:32	1278257	15843832	hadcm3n_3fyi_1940_40_008264760_3	233,280	310,457	1.3308
25 Jul 2013 09:25:12	1278257	15843832	hadcm3n_3fyi_1940_40_008264760_3	207,360	274,399	1.3233
23 Jul 2013 21:49:57	1278257	15843832	hadcm3n_3fyi_1940_40_008264760_3	181,440	238,502	1.3145
23 Jul 2013 19:40:22	1278257	15843832	hadcm3n_3fyi_1940_40_008264760_3	155,520	206,442	1.3274
23 Jul 2013 19:05:05	1278257	15843832	hadcm3n_3fyi_1940_40_008264760_3	129,600	174,308	1.3450
27 Jun 2013 15:01:37	1278257	15843832	hadcm3n_3fyi_1940_40_008264760_3	103,680	139,957	1.3499
25 Jun 2013 13:38:39	1278257	15843832	hadcm3n_3fyi_1940_40_008264760_3	77,760	105,849	1.3612
24 Jun 2013 09:25:12	1278257	15843832	hadcm3n_3fyi_1940_40_008264760_3	51,840	71,648	1.3821
21 Jun 2013 13:26:05	1278257	15843832	hadcm3n_3fyi_1940_40_008264760_3	25,920	35,644	1.3752
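
The Average (sec/TS) column in the trickle table above appears to be cumulative CPU time divided by the timestep reached, rounded to four decimal places. The page does not state this formula, so it is an assumption, checked here only against the nine rows shown. The names `TRICKLES` and `sec_per_timestep` are illustrative and are not part of BOINC or the CPDN software.

```python
# Rows copied from the trickle table above:
# (timestep, cumulative CPU time in seconds, reported average in sec/TS)
TRICKLES = [
    (233280, 310457, 1.3308),
    (207360, 274399, 1.3233),
    (181440, 238502, 1.3145),
    (155520, 206442, 1.3274),
    (129600, 174308, 1.3450),
    (103680, 139957, 1.3499),
    (77760, 105849, 1.3612),
    (51840, 71648, 1.3821),
    (25920, 35644, 1.3752),
]


def sec_per_timestep(timestep, cpu_seconds):
    """Assumed formula: cumulative CPU seconds per model timestep, to 4 decimals."""
    return round(cpu_seconds / timestep, 4)


# Compare the recomputed value with the value reported in each table row.
for timestep, cpu_seconds, reported in TRICKLES:
    computed = sec_per_timestep(timestep, cpu_seconds)
    status = "matches" if abs(computed - reported) < 1e-9 else "differs"
    print(f"timestep {timestep:>7}: computed {computed:.4f}, reported {reported:.4f} ({status})")
```

Under that assumption every row in the table reproduces its reported average, and consecutive trickles are 25,920 timesteps apart.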