Task 16290000

Name	hadcm3n_7nx7_1980_40_008442286_3
Workunit	8593142
Created	11 Feb 2014, 18:26:10 UTC
Sent	11 Feb 2014, 18:26:28 UTC
Report deadline	15 Aug 2023, 23:46:28 UTC
Received	14 May 2014, 2:07:49 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	0 (0x00000000)
Computer ID	1241124
Run time	44 days 14 hours 34 min 30 sec
CPU time	42 days 9 hours 19 min 7 sec
Validate state	Invalid
Credit	11,197.44
Device peak FLOPS	1.33 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5868, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5664, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5664, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5308, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2728, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1116, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3652, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3652, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2504, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=31760, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6116, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6116, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6116, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6116, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6116, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3716, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3716, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3716, iMonCtr=1
Model crash detected, will try to restart...
06:24:09 (3744): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
07:15:28 (5836): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7100, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2224, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6068, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6096, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4980, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=10740, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5964, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5964, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
19:07:18 (5868): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4716, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3704, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2368, iMonCtr=1
Model crash detected, will try to restart...
06:14:06 (5008): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6044, iMonCtr=1
Model crash detected, will try to restart...
07:32:57 (5892): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=12972, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5640, iMonCtr=1
Model crash detected, will try to restart...
06:33:01 (4168): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6108, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=14832, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5948, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5716, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2836, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2836, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2836, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2836, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2836, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2836, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Called boinc_finish
</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>hadcm3n_7nx7_1980_40_008442286_3_4.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
</message>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
12 May 2014 08:32:48	1241124	16290000	hadcm3n_7nx7_1980_40_008442286_3	933,120	3,614,920	3.8740
11 May 2014 03:05:11	1241124	16290000	hadcm3n_7nx7_1980_40_008442286_3	907,200	3,514,362	3.8739
08 May 2014 09:28:56	1241124	16290000	hadcm3n_7nx7_1980_40_008442286_3	881,280	3,414,599	3.8746
06 May 2014 03:00:11	1241124	16290000	hadcm3n_7nx7_1980_40_008442286_3	855,360	3,314,497	3.8750
04 May 2014 07:00:24	1241124	16290000	hadcm3n_7nx7_1980_40_008442286_3	829,440	3,213,340	3.8741
03 May 2014 01:36:03	1241124	16290000	hadcm3n_7nx7_1980_40_008442286_3	803,520	3,113,374	3.8747
27 Apr 2014 03:12:36	1241124	16290000	hadcm3n_7nx7_1980_40_008442286_3	777,600	3,008,948	3.8695
21 Apr 2014 07:55:03	1241124	16290000	hadcm3n_7nx7_1980_40_008442286_3	751,680	2,898,911	3.8566
19 Apr 2014 20:44:58	1241124	16290000	hadcm3n_7nx7_1980_40_008442286_3	725,760	2,786,928	3.8400
12 Apr 2014 14:39:38	1241124	16290000	hadcm3n_7nx7_1980_40_008442286_3	699,840	2,679,837	3.8292
06 Apr 2014 14:08:32	1241124	16290000	hadcm3n_7nx7_1980_40_008442286_3	673,920	2,577,031	3.8239
02 Apr 2014 03:21:00	1241124	16290000	hadcm3n_7nx7_1980_40_008442286_3	648,000	2,475,420	3.8201
30 Mar 2014 09:07:02	1241124	16290000	hadcm3n_7nx7_1980_40_008442286_3	622,080	2,371,522	3.8122
27 Mar 2014 05:58:51	1241124	16290000	hadcm3n_7nx7_1980_40_008442286_3	596,160	2,269,406	3.8067
24 Mar 2014 08:52:47	1241124	16290000	hadcm3n_7nx7_1980_40_008442286_3	570,240	2,172,370	3.8096
23 Mar 2014 05:23:22	1241124	16290000	hadcm3n_7nx7_1980_40_008442286_3	544,320	2,076,644	3.8151
20 Mar 2014 11:46:26	1241124	16290000	hadcm3n_7nx7_1980_40_008442286_3	518,400	1,980,523	3.8205
17 Mar 2014 06:20:02	1241124	16290000	hadcm3n_7nx7_1980_40_008442286_3	492,480	1,884,796	3.8272
16 Mar 2014 00:05:40	1241124	16290000	hadcm3n_7nx7_1980_40_008442286_3	466,560	1,786,230	3.8285
14 Mar 2014 03:38:04	1241124	16290000	hadcm3n_7nx7_1980_40_008442286_3	440,640	1,685,787	3.8258
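
The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep count. This is an inference from the numbers in the table, not something the page states. A minimal sketch that checks it against the three most recent trickles follows; the function name `seconds_per_timestep` is hypothetical and chosen only for illustration.

```python
# Rows copied from the trickle table above:
# (timestep, cumulative CPU time in seconds, reported Average in sec/TS)
rows = [
    (933_120, 3_614_920, 3.8740),
    (907_200, 3_514_362, 3.8739),
    (881_280, 3_414_599, 3.8746),
]


def seconds_per_timestep(cpu_time_sec, timestep):
    """Assumed definition of the Average column: CPU seconds per model timestep."""
    return cpu_time_sec / timestep


for timestep, cpu_time_sec, reported in rows:
    computed = seconds_per_timestep(cpu_time_sec, timestep)
    # Print both values to four decimals, the precision the table uses.
    print(f"{timestep:>9,}  computed={computed:.4f}  reported={reported:.4f}")
```

If the assumption holds, the computed and reported values agree to the four decimal places shown in the table.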