Task 16203626

Name	hadcm3n_4e1b_1980_40_008405797_4
Workunit	8556653
Created	8 Jan 2014, 20:52:46 UTC
Sent	8 Jan 2014, 20:52:51 UTC
Report deadline	10 Apr 2014, 4:20:02 UTC
Received	10 Mar 2014, 16:43:39 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1113142
Run time	39 days 10 hours 5 min 37 sec
CPU time	16 days 17 hours 33 min 45 sec
Validate state	Invalid
Credit	7,776.00
Device peak FLOPS	2.61 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.58</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7672, iMonCtr=1 Model crash detected, will try to restart... 
03:53:50 (7048): No heartbeat from core client for 30 sec - exiting 03:53:51 (7048): No heartbeat from core client for 30 sec - exiting 03:53:52 (7048): No heartbeat from core client for 30 sec - exiting 03:53:53 (7048): No heartbeat from core client for 30 sec - exiting 03:53:54 (7048): No heartbeat from core client for 30 sec - exiting 03:53:55 (7048): No heartbeat from core client for 30 sec - exiting 03:53:56 (7048): No heartbeat from core client for 30 sec - exiting 03:53:57 (7048): No heartbeat from core client for 30 sec - exiting 03:53:58 (7048): No heartbeat from core client for 30 sec - exiting 03:53:59 (7048): No heartbeat from core client for 30 sec - exiting 03:54:01 (7048): No heartbeat from core client for 30 sec - exiting 03:54:02 (7048): No heartbeat from core client for 30 sec - exiting 03:54:03 (7048): No heartbeat from core client for 30 sec - exiting 03:54:04 (7048): No heartbeat from core client for 30 sec - exiting 03:54:05 (7048): No heartbeat from core client for 30 sec - exiting 03:54:06 (7048): No heartbeat from core client for 30 sec - exiting 03:54:07 (7048): No heartbeat from core client for 30 sec - exiting 03:54:08 (7048): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 03:54:09 (7048): No heartbeat from core client for 30 sec - exiting 03:54:10 (7048): No heartbeat from core client for 30 sec - exiting 03:54:11 (7048): No heartbeat from core client for 30 sec - exiting 03:54:13 (7048): No heartbeat from core client for 30 sec - exiting Ocean Restart file copy failed on 4e1bko.dak0c20 Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7188, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... 
Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7188, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7188, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7188, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7188, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7188, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish Suspended CPDN Monitor - Suspend request from BOINC... 09:13:11 (7128): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3060, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3060, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3060, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... 
Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3060, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3060, iMonCtr=1 Model crash detected, will try to restart... forrtl: Access is denied. Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3060, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]>
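The stderr output above is easier to read as a tally of message types than as flattened text. The following is a minimal sketch, not part of the BOINC or CPDN tooling: the function name `tally_messages` and the category labels are hypothetical, and the patterns are simply the literal message strings that appear in this log.

```python
import re

# Literal message fragments taken from the stderr text above.
# The category names on the left are illustrative labels only.
PATTERNS = {
    "quit_request": r"CPDN Monitor - Quit request from BOINC",
    "suspend_request": r"CPDN Monitor - Suspend request from BOINC",
    "no_heartbeat": r"No heartbeat from core client for 30 sec",
    "signal_11": r"Signal 11 received, exiting",
    "model_crash": r"Model crash detected, will try to restart",
    "too_many_crashes": r"Sorry, too many model crashes!",
    "boinc_finish": r"Called boinc_finish",
}


def tally_messages(stderr_text: str) -> dict:
    """Count how often each known message occurs in a flattened stderr dump."""
    return {
        name: len(re.findall(pattern, stderr_text))
        for name, pattern in PATTERNS.items()
    }


# Short excerpt built from messages that occur in the log above.
sample = (
    "Signal 11 received, exiting... Called boinc_finish "
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=7188, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "Sorry, too many model crashes! :-( Called boinc_finish"
)

counts = tally_messages(sample)
for name, count in counts.items():
    print(f"{name}: {count}")
```

Pasting the full `<stderr_txt>` contents into `sample` would give the totals for this task. Because the matching works on substrings, it does not depend on where the original line breaks were.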

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
20 Feb 2014 08:46:39	1113142	16203626	hadcm3n_4e1b_1980_40_008405797_4	648,000	1,118,602	1.7262
18 Feb 2014 23:16:24	1113142	16203626	hadcm3n_4e1b_1980_40_008405797_4	622,080	1,075,784	1.7293
18 Feb 2014 12:03:13	1113142	16203626	hadcm3n_4e1b_1980_40_008405797_4	596,160	1,031,524	1.7303
18 Feb 2014 12:03:13	1113142	16203626	hadcm3n_4e1b_1980_40_008405797_4	570,240	988,521	1.7335
15 Feb 2014 16:39:48	1113142	16203626	hadcm3n_4e1b_1980_40_008405797_4	544,320	945,806	1.7376
14 Feb 2014 20:59:52	1113142	16203626	hadcm3n_4e1b_1980_40_008405797_4	518,400	892,716	1.7221
14 Feb 2014 03:04:57	1113142	16203626	hadcm3n_4e1b_1980_40_008405797_4	492,480	847,140	1.7202
13 Feb 2014 09:17:32	1113142	16203626	hadcm3n_4e1b_1980_40_008405797_4	466,560	801,758	1.7184
12 Feb 2014 17:49:30	1113142	16203626	hadcm3n_4e1b_1980_40_008405797_4	440,640	758,179	1.7206
11 Feb 2014 22:34:45	1113142	16203626	hadcm3n_4e1b_1980_40_008405797_4	414,720	713,674	1.7209
11 Feb 2014 00:44:30	1113142	16203626	hadcm3n_4e1b_1980_40_008405797_4	388,800	669,425	1.7218
10 Feb 2014 02:31:02	1113142	16203626	hadcm3n_4e1b_1980_40_008405797_4	362,880	625,324	1.7232
09 Feb 2014 06:10:07	1113142	16203626	hadcm3n_4e1b_1980_40_008405797_4	336,960	580,477	1.7227
08 Feb 2014 11:41:24	1113142	16203626	hadcm3n_4e1b_1980_40_008405797_4	311,040	535,507	1.7217
07 Feb 2014 19:02:43	1113142	16203626	hadcm3n_4e1b_1980_40_008405797_4	285,120	490,786	1.7213
05 Feb 2014 23:48:40	1113142	16203626	hadcm3n_4e1b_1980_40_008405797_4	259,200	445,848	1.7201
04 Feb 2014 15:36:46	1113142	16203626	hadcm3n_4e1b_1980_40_008405797_4	233,280	401,342	1.7204
03 Feb 2014 01:44:52	1113142	16203626	hadcm3n_4e1b_1980_40_008405797_4	207,360	355,307	1.7135
02 Feb 2014 04:42:47	1113142	16203626	hadcm3n_4e1b_1980_40_008405797_4	181,440	309,972	1.7084
01 Feb 2014 07:06:29	1113142	16203626	hadcm3n_4e1b_1980_40_008405797_4	155,520	265,243	1.7055
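The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the timestep count, rounded to four decimal places. The sketch below checks that assumption against three rows copied from the table. The helper name `seconds_per_timestep` is hypothetical and is not part of any CPDN tool.

```python
def seconds_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Average CPU seconds per model timestep, rounded to 4 decimals."""
    return round(cpu_time_sec / timestep, 4)


# (timestep, cpu_time_sec, reported_average) copied from the table above:
# the newest row, the second-newest row, and the oldest row shown.
rows = [
    (648_000, 1_118_602, 1.7262),
    (622_080, 1_075_784, 1.7293),
    (155_520, 265_243, 1.7055),
]

results = []
for timestep, cpu_time_sec, reported in rows:
    computed = seconds_per_timestep(cpu_time_sec, timestep)
    results.append((timestep, computed, reported))
    print(f"timestep {timestep}: computed {computed}, reported {reported}")
```

If the assumption holds, the computed and reported values agree for every row, which makes it easy to spot a trickle whose CPU time or timestep was recorded incorrectly.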