Task 17831334

Name	hadcm3n_sckf_1940_40_009113934_4
Workunit	9244270
Created	25 Jan 2015, 1:13:32 UTC
Sent	25 Jan 2015, 2:11:49 UTC
Report deadline	26 Apr 2015, 9:39:00 UTC
Received	20 Feb 2015, 0:10:51 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1352765
Run time	8 days 9 hours 31 min 48 sec
CPU time	8 days 7 hours 15 min 4 sec
Validate state	Invalid
Credit	8,398.08
Device peak FLOPS	3.48 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.4.36</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 18:23:41 (2420): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 20:22:51 (6332): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:38:51 (1228): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 13:32:45 (9284): No heartbeat from core client for 30 sec - exiting 13:32:46 (9284): No heartbeat from core client for 30 sec - exiting 13:32:47 (9284): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 12:07:37 (5844): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 12:08:54 (7580): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 
CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 19:56:31 (9772): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 11:06:25 (8824): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 09:17:02 (8540): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 10:34:09 (15496): No heartbeat from core client for 30 sec - exiting 10:34:11 (15496): No heartbeat from core client for 30 sec - exiting 10:34:12 (15496): No heartbeat from core client for 30 sec - exiting 10:34:13 (15496): No heartbeat from core client for 30 sec - exiting 10:34:14 (15496): No heartbeat from core client for 30 sec - exiting 10:34:15 (15496): No heartbeat from core client for 30 sec - exiting 10:34:16 (15496): No heartbeat from core client for 30 sec - exiting 10:34:17 (15496): No heartbeat from core client for 30 sec - exiting 10:34:18 (15496): No heartbeat from core client for 30 sec - exiting 10:34:19 (15496): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 
CPDN Monitor - Quit request from BOINC... 21:38:18 (11768): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 09:35:34 (8124): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 19:06:29 (13360): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5216, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5216, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5216, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5216, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5216, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... 
Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5216, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6872, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]>
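
The Stderr field above repeats a small set of message types many times before ending with "Sorry, too many model crashes!". A minimal sketch for tallying those messages from a saved copy of the log follows. The pattern names and the `tally_stderr` helper are hypothetical and not part of BOINC or CPDN; the regular expressions are taken only from message texts visible in this log, and the sample string is a short excerpt, not the full output.

```python
import re
from collections import Counter

# Hypothetical labels; the patterns are copied from message texts seen in the log above.
PATTERNS = {
    "quit_request": re.compile(r"CPDN Monitor - Quit request from BOINC"),
    "no_heartbeat": re.compile(r"No heartbeat from core client for 30 sec - exiting"),
    "signal_22": re.compile(r"Signal 22 received, exiting"),
    "model_crash": re.compile(r"Model crash detected, will try to restart"),
}


def tally_stderr(text: str) -> Counter:
    """Count how often each known message type occurs in a stderr dump."""
    return Counter({name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()})


# Short excerpt assembled from lines that appear in the log; not the full output.
sample = (
    "CPDN Monitor - Quit request from BOINC... "
    "18:23:41 (2420): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "Signal 22 received, exiting... Called boinc_finish "
    "Model crash detected, will try to restart... "
)

counts = tally_stderr(sample)
for name, count in counts.items():
    print(f"{name}: {count}")
```

Note that the monitor's own "No 'heartbeat' from BOINC" line is deliberately not counted as `no_heartbeat`, since it echoes the timestamped message that precedes it.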

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
18 Feb 2015 18:19:37	1352765	17831334	hadcm3n_sckf_1940_40_009113934_4	699,840	699,374	0.9993
17 Feb 2015 22:42:29	1352765	17831334	hadcm3n_sckf_1940_40_009113934_4	673,920	675,205	1.0019
17 Feb 2015 03:30:42	1352765	17831334	hadcm3n_sckf_1940_40_009113934_4	648,000	650,793	1.0043
16 Feb 2015 19:28:14	1352765	17831334	hadcm3n_sckf_1940_40_009113934_4	622,080	626,462	1.0070
15 Feb 2015 03:14:08	1352765	17831334	hadcm3n_sckf_1940_40_009113934_4	596,160	602,537	1.0107
13 Feb 2015 19:02:00	1352765	17831334	hadcm3n_sckf_1940_40_009113934_4	570,240	578,055	1.0137
13 Feb 2015 02:22:51	1352765	17831334	hadcm3n_sckf_1940_40_009113934_4	544,320	552,990	1.0159
12 Feb 2015 03:22:12	1352765	17831334	hadcm3n_sckf_1940_40_009113934_4	518,400	527,242	1.0171
10 Feb 2015 17:09:10	1352765	17831334	hadcm3n_sckf_1940_40_009113934_4	492,480	494,370	1.0038
09 Feb 2015 19:55:32	1352765	17831334	hadcm3n_sckf_1940_40_009113934_4	466,560	460,299	0.9866
08 Feb 2015 17:25:59	1352765	17831334	hadcm3n_sckf_1940_40_009113934_4	440,640	426,742	0.9685
06 Feb 2015 19:47:59	1352765	17831334	hadcm3n_sckf_1940_40_009113934_4	414,720	402,171	0.9697
05 Feb 2015 02:55:57	1352765	17831334	hadcm3n_sckf_1940_40_009113934_4	388,800	376,433	0.9682
03 Feb 2015 18:02:28	1352765	17831334	hadcm3n_sckf_1940_40_009113934_4	362,880	350,783	0.9667
03 Feb 2015 01:36:20	1352765	17831334	hadcm3n_sckf_1940_40_009113934_4	336,960	324,254	0.9623
02 Feb 2015 18:34:21	1352765	17831334	hadcm3n_sckf_1940_40_009113934_4	311,040	299,037	0.9614
01 Feb 2015 23:40:40	1352765	17831334	hadcm3n_sckf_1940_40_009113934_4	285,120	275,233	0.9653
01 Feb 2015 17:34:44	1352765	17831334	hadcm3n_sckf_1940_40_009113934_4	259,200	252,643	0.9747
29 Jan 2015 19:02:39	1352765	17831334	hadcm3n_sckf_1940_40_009113934_4	233,280	227,759	0.9763
28 Jan 2015 22:02:15	1352765	17831334	hadcm3n_sckf_1940_40_009113934_4	207,360	201,859	0.9735
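
The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count, rounded to four decimals. This relationship is inferred from the numbers in the table, not stated on the page. A small sketch checking it against the first three rows, where the `seconds_per_timestep` helper is a hypothetical name:

```python
# (timestep, cpu_seconds, reported_average) copied from the first three trickle rows.
trickles = [
    (699_840, 699_374, 0.9993),
    (673_920, 675_205, 1.0019),
    (648_000, 650_793, 1.0043),
]


def seconds_per_timestep(cpu_seconds: int, timestep: int) -> float:
    """Cumulative CPU seconds divided by cumulative timesteps, to 4 decimals."""
    return round(cpu_seconds / timestep, 4)


for timestep, cpu_seconds, reported in trickles:
    computed = seconds_per_timestep(cpu_seconds, timestep)
    # Assumption: the page rounds to four decimal places, as the column suggests.
    print(f"timestep={timestep} computed={computed:.4f} reported={reported:.4f}")
```

Consecutive rows in the table are 25,920 timesteps apart, so the same helper can be applied to the differences between rows to estimate the per-interval rate instead of the cumulative one.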