Task 13394271

Name	hadcm3n_o6kb_1980_40_007422996_2
Workunit	7620631
Created	17 Sep 2011, 17:17:35 UTC
Sent	17 Sep 2011, 17:48:30 UTC
Report deadline	18 Dec 2011, 1:15:41 UTC
Received	25 Oct 2011, 16:40:26 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1272535
Run time	12 days 22 hours 51 min 31 sec
CPU time	12 days 5 hours 44 min 54 sec
Validate state	Invalid
Credit	7,153.92
Device peak FLOPS	2.65 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr
<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
20:19:11 (1004): No heartbeat from core client for 30 sec - exiting
20:19:12 (1004): No heartbeat from core client for 30 sec - exiting
20:19:13 (1004): No heartbeat from core client for 30 sec - exiting
20:19:14 (1004): No heartbeat from core client for 30 sec - exiting
20:19:16 (1004): No heartbeat from core client for 30 sec - exiting
20:19:17 (1004): No heartbeat from core client for 30 sec - exiting
20:19:18 (1004): No heartbeat from core client for 30 sec - exiting
20:19:19 (1004): No heartbeat from core client for 30 sec - exiting
20:19:20 (1004): No heartbeat from core client for 30 sec - exiting
20:19:21 (1004): No heartbeat from core client for 30 sec - exiting
20:19:22 (1004): No heartbeat from core client for 30 sec - exiting
20:19:23 (1004): No heartbeat from core client for 30 sec - exiting
20:19:24 (1004): No heartbeat from core client for 30 sec - exiting
20:19:25 (1004): No heartbeat from core client for 30 sec - exiting
20:19:27 (1004): No heartbeat from core client for 30 sec - exiting
20:19:28 (1004): No heartbeat from core client for 30 sec - exiting
20:19:29 (1004): No heartbeat from core client for 30 sec - exiting
20:19:30 (1004): No heartbeat from core client for 30 sec - exiting
20:19:31 (1004): No heartbeat from core client for 30 sec - exiting
20:19:32 (1004): No heartbeat from core client for 30 sec - exiting
20:19:33 (1004): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:19:40 (1004):Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
10:47:04 (1356): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
09:20:45 (1920): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
10:46:14 (4100): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
02:16:21 (2064): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=492, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=492, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=492, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=492, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=492, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=492, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>
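One way to summarise the stderr above is to count a few of the marker strings it contains. The Python sketch below does this with plain substring counts. The short marker labels and the function name `summarise_stderr` are hypothetical choices made for this illustration. They are not part of any BOINC or CPDN tool.

```python
from collections import Counter

# Marker strings copied verbatim from the stderr above; the short labels on
# the left are hypothetical names chosen only for this sketch.
MARKERS = {
    "heartbeat_lost": "No heartbeat from core client for 30 sec - exiting",
    "suspend_request": "Suspend request from BOINC...",
    "quit_request": "Quit request from BOINC...",
    "signal_22": "Signal 22 received, exiting...",
    "restart_attempt": "Model crash detected, will try to restart...",
    "gave_up": "Sorry, too many model crashes!",
}


def summarise_stderr(text: str) -> Counter:
    """Count non-overlapping occurrences of each marker in the raw log text."""
    return Counter({name: text.count(marker) for name, marker in MARKERS.items()})


if __name__ == "__main__":
    # A short excerpt from the end of the log above.
    excerpt = (
        "Signal 22 received, exiting... Called boinc_finish "
        "Model crash detected, will try to restart... "
        "Sorry, too many model crashes! :-( Called boinc_finish"
    )
    counts = summarise_stderr(excerpt)
    print(counts["signal_22"], counts["restart_attempt"], counts["gave_up"])  # 1 1 1
```

Run over the full log, this shows that every "Signal 22" line is followed by a restart attempt. The run ends with a single "too many model crashes" message.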

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (CPU sec/timestep)
19 Oct 2011 00:35:59	1109504	13394271	hadcm3n_o6kb_1980_40_007422996_2	596,160	1,031,205	1.7297
18 Oct 2011 10:54:38	1109504	13394271	hadcm3n_o6kb_1980_40_007422996_2	570,240	986,027	1.7291
17 Oct 2011 21:51:13	1109504	13394271	hadcm3n_o6kb_1980_40_007422996_2	544,320	941,512	1.7297
17 Oct 2011 08:30:43	1109504	13394271	hadcm3n_o6kb_1980_40_007422996_2	518,400	896,721	1.7298
16 Oct 2011 19:22:50	1109504	13394271	hadcm3n_o6kb_1980_40_007422996_2	492,480	852,212	1.7304
16 Oct 2011 06:20:03	1109504	13394271	hadcm3n_o6kb_1980_40_007422996_2	466,560	807,559	1.7309
15 Oct 2011 17:20:18	1109504	13394271	hadcm3n_o6kb_1980_40_007422996_2	440,640	762,812	1.7311
15 Oct 2011 04:28:34	1109504	13394271	hadcm3n_o6kb_1980_40_007422996_2	414,720	718,220	1.7318
14 Oct 2011 15:30:42	1109504	13394271	hadcm3n_o6kb_1980_40_007422996_2	388,800	673,612	1.7325
14 Oct 2011 02:33:43	1109504	13394271	hadcm3n_o6kb_1980_40_007422996_2	362,880	628,629	1.7323
13 Oct 2011 13:20:11	1109504	13394271	hadcm3n_o6kb_1980_40_007422996_2	336,960	583,085	1.7304
13 Oct 2011 00:26:02	1109504	13394271	hadcm3n_o6kb_1980_40_007422996_2	311,040	538,414	1.7310
12 Oct 2011 10:47:16	1109504	13394271	hadcm3n_o6kb_1980_40_007422996_2	285,120	493,654	1.7314
11 Oct 2011 13:09:38	1109504	13394271	hadcm3n_o6kb_1980_40_007422996_2	259,200	450,792	1.7392
10 Oct 2011 23:37:40	1109504	13394271	hadcm3n_o6kb_1980_40_007422996_2	233,280	404,941	1.7359
04 Oct 2011 11:59:47	1109504	13394271	hadcm3n_o6kb_1980_40_007422996_2	207,360	360,245	1.7373
30 Sep 2011 21:47:57	1109504	13394271	hadcm3n_o6kb_1980_40_007422996_2	181,440	315,545	1.7391
30 Sep 2011 08:56:24	1109504	13394271	hadcm3n_o6kb_1980_40_007422996_2	155,520	271,190	1.7438
29 Sep 2011 14:36:01	1109504	13394271	hadcm3n_o6kb_1980_40_007422996_2	129,600	226,712	1.7493
28 Sep 2011 18:30:08	1109504	13394271	hadcm3n_o6kb_1980_40_007422996_2	103,680	182,214	1.7575
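The Average column is consistent with CPU Time (sec) divided by Timestep. For the newest trickle, 1,031,205 / 596,160 ≈ 1.7297. The task summary, by contrast, gives Run time and CPU time in days, hours, minutes and seconds.

The Python sketch below recomputes the ratio for two rows of the table. It also converts those summary durations to seconds so they can be compared with the table. The helper names and the four-decimal rounding are assumptions inferred from this page, not CPDN code.

```python
import re

# Duration units as they appear in the task summary ("12 days 5 hours ...").
# Matching the singular stem also catches plurals ("day" inside "days").
UNIT_SECONDS = {"day": 86_400, "hour": 3_600, "min": 60, "sec": 1}


def duration_to_seconds(text: str) -> int:
    """Convert a string like '12 days 5 hours 44 min 54 sec' to whole seconds."""
    pairs = re.findall(r"(\d+)\s+(day|hour|min|sec)", text)
    return sum(int(n) * UNIT_SECONDS[unit] for n, unit in pairs)


def sec_per_timestep(timestep: int, cpu_seconds: int) -> float:
    """Average CPU seconds per timestep, rounded to 4 decimals like the table (assumed)."""
    return round(cpu_seconds / timestep, 4)


if __name__ == "__main__":
    # Newest and oldest trickles shown: (timestep, CPU seconds, reported average)
    for ts, cpu, reported in [(596_160, 1_031_205, 1.7297), (103_680, 182_214, 1.7575)]:
        print(ts, sec_per_timestep(ts, cpu), reported)

    print(duration_to_seconds("12 days 5 hours 44 min 54 sec"))   # CPU time field
    print(duration_to_seconds("12 days 22 hours 51 min 31 sec"))  # Run time field
```

With these helpers, the reported CPU time of 12 days 5 hours 44 min 54 sec comes to 1,057,494 s. That is somewhat more than the 1,031,205 s recorded at the last trickle shown.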