Task 12974056

Name	hadcm3n_o6od_1940_40_007266659_2
Workunit	7464899
Created	12 Jun 2011, 23:39:28 UTC
Sent	12 Jun 2011, 23:39:46 UTC
Report deadline	12 Sep 2011, 7:06:57 UTC
Received	23 Jun 2011, 2:39:41 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1191470
Run time	7 days 5 hours 37 min 58 sec
CPU time	6 days 6 hours 5 min 47 sec
Validate state	Invalid
Credit	4,976.64
Device peak FLOPS	3.70 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.58</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> 09:43:48 (1416): No heartbeat from core client for 30 sec - exiting 09:43:49 (1416): No heartbeat from core client for 30 sec - exiting 09:43:51 (1416): No heartbeat from core client for 30 sec - exiting 09:43:52 (1416): No heartbeat from core client for 30 sec - exiting 09:43:53 (1416): No heartbeat from core client for 30 sec - exiting 09:43:54 (1416): No heartbeat from core client for 30 sec - exiting 09:43:55 (1416): No heartbeat from core client for 30 sec - exiting 09:43:56 (1416): No heartbeat from core client for 30 sec - exiting 09:43:57 (1416): No heartbeat from core client for 30 sec - exiting 09:43:58 (1416): No heartbeat from core client for 30 sec - exiting 09:43:59 (1416): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 09:44:39 (2472): No heartbeat from core client for 30 sec - exiting 09:44:40 (2472): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 09:44:41 (2472): No heartbeat from core client for 30 sec - exiting 09:44:42 (2472): No heartbeat from core client for 30 sec - exiting 09:44:43 (2472): No heartbeat from core client for 30 sec - exiting 09:44:44 (2472): No heartbeat from core client for 30 sec - exiting 09:44:45 (2472): No heartbeat from core client for 30 sec - exiting 03:20:31 (1868): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
03:22:40 (1868): No heartbeat from core client for 30 sec - exiting 03:22:41 (1868): No heartbeat from core client for 30 sec - exiting 03:22:42 (1868): No heartbeat from core client for 30 sec - exiting 03:22:43 (1868): No heartbeat from core client for 30 sec - exiting 03:22:44 (1868): No heartbeat from core client for 30 sec - exiting 03:22:45 (1868): No heartbeat from core client for 30 sec - exiting 03:22:46 (1868): No heartbeat from core client for 30 sec - exiting 03:22:48 (1868): No heartbeat from core client for 30 sec - exiting 03:22:49 (1868): No heartbeat from core client for 30 sec - exiting 03:22:50 (1868): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... 08:44:53 (1500): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:45:18 (1500): No heartbeat from core client for 30 sec - exiting 08:45:19 (1500): No heartbeat from core client for 30 sec - exiting 08:45:20 (1500): No heartbeat from core client for 30 sec - exiting 08:45:21 (1500): No heartbeat from core client for 30 sec - exiting 08:45:22 (1500): No heartbeat from core client for 30 sec - exiting 08:45:23 (1500): No heartbeat from core client for 30 sec - exiting 08:45:24 (1500): No heartbeat from core client for 30 sec - exiting 08:45:25 (1500): No heartbeat from core client for 30 sec - exiting 08:45:26 (1500): No heartbeat from core client for 30 sec - exiting 08:45:27 (1500): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 01:13:51 (3916): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 02:46:26 (3508): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
20:10:03 (2172): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:10:06 (2172): No heartbeat from core client for 30 sec - exiting 20:10:07 (2172): No heartbeat from core client for 30 sec - exiting 20:10:08 (2172): No heartbeat from core client for 30 sec - exiting 20:10:09 (2172): No heartbeat from core client for 30 sec - exiting 20:10:10 (2172): No heartbeat from core client for 30 sec - exiting 20:10:11 (2172): No heartbeat from core client for 30 sec - exiting 20:10:13 (2172): No heartbeat from core client for 30 sec - exiting 20:10:14 (2172): No heartbeat from core client for 30 sec - exiting 20:10:15 (2172): No heartbeat from core client for 30 sec - exiting 20:10:16 (2172): No heartbeat from core client for 30 sec - exiting 01:15:21 (3264): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 13:10:37 (2560): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 13:10:38 (2560): No heartbeat from core client for 30 sec - exiting 13:10:39 (2560): No heartbeat from core client for 30 sec - exiting 13:10:40 (2560): No heartbeat from core client for 30 sec - exiting 13:10:41 (2560): No heartbeat from core client for 30 sec - exiting 13:10:42 (2560): No heartbeat from core client for 30 sec - exiting 13:33:34 (1100): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3724, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3724, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... 
Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3724, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... 15:53:10 (3880): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 15:53:23 (3880): No heartbeat from core client for 30 sec - exiting 15:53:24 (3880): No heartbeat from core client for 30 sec - exiting 15:53:25 (3880): No heartbeat from core client for 30 sec - exiting 15:53:26 (3880): No heartbeat from core client for 30 sec - exiting 15:53:28 (3880): No heartbeat from core client for 30 sec - exiting 15:53:29 (3880): No heartbeat from core client for 30 sec - exiting 15:53:30 (3880): No heartbeat from core client for 30 sec - exiting 15:53:31 (3880): No heartbeat from core client for 30 sec - exiting 15:53:32 (3880): No heartbeat from core client for 30 sec - exiting 15:53:33 (3880): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2756, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2756, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2756, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
20 Jun 2011 21:37:53	972612	12974056	hadcm3n_o6od_1940_40_007266659_2	414,720	522,601	1.2601
20 Jun 2011 11:41:20	972612	12974056	hadcm3n_o6od_1940_40_007266659_2	388,800	492,002	1.2654
20 Jun 2011 00:31:41	972612	12974056	hadcm3n_o6od_1940_40_007266659_2	362,880	458,508	1.2635
19 Jun 2011 22:33:28	972612	12974056	hadcm3n_o6od_1940_40_007266659_2	336,960	424,417	1.2595
19 Jun 2011 22:04:37	972612	12974056	hadcm3n_o6od_1940_40_007266659_2	311,040	391,009	1.2571
19 Jun 2011 22:04:37	972612	12974056	hadcm3n_o6od_1940_40_007266659_2	285,120	357,646	1.2544
19 Jun 2011 22:04:37	972612	12974056	hadcm3n_o6od_1940_40_007266659_2	259,200	324,432	1.2517
17 Jun 2011 10:11:02	972612	12974056	hadcm3n_o6od_1940_40_007266659_2	233,280	291,732	1.2506
16 Jun 2011 22:09:41	972612	12974056	hadcm3n_o6od_1940_40_007266659_2	207,360	259,310	1.2505
16 Jun 2011 11:41:29	972612	12974056	hadcm3n_o6od_1940_40_007266659_2	181,440	226,787	1.2499
16 Jun 2011 00:43:13	972612	12974056	hadcm3n_o6od_1940_40_007266659_2	155,520	194,124	1.2482
15 Jun 2011 14:34:34	972612	12974056	hadcm3n_o6od_1940_40_007266659_2	129,600	161,586	1.2468
14 Jun 2011 17:35:43	972612	12974056	hadcm3n_o6od_1940_40_007266659_2	103,680	129,007	1.2443
14 Jun 2011 07:00:53	972612	12974056	hadcm3n_o6od_1940_40_007266659_2	77,760	97,138	1.2492
13 Jun 2011 20:57:06	972612	12974056	hadcm3n_o6od_1940_40_007266659_2	51,840	65,099	1.2558
13 Jun 2011 10:59:52	972612	12974056	hadcm3n_o6od_1940_40_007266659_2	25,920	32,923	1.2702
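
The "Average (sec/TS)" column above appears to be the cumulative CPU time divided by the cumulative timestep count, rounded to four decimal places. The sketch below recomputes it for a few rows copied from the table; the function name `seconds_per_timestep` and the `trickles` list are illustrative assumptions, not part of the BOINC or CPDN software.

```python
# Minimal sketch: recompute "Average (sec/TS)" from the trickle table.
# Assumption: the column is cumulative CPU seconds / cumulative timesteps,
# rounded to 4 decimal places. Names here are hypothetical.

def seconds_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Return the average CPU seconds spent per model timestep."""
    return round(cpu_time_sec / timestep, 4)

# (timestep, cpu_time_sec, reported_average) taken from rows of the table.
trickles = [
    (414_720, 522_601, 1.2601),
    (388_800, 492_002, 1.2654),
    (51_840, 65_099, 1.2558),
    (25_920, 32_923, 1.2702),
]

for timestep, cpu_time_sec, reported in trickles:
    computed = seconds_per_timestep(cpu_time_sec, timestep)
    # Flag any row where the recomputed value differs from the reported one.
    status = "ok" if abs(computed - reported) < 1e-9 else "mismatch"
    print(f"{timestep:>8} {cpu_time_sec:>8} {computed:.4f} {reported:.4f} {status}")
```

Successive rows in the table differ by 25,920 timesteps, so the same function can be applied to the difference between two adjacent rows to estimate the per-interval rate rather than the cumulative one.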