Task 13952923

Name	hadcm3n_yi49_1940_40_007683239_2
Workunit	7838326
Created	22 Jan 2012, 11:37:37 UTC
Sent	22 Jan 2012, 11:37:48 UTC
Report deadline	22 Apr 2012, 19:04:59 UTC
Received	25 Mar 2012, 19:04:54 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1186436
Run time	20 days 7 hours 56 min 54 sec
CPU time	20 days 7 hours 56 min 54 sec
Validate state	Invalid
Credit	11,508.48
Device peak FLOPS	2.91 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
04:05:36 (3368): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
09:47:58 (3408): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
12:14:07 (4052): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
12:09:38 (4284): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
11:24:13 (3536): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
11:29:34 (3892): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
08:35:42 (3796): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5024, iMonCtr=1
Model crash detected, will try to restart...
21:45:36 (3832): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4928, iMonCtr=1
Model crash detected, will try to restart...
10:20:08 (3800): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
08:30:05 (4212): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
09:39:01 (1784): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3316, iMonCtr=1
Model crash detected, will try to restart...
16:58:54 (4032): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:38:53 (1592): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
14:50:04 (4796): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
08:35:22 (3772): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
09:40:07 (4744): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4864, iMonCtr=1
Model crash detected, will try to restart...
10:44:42 (3764): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
16:15:09 (4884): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
12:18:14 (3888): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4024, iMonCtr=1
Model crash detected, will try to restart...
08:00:40 (4132): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2092, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
12:43:17 (3772): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
12:43:18 (3772): No heartbeat from core client for 30 sec - exiting
12:43:19 (3772): No heartbeat from core client for 30 sec - exiting
12:43:20 (3772): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - Quit request from BOINC...
12:25:25 (4064): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
08:25:05 (3760): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4016, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3912, iMonCtr=1
Model crash detected, will try to restart...
12:48:52 (5020): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
12:54:35 (3776): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3156, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1388, iMonCtr=1
Model crash detected, will try to restart...
15:17:24 (3952): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2196, iMonCtr=1
Model crash detected, will try to restart...
11:15:13 (5000): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4012, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4012, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4012, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4012, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4012, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4012, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>
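
The stderr output above repeats a small set of messages (missed heartbeats, quit requests, model crashes, and Signal 22 exits). A minimal sketch for tallying them is given here, assuming the log text has been saved to a string; the function name `count_events` and the pattern labels are hypothetical and are not part of BOINC or CPDN.

```python
import re

# Hypothetical labels mapped to message fragments that appear in the stderr text.
PATTERNS = {
    "no_heartbeat": re.compile(r"No heartbeat from core client"),
    "quit_request": re.compile(r"Quit request from BOINC"),
    "model_crash": re.compile(r"Model crash detected"),
    "signal_22": re.compile(r"Signal 22 received"),
}


def count_events(stderr_text: str) -> dict:
    """Count how often each known message fragment occurs in the log text."""
    return {name: len(p.findall(stderr_text)) for name, p in PATTERNS.items()}


if __name__ == "__main__":
    # Short excerpt taken from the log; a real run would pass the full stderr text.
    sample = (
        "04:05:36 (3368): No heartbeat from core client for 30 sec - exiting "
        "CPDN Monitor - No 'heartbeat' from BOINC... "
        "Model crash detected, will try to restart... "
        "Signal 22 received, exiting... Called boinc_finish"
    )
    print(count_events(sample))
```

Matching on message fragments rather than line boundaries means the sketch works whether or not the log's line breaks survived.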

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
23 Mar 2012 22:19:54	1186436	13952923	hadcm3n_yi49_1940_40_007683239_2	959,040	1,744,420	1.8189
22 Mar 2012 21:38:00	1186436	13952923	hadcm3n_yi49_1940_40_007683239_2	933,120	1,698,434	1.8202
21 Mar 2012 21:05:30	1186436	13952923	hadcm3n_yi49_1940_40_007683239_2	907,200	1,652,322	1.8213
20 Mar 2012 15:41:19	1186436	13952923	hadcm3n_yi49_1940_40_007683239_2	881,280	1,605,002	1.8212
19 Mar 2012 12:19:17	1186436	13952923	hadcm3n_yi49_1940_40_007683239_2	855,360	1,557,854	1.8213
14 Mar 2012 15:06:30	1186436	13952923	hadcm3n_yi49_1940_40_007683239_2	829,440	1,510,599	1.8212
12 Mar 2012 19:30:10	1186436	13952923	hadcm3n_yi49_1940_40_007683239_2	803,520	1,464,320	1.8224
11 Mar 2012 19:43:09	1186436	13952923	hadcm3n_yi49_1940_40_007683239_2	777,600	1,416,313	1.8214
10 Mar 2012 16:11:06	1186436	13952923	hadcm3n_yi49_1940_40_007683239_2	751,680	1,368,992	1.8212
08 Mar 2012 17:30:43	1186436	13952923	hadcm3n_yi49_1940_40_007683239_2	725,760	1,321,524	1.8209
07 Mar 2012 15:58:03	1186436	13952923	hadcm3n_yi49_1940_40_007683239_2	699,840	1,274,476	1.8211
06 Mar 2012 12:39:20	1186436	13952923	hadcm3n_yi49_1940_40_007683239_2	673,920	1,227,571	1.8215
24 Feb 2012 15:36:29	1186436	13952923	hadcm3n_yi49_1940_40_007683239_2	648,000	1,182,861	1.8254
23 Feb 2012 14:32:11	1186436	13952923	hadcm3n_yi49_1940_40_007683239_2	622,080	1,134,821	1.8242
22 Feb 2012 11:36:26	1186436	13952923	hadcm3n_yi49_1940_40_007683239_2	596,160	1,088,109	1.8252
21 Feb 2012 11:56:09	1186436	13952923	hadcm3n_yi49_1940_40_007683239_2	570,240	1,039,708	1.8233
19 Feb 2012 21:19:39	1186436	13952923	hadcm3n_yi49_1940_40_007683239_2	544,320	993,246	1.8247
18 Feb 2012 15:48:26	1186436	13952923	hadcm3n_yi49_1940_40_007683239_2	518,400	949,102	1.8308
17 Feb 2012 12:50:15	1186436	13952923	hadcm3n_yi49_1940_40_007683239_2	492,480	904,518	1.8367
15 Feb 2012 15:19:41	1186436	13952923	hadcm3n_yi49_1940_40_007683239_2	466,560	857,663	1.8383
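
The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the cumulative timestep count, rounded to four decimal places; the rows above are consistent with that reading. A minimal sketch that recomputes it for the two most recent trickles follows. The helper name `average_sec_per_timestep` is hypothetical, and the rounding rule is an assumption inferred from the table.

```python
def average_sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Cumulative CPU seconds per model timestep, rounded to 4 decimals (assumed)."""
    return round(cpu_time_sec / timestep, 4)


if __name__ == "__main__":
    # (timestep, CPU time in seconds) copied from the two most recent trickle rows.
    rows = [
        (959_040, 1_744_420),
        (933_120, 1_698_434),
    ]
    for timestep, cpu_time in rows:
        print(timestep, average_sec_per_timestep(cpu_time, timestep))
```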