Task 13924407

Name	hadcm3n_y88j_1940_40_007682653_1
Workunit	7837740
Created	16 Jan 2012, 0:36:35 UTC
Sent	16 Jan 2012, 0:36:49 UTC
Report deadline	16 Apr 2012, 8:04:00 UTC
Received	24 Feb 2012, 1:26:32 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1166294
Run time	13 days 21 hours 33 min 53 sec
CPU time	13 days 1 hours 49 min 26 sec
Validate state	Invalid
Credit	12,130.56
Device peak FLOPS	4.26 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.60</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
12:04:58 (3148): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
15:09:16 (3084): No heartbeat from core client for 30 sec - exiting
15:09:17 (3084): No heartbeat from core client for 30 sec - exiting
15:09:18 (3084): No heartbeat from core client for 30 sec - exiting
15:09:19 (3084): No heartbeat from core client for 30 sec - exiting
15:09:20 (3084): No heartbeat from core client for 30 sec - exiting
15:09:21 (3084): No heartbeat from core client for 30 sec - exiting
15:09:22 (3084): No heartbeat from core client for 30 sec - exiting
15:09:23 (3084): No heartbeat from core client for 30 sec - exiting
15:09:24 (3084): No heartbeat from core client for 30 sec - exiting
15:09:25 (3084): No heartbeat from core client for 30 sec - exiting
15:09:26 (3084): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:22:34 (5952): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
13:18:59 (6904): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:37:04 (5132): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
17:56:10 (4700): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
18:25:27 (6512): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
08:24:44 (6856): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4492, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4492, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4492, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4492, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4492, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4492, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
24 Feb 2012 00:28:56	1166294	13924407	hadcm3n_y88j_1940_40_007682653_1	1,010,880	1,129,435	1.1173
23 Feb 2012 01:47:18	1166294	13924407	hadcm3n_y88j_1940_40_007682653_1	984,960	1,099,519	1.1163
22 Feb 2012 16:47:52	1166294	13924407	hadcm3n_y88j_1940_40_007682653_1	959,040	1,069,647	1.1153
22 Feb 2012 01:44:52	1166294	13924407	hadcm3n_y88j_1940_40_007682653_1	933,120	1,040,272	1.1148
22 Feb 2012 01:44:52	1166294	13924407	hadcm3n_y88j_1940_40_007682653_1	907,200	1,010,765	1.1142
16 Feb 2012 18:40:40	1166294	13924407	hadcm3n_y88j_1940_40_007682653_1	881,280	982,160	1.1145
12 Feb 2012 16:57:33	1166294	13924407	hadcm3n_y88j_1940_40_007682653_1	855,360	956,098	1.1178
12 Feb 2012 02:11:12	1166294	13924407	hadcm3n_y88j_1940_40_007682653_1	829,440	925,725	1.1161
11 Feb 2012 08:28:26	1166294	13924407	hadcm3n_y88j_1940_40_007682653_1	803,520	894,078	1.1127
10 Feb 2012 22:53:09	1166294	13924407	hadcm3n_y88j_1940_40_007682653_1	777,600	863,433	1.1104
10 Feb 2012 15:15:15	1166294	13924407	hadcm3n_y88j_1940_40_007682653_1	751,680	832,831	1.1080
10 Feb 2012 06:35:57	1166294	13924407	hadcm3n_y88j_1940_40_007682653_1	725,760	802,316	1.1055
09 Feb 2012 21:51:26	1166294	13924407	hadcm3n_y88j_1940_40_007682653_1	699,840	771,691	1.1027
09 Feb 2012 13:15:37	1166294	13924407	hadcm3n_y88j_1940_40_007682653_1	673,920	741,097	1.0997
09 Feb 2012 04:36:22	1166294	13924407	hadcm3n_y88j_1940_40_007682653_1	648,000	710,440	1.0964
08 Feb 2012 19:29:21	1166294	13924407	hadcm3n_y88j_1940_40_007682653_1	622,080	679,767	1.0927
08 Feb 2012 11:26:04	1166294	13924407	hadcm3n_y88j_1940_40_007682653_1	596,160	649,112	1.0888
08 Feb 2012 02:00:19	1166294	13924407	hadcm3n_y88j_1940_40_007682653_1	570,240	618,352	1.0844
07 Feb 2012 16:44:50	1166294	13924407	hadcm3n_y88j_1940_40_007682653_1	544,320	586,912	1.0782
07 Feb 2012 07:29:50	1166294	13924407	hadcm3n_y88j_1940_40_007682653_1	518,400	558,026	1.0764
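
The Average (sec/TS) column in the table above appears to be the cumulative CPU time divided by the timestep count. The Python sketch below checks that reading against four rows copied from the table. The names `rows` and `sec_per_timestep` are illustrative only and are not part of BOINC or CPDN, and the formula is an inference from the numbers, not something the page states.

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_avg).
rows = [
    (1_010_880, 1_129_435, 1.1173),
    (984_960, 1_099_519, 1.1163),
    (544_320, 586_912, 1.0782),
    (518_400, 558_026, 1.0764),
]


def sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Assumed definition: cumulative CPU seconds per completed timestep."""
    return cpu_time_sec / timestep


for timestep, cpu_time_sec, reported in rows:
    computed = sec_per_timestep(cpu_time_sec, timestep)
    # Compare at the four decimal places shown in the table.
    matches = abs(computed - reported) < 1e-4
    print(f"{timestep:>9,}  computed={computed:.4f}  reported={reported:.4f}  match={matches}")
```

If the assumption holds, each computed value agrees with the reported one to four decimal places.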