Task 13324469

Name	hadcm3n_o348_1940_40_007435520_0
Workunit	7633023
Created	1 Sep 2011, 17:43:18 UTC
Sent	2 Sep 2011, 8:32:01 UTC
Report deadline	2 Dec 2011, 15:59:12 UTC
Received	25 Sep 2011, 7:27:32 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	985784
Run time	11 days 7 hours 11 min 2 sec
CPU time	9 days 3 hours 36 min 59 sec
Validate state	Invalid
Credit	4,976.64
Device peak FLOPS	2.56 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.18</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2944, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2944, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3160, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2992, iMonCtr=1 Model crash detected, will try to restart... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2992, iMonCtr=1 Model crash detected, will try to restart... 16:01:40 (2028): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 16:01:42 (2028): No heartbeat from core client for 30 sec - exiting 16:01:43 (2028): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... 16:01:47 (3700): No heartbeat from core client for 30 sec - exiting 16:01:48 (3700): No heartbeat from core client for 30 sec - exiting 16:01:49 (3700): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3848, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2980, iMonCtr=1 Model crash detected, will try to restart... 16:01:41 (2948): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2932, iMonCtr=1 Model crash detected, will try to restart... 16:01:39 (2940): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2444, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2444, iMonCtr=1 Model crash detected, will try to restart... 
Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2444, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2444, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
16 Sep 2011 19:11:47	985784	13324469	hadcm3n_o348_1940_40_007435520_0	414,720	765,163	1.8450
15 Sep 2011 18:09:08	985784	13324469	hadcm3n_o348_1940_40_007435520_0	388,800	716,890	1.8439
14 Sep 2011 18:57:13	985784	13324469	hadcm3n_o348_1940_40_007435520_0	362,880	668,387	1.8419
13 Sep 2011 20:05:59	985784	13324469	hadcm3n_o348_1940_40_007435520_0	336,960	619,860	1.8396
12 Sep 2011 21:26:07	985784	13324469	hadcm3n_o348_1940_40_007435520_0	311,040	571,785	1.8383
11 Sep 2011 20:54:30	985784	13324469	hadcm3n_o348_1940_40_007435520_0	285,120	523,471	1.8360
11 Sep 2011 07:29:47	985784	13324469	hadcm3n_o348_1940_40_007435520_0	259,200	475,085	1.8329
10 Sep 2011 12:31:32	985784	13324469	hadcm3n_o348_1940_40_007435520_0	233,280	426,989	1.8304
09 Sep 2011 23:00:58	985784	13324469	hadcm3n_o348_1940_40_007435520_0	207,360	379,638	1.8308
09 Sep 2011 09:28:01	985784	13324469	hadcm3n_o348_1940_40_007435520_0	181,440	332,150	1.8306
08 Sep 2011 10:58:08	985784	13324469	hadcm3n_o348_1940_40_007435520_0	155,520	284,818	1.8314
07 Sep 2011 12:32:43	985784	13324469	hadcm3n_o348_1940_40_007435520_0	129,600	237,778	1.8347
06 Sep 2011 09:36:11	985784	13324469	hadcm3n_o348_1940_40_007435520_0	103,680	190,826	1.8405
04 Sep 2011 20:23:02	985784	13324469	hadcm3n_o348_1940_40_007435520_0	77,760	143,360	1.8436
04 Sep 2011 00:10:52	985784	13324469	hadcm3n_o348_1940_40_007435520_0	51,840	96,076	1.8533
03 Sep 2011 04:29:13	985784	13324469	hadcm3n_o348_1940_40_007435520_0	25,920	48,181	1.8588
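
The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the cumulative timestep count. That is an assumption inferred from the numbers, not something the page states. A minimal Python sketch that checks it against three rows copied from the table (the function name `average_sec_per_timestep` is hypothetical):

```python
# Rows copied from the trickle table above: (timestep, cumulative CPU seconds).
TRICKLES = [
    (414_720, 765_163),  # 16 Sep 2011 19:11:47
    (388_800, 716_890),  # 15 Sep 2011 18:09:08
    (25_920, 48_181),    # 03 Sep 2011 04:29:13
]


def average_sec_per_timestep(timestep: int, cpu_seconds: int) -> float:
    """Cumulative CPU seconds per timestep, rounded to 4 decimal places.

    Assumes the table's Average column is simply cpu_seconds / timestep.
    """
    return round(cpu_seconds / timestep, 4)


for timestep, cpu_seconds in TRICKLES:
    print(timestep, average_sec_per_timestep(timestep, cpu_seconds))
```

For these three rows the computed values match the table (1.8450, 1.8439, 1.8588), which is consistent with the assumed formula. Each trickle also advances the timestep by 25,920.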