Task 13319424

Name	hadcm3n_t6x0_1940_40_007433418_1
Workunit	7630921
Created	31 Aug 2011, 22:00:39 UTC
Sent	31 Aug 2011, 22:05:31 UTC
Report deadline	1 Dec 2011, 5:32:42 UTC
Received	12 Oct 2011, 19:15:15 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1092069
Run time	18 days 18 hours 27 min 1 sec
CPU time	13 days 21 hours 50 min 41 sec
Validate state	Invalid
Credit	8,398.08
Device peak FLOPS	2.57 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4172, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4172, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4172, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4172, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4172, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4172, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4172, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4172, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4172, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4172, iMonCtr=1
Model crash detected, will try to restart...
BUFFIN: C I/O Error feof - Unit 63 - Return code = 16
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 65 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
Error converting file to netcdf: dataout/t6x0ko.pje4c10
Error converting file to netcdf: dataout/t6x0ko.pie4c10
Error converting file to netcdf: dataout/t6x0ko.pfe4c10
Error converting file to netcdf: dataout/t6x0ka.phe4c10
Error converting file to netcdf: dataout/t6x0ka.pge4c10
Error converting file to netcdf: dataout/t6x0ka.pee4c10
Error converting file to netcdf: dataout/t6x0ka.pde4c10
08:04:05 (4996): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
11:31:12 (6132): No heartbeat from core client for 30 sec - exiting
11:31:13 (6132): No heartbeat from core client for 30 sec - exiting
11:31:14 (6132): No heartbeat from core client for 30 sec - exiting
11:31:15 (6132): No heartbeat from core client for 30 sec - exiting
11:31:17 (6132): No heartbeat from core client for 30 sec - exiting
11:31:18 (6132): No heartbeat from core client for 30 sec - exiting
11:31:19 (6132): No heartbeat from core client for 30 sec - exiting
11:31:20 (6132): No heartbeat from core client for 30 sec - exiting
11:31:21 (6132): No heartbeat from core client for 30 sec - exiting
11:31:22 (6132): No heartbeat from core client for 30 sec - exiting
11:31:23 (6132): No heartbeat from core client for 30 sec - exiting
11:31:24 (6132): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
09:33:48 (4808): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4764, iMonCtr=1
Model crash detected, will try to restart...
09:44:42 (6092): No heartbeat from core client for 30 sec - exiting
09:44:43 (6092): No heartbeat from core client for 30 sec - exiting
09:44:44 (6092): No heartbeat from core client for 30 sec - exiting
09:44:46 (6092): No heartbeat from core client for 30 sec - exiting
09:44:47 (6092): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
19:13:11 (8812): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048
Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048
Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048
Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048
Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048
Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
12 Oct 2011 04:31:14	1092069	13319424	hadcm3n_t6x0_1940_40_007433418_1	699,840	1,298,704	1.8557
11 Oct 2011 13:19:46	1092069	13319424	hadcm3n_t6x0_1940_40_007433418_1	673,920	1,249,048	1.8534
10 Oct 2011 20:56:30	1092069	13319424	hadcm3n_t6x0_1940_40_007433418_1	648,000	1,201,168	1.8537
10 Oct 2011 06:42:32	1092069	13319424	hadcm3n_t6x0_1940_40_007433418_1	622,080	1,154,407	1.8557
09 Oct 2011 16:13:38	1092069	13319424	hadcm3n_t6x0_1940_40_007433418_1	596,160	1,106,480	1.8560
09 Oct 2011 01:04:48	1092069	13319424	hadcm3n_t6x0_1940_40_007433418_1	570,240	1,059,165	1.8574
08 Oct 2011 10:46:18	1092069	13319424	hadcm3n_t6x0_1940_40_007433418_1	544,320	1,011,509	1.8583
07 Oct 2011 20:19:54	1092069	13319424	hadcm3n_t6x0_1940_40_007433418_1	518,400	963,224	1.8581
05 Oct 2011 13:40:17	1092069	13319424	hadcm3n_t6x0_1940_40_007433418_1	492,480	913,224	1.8543
04 Oct 2011 00:39:04	1092069	13319424	hadcm3n_t6x0_1940_40_007433418_1	466,560	865,395	1.8548
01 Oct 2011 12:01:18	1092069	13319424	hadcm3n_t6x0_1940_40_007433418_1	440,640	818,384	1.8573
30 Sep 2011 04:52:24	1092069	13319424	hadcm3n_t6x0_1940_40_007433418_1	414,720	767,707	1.8511
29 Sep 2011 10:15:46	1092069	13319424	hadcm3n_t6x0_1940_40_007433418_1	388,800	719,753	1.8512
28 Sep 2011 18:14:50	1092069	13319424	hadcm3n_t6x0_1940_40_007433418_1	362,880	672,394	1.8529
26 Sep 2011 23:11:11	1092069	13319424	hadcm3n_t6x0_1940_40_007433418_1	336,960	624,790	1.8542
23 Sep 2011 02:46:19	1092069	13319424	hadcm3n_t6x0_1940_40_007433418_1	311,040	576,402	1.8531
22 Sep 2011 05:30:43	1092069	13319424	hadcm3n_t6x0_1940_40_007433418_1	285,120	528,527	1.8537
21 Sep 2011 12:54:24	1092069	13319424	hadcm3n_t6x0_1940_40_007433418_1	259,200	480,826	1.8550
20 Sep 2011 15:42:35	1092069	13319424	hadcm3n_t6x0_1940_40_007433418_1	233,280	430,937	1.8473
20 Sep 2011 00:41:43	1092069	13319424	hadcm3n_t6x0_1940_40_007433418_1	207,360	384,773	1.8556
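
The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the cumulative timestep count. The following is a minimal sketch that re-derives it from two of the rows above. It assumes the rows are tab-separated as shown, and the names `parse_trickle` and `average_sec_per_ts` are hypothetical helpers, not part of BOINC or CPDN.

```python
def parse_trickle(line):
    """Split one tab-separated trickle row into a dict (field names are illustrative)."""
    sent, host_id, result_id, name, timestep, cpu_sec, avg = line.rstrip("\n").split("\t")
    return {
        "sent": sent,
        "timestep": int(timestep.replace(",", "")),  # drop thousands separators
        "cpu_sec": int(cpu_sec.replace(",", "")),
        "reported_avg": float(avg),
    }


def average_sec_per_ts(row):
    """Cumulative CPU seconds per model timestep (assumed definition of the column)."""
    return row["cpu_sec"] / row["timestep"]


# Two rows copied from the table above.
rows = [
    "12 Oct 2011 04:31:14\t1092069\t13319424\thadcm3n_t6x0_1940_40_007433418_1\t699,840\t1,298,704\t1.8557",
    "11 Oct 2011 13:19:46\t1092069\t13319424\thadcm3n_t6x0_1940_40_007433418_1\t673,920\t1,249,048\t1.8534",
]

for row in map(parse_trickle, rows):
    # Print the recomputed value next to the value reported in the table.
    print(row["sent"], round(average_sec_per_ts(row), 4), row["reported_avg"])
```

Under that assumption the recomputed values agree with the reported column to four decimal places for these two rows.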