Task 13139765

Name	hadcm3n_yg1i_1900_40_007353648_2
Workunit	7551078
Created	15 Jul 2011, 17:13:43 UTC
Sent	15 Jul 2011, 17:15:01 UTC
Report deadline	15 Oct 2011, 0:42:12 UTC
Received	1 Nov 2011, 21:56:46 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	934926
Run time	19 days 1 hour 54 min 50 sec
CPU time	19 days 1 hour 54 min 50 sec
Validate state	Invalid
Credit	12,441.60
Device peak FLOPS	2.61 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
 - exit code 193 (0xc1)
</message>
<stderr_txt>
18:34:37 (1076): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4592, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5684, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4252, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5972, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3884, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6008, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4328, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5772, iMonCtr=1
Model crash detected, will try to restart...
08:52:49 (5456): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5636, iMonCtr=1
Model crash detected, will try to restart...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5180, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4892, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3552, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5656, iMonCtr=1
Model crash detected, will try to restart...
21:29:58 (4256): No heartbeat from core client for 30 sec - exiting
21:29:59 (4256): No heartbeat from core client for 30 sec - exiting
21:30:00 (4256): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5180, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5180, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4388, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5440, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4676, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1632, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4808, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5652, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5576, iMonCtr=1
Model crash detected, will try to restart...
18:40:18 (3648): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:40:20 (3648): No heartbeat from core client for 30 sec - exiting
18:40:21 (3648): No heartbeat from core client for 30 sec - exiting
18:40:22 (3648): No heartbeat from core client for 30 sec - exiting
18:40:23 (3648): No heartbeat from core client for 30 sec - exiting
18:40:24 (3648): No heartbeat from core client for 30 sec - exiting
18:40:25 (3648): No heartbeat from core client for 30 sec - exiting
18:40:26 (3648): No heartbeat from core client for 30 sec - exiting
18:40:27 (3648): No heartbeat from core client for 30 sec - exiting
18:40:28 (3648): No heartbeat from core client for 30 sec - exiting
18:40:29 (3648): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5740, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5564, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
BUFFIN: C I/O Error feof - Unit 63 - Return code = 16
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 65 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
Error converting file to netcdf: dataout/yg1iko.pjc1c10
Error converting file to netcdf: dataout/yg1iko.pic1c10
Error converting file to netcdf: dataout/yg1iko.pfc1c10
Error converting file to netcdf: dataout/yg1ika.phc1c10
Error converting file to netcdf: dataout/yg1ika.pgc1c10
Error converting file to netcdf: dataout/yg1ika.pec1c10
Error converting file to netcdf: dataout/yg1ika.pdc1c10
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=928, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4644, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
BUFFIN: C I/O Error feof - Unit 63 - Return code = 16
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 65 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
Error converting file to netcdf: dataout/yg1iko.pjc8c10
Error converting file to netcdf: dataout/yg1iko.pic8c10
Error converting file to netcdf: dataout/yg1iko.pfc8c10
Error converting file to netcdf: dataout/yg1ika.phc8c10
Error converting file to netcdf: dataout/yg1ika.pgc8c10
Error converting file to netcdf: dataout/yg1ika.pec8c10
Error converting file to netcdf: dataout/yg1ika.pdc8c10
CPDN Monitor - Quit request from BOINC...
14:13:34 (5056): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4728, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5704, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5868, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=288, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5208, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6004, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4228, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4644, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4304, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5980, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5184, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5400, iMonCtr=1
Model crash detected, will try to restart...
14:54:02 (4528): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5876, iMonCtr=1
Model crash detected, will try to restart...
Signal 11 received, exiting...
Called boinc_finish
</stderr_txt>
]]>
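
The stderr output repeats a handful of message types many times, so a quick tally can make the pattern easier to read. The following is a minimal sketch, not part of BOINC or the CPDN client: the function name `tally`, the labels in `PATTERNS`, and the abridged `sample` string are all assumptions chosen for illustration, and the patterns are plain substring matches against the message texts seen in this log.

```python
import re
from collections import Counter

# Hypothetical labels mapped to message fragments that appear in this stderr log.
PATTERNS = {
    "no_heartbeat": r"No heartbeat from core client",
    "model_restart": r"Model crash detected, will try to restart",
    "quit_request": r"CPDN Monitor - Quit request from BOINC",
    "netcdf_error": r"Error converting file to netcdf",
    "signal": r"Signal \d+ received",
}


def tally(stderr_text: str) -> Counter:
    """Count how often each message type occurs in the stderr text."""
    return Counter(
        {label: len(re.findall(pattern, stderr_text))
         for label, pattern in PATTERNS.items()}
    )


# Abridged excerpt of the log (first and last messages only), for illustration.
sample = (
    "18:34:37 (1076): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=4592, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "Signal 11 received, exiting... Called boinc_finish"
)

counts = tally(sample)
for label, n in counts.items():
    print(f"{label}: {n}")
```

Because the patterns match substrings rather than whole lines, the same function should work whether the log is stored one message per line or flattened into a single string.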

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
01 Nov 2011 21:58:53	934926	13139765	hadcm3n_yg1i_1900_40_007353648_2	1,036,800	1,648,482	1.5900
31 Oct 2011 19:22:35	934926	13139765	hadcm3n_yg1i_1900_40_007353648_2	1,010,880	1,606,832	1.5895
31 Oct 2011 18:32:28	934926	13139765	hadcm3n_yg1i_1900_40_007353648_2	984,960	1,565,392	1.5893
31 Oct 2011 16:55:43	934926	13139765	hadcm3n_yg1i_1900_40_007353648_2	959,040	1,524,750	1.5899
31 Oct 2011 15:06:23	934926	13139765	hadcm3n_yg1i_1900_40_007353648_2	933,120	1,483,333	1.5896
31 Oct 2011 14:54:46	934926	13139765	hadcm3n_yg1i_1900_40_007353648_2	907,200	1,441,541	1.5890
31 Oct 2011 14:54:46	934926	13139765	hadcm3n_yg1i_1900_40_007353648_2	881,280	1,400,150	1.5888
18 Oct 2011 21:32:12	934926	13139765	hadcm3n_yg1i_1900_40_007353648_2	855,360	1,359,472	1.5894
17 Oct 2011 16:26:03	934926	13139765	hadcm3n_yg1i_1900_40_007353648_2	829,440	1,319,444	1.5908
15 Oct 2011 09:12:28	934926	13139765	hadcm3n_yg1i_1900_40_007353648_2	803,520	1,278,084	1.5906
11 Oct 2011 22:30:53	934926	13139765	hadcm3n_yg1i_1900_40_007353648_2	777,600	1,236,705	1.5904
10 Oct 2011 14:31:26	934926	13139765	hadcm3n_yg1i_1900_40_007353648_2	751,680	1,194,799	1.5895
07 Oct 2011 19:38:52	934926	13139765	hadcm3n_yg1i_1900_40_007353648_2	725,760	1,153,503	1.5894
04 Oct 2011 18:54:34	934926	13139765	hadcm3n_yg1i_1900_40_007353648_2	699,840	1,112,205	1.5892
30 Sep 2011 20:41:35	934926	13139765	hadcm3n_yg1i_1900_40_007353648_2	673,920	1,069,675	1.5872
29 Sep 2011 17:41:14	934926	13139765	hadcm3n_yg1i_1900_40_007353648_2	648,000	1,029,780	1.5892
26 Sep 2011 20:28:16	934926	13139765	hadcm3n_yg1i_1900_40_007353648_2	622,080	990,267	1.5919
24 Sep 2011 18:38:38	934926	13139765	hadcm3n_yg1i_1900_40_007353648_2	596,160	948,898	1.5917
22 Sep 2011 17:05:26	934926	13139765	hadcm3n_yg1i_1900_40_007353648_2	570,240	906,964	1.5905
19 Sep 2011 20:59:18	934926	13139765	hadcm3n_yg1i_1900_40_007353648_2	544,320	864,646	1.5885
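
The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the cumulative timestep count, shown to four decimal places. The sketch below checks that reading against three rows copied from the table; the function name `average_sec_per_timestep` is an assumption introduced here, not something the project defines.

```python
def average_sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Cumulative CPU seconds divided by cumulative model timesteps."""
    return cpu_time_sec / timestep


# (timestep, CPU time in seconds, average as listed) copied from the trickle table.
rows = [
    (1_036_800, 1_648_482, 1.5900),
    (1_010_880, 1_606_832, 1.5895),
    (544_320, 864_646, 1.5885),
]

results = []
for timestep, cpu_time, listed in rows:
    computed = average_sec_per_timestep(cpu_time, timestep)
    results.append((computed, listed))
    print(f"{timestep:>9,}  computed {computed:.4f}  listed {listed:.4f}")
```

Consecutive rows in the table are 25,920 timesteps apart, so the same division can be applied to the differences between rows to estimate the cost of each individual trickle interval.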