Task 12735924

Name	hadcm3n_o1ko_1900_40_007197371_0
Workunit	7395651
Created	28 Mar 2011, 14:00:41 UTC
Sent	1 Apr 2011, 16:23:02 UTC
Report deadline	1 Jul 2011, 23:50:13 UTC
Received	24 Jul 2011, 8:40:12 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	1064484
Run time	30 days 13 hours 37 min 20 sec
CPU time	23 days 17 hours 19 min 59 sec
Validate state	Invalid
Credit	12,441.60
Device peak FLOPS	1.94 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.18</core_client_version> <![CDATA[ <message> - exit code 193 (0xc1) </message> <stderr_txt> CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7064, iMonCtr=1 Model crash detected, will try to restart... 20:33:05 (8040): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:34:41 (6860): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:34:42 (6860): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 12:32:30 (2036): No heartbeat from core client for 30 sec - exiting 12:32:31 (2036): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6360, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7244, iMonCtr=1 Model crash detected, will try to restart... CCPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6984, iMonCtr=1 Model crash detected, will try to restart... 
CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... BUFFOUT: C I/O Error - Return code = 32 Model crashed: WRITDUMP: BAD BUFFOUT OF DATA tmp/pipe_dummy 2048 CPDN Monitor - Quit request from BOINC... 07:00:51 (700): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6536, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3616, iMonCtr=1 Model crash detected, will try to restart... 20:59:47 (6816): No heartbeat from core client for 30 sec - exiting 20:59:48 (6816): No heartbeat from core client for 30 sec - exiting 20:59:49 (6816): No heartbeat from core client for 30 sec - exiting 20:59:50 (6816): No heartbeat from core client for 30 sec - exiting 20:59:51 (6816): No heartbeat from core client for 30 sec - exiting 20:59:52 (6816): No heartbeat from core client for 30 sec - exiting 20:59:53 (6816): No heartbeat from core client for 30 sec - exiting 20:59:54 (6816): No heartbeat from core client for 30 sec - exiting 20:59:55 (6816): No heartbeat from core client for 30 sec - exiting 20:59:56 (6816): No heartbeat from core client for 30 sec - exiting 20:59:57 (6816): No heartbeat from core client for 30 sec - exiting 20:59:58 (6816): No heartbeat from core client for 30 sec - exiting 20:59:59 (6816): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6392, iMonCtr=1 Model crash detected, will try to restart... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6392, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6392, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6392, iMonCtr=1 Model crash detected, will try to restart... BUFFIN: C I/O Error feof - Unit 63 - Return code = 16 BUFFIN: C I/O Error feof - Unit 64 - Return code = 16 BUFFIN: C I/O Error feof - Unit 65 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 Error converting file to netcdf: dataout/o1koko.pjc1c10 Error converting file to netcdf: dataout/o1koko.pic1c10 Error converting file to netcdf: dataout/o1koko.pfc1c10 Error converting file to netcdf: dataout/o1koka.phc1c10 Error converting file to netcdf: dataout/o1koka.pgc1c10 Error converting file to netcdf: dataout/o1koka.pec1c10 Error converting file to netcdf: dataout/o1koka.pdc1c10 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7024, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1328, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4556, iMonCtr=1 Model crash detected, will try to restart... 08:31:52 (5568): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:34:56 (4216): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:37:57 (6320): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
08:40:56 (8764): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:44:58 (6848): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:48:01 (1084): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:52:00 (6200): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:55:01 (3260): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5472, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6664, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6664, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6664, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6664, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3412, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7260, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7260, iMonCtr=1 Model crash detected, will try to restart... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6772, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 23:43:49 (7420): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:45:25 (8960): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:47:58 (2084): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:49:30 (3552): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:52:03 (5112): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:53:35 (9180): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:53:36 (9180): No heartbeat from core client for 30 sec - exiting Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7488, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4048, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6292, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5436, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Signal 11 received, exiting... Called boinc_finish </stderr_txt> ]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
25 Jul 2011 21:55:08	1064484	12735924	hadcm3n_o1ko_1900_40_007197371_0	1,036,800	2,049,590	1.9768
25 Jul 2011 20:26:45	1064484	12735924	hadcm3n_o1ko_1900_40_007197371_0	1,010,880	2,000,810	1.9793
25 Jul 2011 15:44:12	1064484	12735924	hadcm3n_o1ko_1900_40_007197371_0	984,960	1,951,145	1.9809
08 Jul 2011 05:22:27	1064484	12735924	hadcm3n_o1ko_1900_40_007197371_0	959,040	1,901,851	1.9831
07 Jul 2011 16:04:43	1064484	12735924	hadcm3n_o1ko_1900_40_007197371_0	933,120	1,853,515	1.9864
04 Jul 2011 21:47:50	1064484	12735924	hadcm3n_o1ko_1900_40_007197371_0	907,200	1,803,465	1.9879
04 Jul 2011 03:19:00	1064484	12735924	hadcm3n_o1ko_1900_40_007197371_0	881,280	1,754,217	1.9905
03 Jul 2011 00:05:31	1064484	12735924	hadcm3n_o1ko_1900_40_007197371_0	855,360	1,704,334	1.9925
01 Jul 2011 10:49:33	1064484	12735924	hadcm3n_o1ko_1900_40_007197371_0	829,440	1,654,658	1.9949
29 Jun 2011 05:21:13	1064484	12735924	hadcm3n_o1ko_1900_40_007197371_0	803,520	1,605,091	1.9976
26 Jun 2011 04:05:07	1064484	12735924	hadcm3n_o1ko_1900_40_007197371_0	777,600	1,553,018	1.9972
23 Jun 2011 05:16:02	1064484	12735924	hadcm3n_o1ko_1900_40_007197371_0	751,680	1,500,241	1.9959
21 Jun 2011 18:29:13	1064484	12735924	hadcm3n_o1ko_1900_40_007197371_0	725,760	1,450,649	1.9988
19 Jun 2011 22:06:37	1064484	12735924	hadcm3n_o1ko_1900_40_007197371_0	699,840	1,400,638	2.0014
19 Jun 2011 22:06:37	1064484	12735924	hadcm3n_o1ko_1900_40_007197371_0	673,920	1,352,341	2.0067
16 Jun 2011 09:10:49	1064484	12735924	hadcm3n_o1ko_1900_40_007197371_0	648,000	1,302,528	2.0101
13 Jun 2011 06:43:28	1064484	12735924	hadcm3n_o1ko_1900_40_007197371_0	622,080	1,250,304	2.0099
13 Jun 2011 01:16:43	1064484	12735924	hadcm3n_o1ko_1900_40_007197371_0	596,160	1,197,947	2.0094
10 Jun 2011 23:45:08	1064484	12735924	hadcm3n_o1ko_1900_40_007197371_0	570,240	1,145,876	2.0095
08 Jun 2011 01:54:55	1064484	12735924	hadcm3n_o1ko_1900_40_007197371_0	544,320	1,090,775	2.0039
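
The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the timestep count. The page itself does not state this; it is inferred from the figures. The minimal Python sketch below checks that reading against three rows copied from the table above, and converts the "CPU time" field from the task header into seconds for comparison with the latest trickle. The function names are hypothetical and are not part of BOINC or CPDN.

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_average).
rows = [
    (1_036_800, 2_049_590, 1.9768),
    (1_010_880, 2_000_810, 1.9793),
    (544_320, 1_090_775, 2.0039),
]


def seconds_per_timestep(timestep, cpu_seconds):
    """Assumed definition of the Average column: CPU seconds per timestep."""
    return cpu_seconds / timestep


def duration_to_seconds(days, hours, minutes, seconds):
    """Convert a 'D days H hours M min S sec' duration to seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


for timestep, cpu_seconds, reported in rows:
    computed = seconds_per_timestep(timestep, cpu_seconds)
    # The table shows four decimal places, so allow for rounding.
    matches = abs(computed - reported) < 1e-4
    print(f"{timestep:>9,}  computed={computed:.4f}  reported={reported:.4f}  match={matches}")

# "CPU time 23 days 17 hours 19 min 59 sec" from the task header.
header_cpu_seconds = duration_to_seconds(23, 17, 19, 59)
print(f"header CPU time: {header_cpu_seconds:,} sec")
```

Under that assumption, the three sampled rows agree with the reported averages to four decimal places. The header CPU time converts to 2,049,599 seconds, 9 seconds more than the 2,049,590 seconds reported in the latest trickle.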