Task 15802823

Name	hadcm3n_n26p_1880_40_008374255_0
Workunit	8525114
Created	29 May 2013, 21:07:16 UTC
Sent	31 May 2013, 9:13:28 UTC
Report deadline	30 Aug 2013, 16:40:39 UTC
Received	25 Jul 2013, 16:02:28 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1034737
Run time	19 days 9 hours 12 min 1 sec
CPU time	17 days 19 hours 40 min 15 sec
Validate state	Invalid
Credit	5,909.76
Device peak FLOPS	1.58 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.64</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 09:50:49 (3512): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 09:51:26 (1240): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 09:52:05 (3460): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... BUFFOUT: C I/O Error - Return code = 32 Model crashed: WRITDUMP: BAD BUFFOUT OF DATA tmp/pipe_dummy 2048 Suspended CPDN Monitor - Suspend request from BOINC... BUFFOUT: C I/O Error - Return code = 32 Model crashed: WRITDUMP: BAD BUFFOUT OF DATA tmp/pipe_dummy 2048 Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3600, iMonCtr=1 Model crash detected, will try to restart... 11:26:35 (5244): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 11:27:49 (4088): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 11:27:50 (4088): No heartbeat from core client for 30 sec - exiting 11:27:51 (4088): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3024, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... BUFFOUT: C I/O Error - Return code = 32 Model crashed: WRITDUMP: BAD BUFFOUT OF DATA tmp/pipe_dummy 2048 Suspended CPDN Monitor - Suspend request from BOINC... 
Controller:: CPDN process is not running, exitin11:05:19 (4632): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 11:05:20 (4632): No heartbeat from core client for 30 sec - exiting 11:05:21 (4632): No heartbeat from core client for 30 sec - exiting 11:05:22 (4632): No heartbeat from core client for 30 sec - exiting 11:05:23 (4632): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2620, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5364, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4160, iMonCtr=1 Model crash detected, will try to restart... BUFFIN: C I/O Error feof - Unit 63 - Return code = 16 BUFFIN: C I/O Error feof - Unit 64 - Return code = 16 BUFFIN: C I/O Error feof - Unit 65 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 Error converting file to netcdf: dataout/n26pko.pj94c10 Error converting file to netcdf: dataout/n26pko.pi94c10 Error converting file to netcdf: dataout/n26pko.pf94c10 Error converting file to netcdf: dataout/n26pka.ph94c10 Error converting file to netcdf: dataout/n26pka.pg94c10 Error converting file to netcdf: dataout/n26pka.pe94c10 Error converting file to netcdf: dataout/n26pka.pd94c10 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1264, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4340, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3972, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1952, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4092, iMonCtr=1 Model crash detected, will try to restart... 10:56:31 (5332): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 10:57:27 (4444): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 10:58:05 (5760): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 10:58:06 (5760): No heartbeat from core client for 30 sec - exiting 10:58:07 (5760): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - Suspend request from BOINC... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2716, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2716, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2716, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2716, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... 
Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2716, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2716, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
23 Jul 2013 21:50:16	1034737	15802823	hadcm3n_n26p_1880_40_008374255_0	492,480	1,482,062	3.0094
23 Jul 2013 21:15:16	1034737	15802823	hadcm3n_n26p_1880_40_008374255_0	466,560	1,404,001	3.0093
23 Jul 2013 17:45:39	1034737	15802823	hadcm3n_n26p_1880_40_008374255_0	440,640	1,323,385	3.0033
23 Jul 2013 17:45:39	1034737	15802823	hadcm3n_n26p_1880_40_008374255_0	414,720	1,243,179	2.9976
23 Jul 2013 17:45:38	1034737	15802823	hadcm3n_n26p_1880_40_008374255_0	388,800	1,163,418	2.9923
23 Jul 2013 17:45:37	1034737	15802823	hadcm3n_n26p_1880_40_008374255_0	362,880	1,082,674	2.9836
10 Jul 2013 14:37:08	1034737	15802823	hadcm3n_n26p_1880_40_008374255_0	336,960	1,005,097	2.9828
07 Jul 2013 14:22:10	1034737	15802823	hadcm3n_n26p_1880_40_008374255_0	311,040	924,471	2.9722
06 Jul 2013 05:16:18	1034737	15802823	hadcm3n_n26p_1880_40_008374255_0	285,120	843,752	2.9593
02 Jul 2013 16:06:41	1034737	15802823	hadcm3n_n26p_1880_40_008374255_0	259,200	765,974	2.9551
02 Jul 2013 10:33:15	1034737	15802823	hadcm3n_n26p_1880_40_008374255_0	233,280	695,477	2.9813
02 Jul 2013 09:45:33	1034737	15802823	hadcm3n_n26p_1880_40_008374255_0	207,360	622,779	3.0034
25 Jun 2013 16:31:08	1034737	15802823	hadcm3n_n26p_1880_40_008374255_0	181,440	546,137	3.0100
23 Jun 2013 16:05:23	1034737	15802823	hadcm3n_n26p_1880_40_008374255_0	155,520	472,786	3.0400
17 Jun 2013 14:43:20	1034737	15802823	hadcm3n_n26p_1880_40_008374255_0	129,600	399,695	3.0841
15 Jun 2013 18:05:32	1034737	15802823	hadcm3n_n26p_1880_40_008374255_0	103,680	321,976	3.1055
11 Jun 2013 18:39:34	1034737	15802823	hadcm3n_n26p_1880_40_008374255_0	77,760	240,732	3.0958
09 Jun 2013 07:19:09	1034737	15802823	hadcm3n_n26p_1880_40_008374255_0	51,840	161,641	3.1181
07 Jun 2013 10:16:08	1034737	15802823	hadcm3n_n26p_1880_40_008374255_0	25,920	81,405	3.1406
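
The Average (sec/TS) column appears to be CPU Time (sec) divided by Timestep, rounded to four decimal places. That rule is inferred from the rows above, not taken from any project documentation. A minimal Python sketch that re-derives the column from the other two follows; the function names are hypothetical helpers and not part of BOINC or CPDN.

```python
def parse_count(field: str) -> int:
    """Convert a table field such as '1,482,062' to an integer."""
    return int(field.replace(",", ""))


def average_sec_per_timestep(timestep: str, cpu_time_sec: str) -> float:
    """CPU seconds per model timestep, rounded to 4 decimals.

    Assumption: the page rounds rather than truncates; this is inferred
    from the table values only.
    """
    return round(parse_count(cpu_time_sec) / parse_count(timestep), 4)


# (Timestep, CPU Time (sec), Average (sec/TS)) copied from the latest
# and the earliest trickle rows in the table.
rows = [
    ("492,480", "1,482,062", 3.0094),
    ("25,920", "81,405", 3.1406),
]

for timestep, cpu_time, reported in rows:
    computed = average_sec_per_timestep(timestep, cpu_time)
    print(f"timestep={timestep:>8}  computed={computed:.4f}  reported={reported:.4f}")
```

Running the same check over every row is a quick way to confirm that a copied trickle table has not been garbled.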