Task 13549472

Name	hadcm3n_y916_1900_40_007521892_0
Workunit	7719367
Created	28 Oct 2011, 13:15:09 UTC
Sent	2 Nov 2011, 1:04:32 UTC
Report deadline	1 Feb 2012, 8:31:43 UTC
Received	4 Dec 2011, 3:49:39 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	775427
Run time	18 days 10 hours 13 min 19 sec
CPU time	17 days 12 hours 26 min 2 sec
Validate state	Invalid
Credit	9,331.20
Device peak FLOPS	2.31 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
19:33:06 (968): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
12:11:50 (5444): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
22:10:58 (2284): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=688, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
16:01:10 (3124): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4640, iMonCtr=1
Model crash detected, will try to restart...
12:48:36 (5104): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CSuspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4176, iMonCtr=1
Model crash detected, will try to restart...
10:38:12 (4212): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
18:54:47 (6016): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7680, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7680, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7680, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7680, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7680, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7680, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7680, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7680, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7680, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7680, iMonCtr=1
Model crash detected, will try to restart...
BUFFIN: C I/O Error feof - Unit 63 - Return code = 16
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 65 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
Error converting file to netcdf: dataout/y916ko.pjb5c10
Error converting file to netcdf: dataout/y916ko.pib5c10
Error converting file to netcdf: dataout/y916ko.pfb5c10
Error converting file to netcdf: dataout/y916ka.phb5c10
Error converting file to netcdf: dataout/y916ka.pgb5c10
Error converting file to netcdf: dataout/y916ka.peb5c10
Error converting file to netcdf: dataout/y916ka.pdb5c10
10:01:25 (3836): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
17:28:46 (4768): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2812, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
21:59:19 (5724): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3760, iMonCtr=1
Model crash detected, will try to restart...
08:24:10 (6064): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:53:41 (3020): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
09:43:51 (5340): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4912, iMonCtr=1
Model crash detected, will try to restart...
16:17:35 (5912): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4232, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5304, iMonCtr=1
Model crash detected, will try to restart...
14:43:32 (5164): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
19:48:40 (4360): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
cpdnmonitor: cannot open input file C:\BOINC/projects/climateprediction.net/hadcm3n_y916_1900_40_007521892/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\BOINC/projects/climateprediction.net/hadcm3n_y916_1900_40_007521892/dataout/ocean_restart.day after 11 attempts
Model crashed: READ_FLH: I/O error tmp/pipe_dummy 2048
cpdnmonitor: cannot open input file C:\BOINC/projects/climateprediction.net/hadcm3n_y916_1900_40_007521892/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\BOINC/projects/climateprediction.net/hadcm3n_y916_1900_40_007521892/dataout/ocean_restart.day after 11 attempts
Model crashed: READ_FLH: I/O error tmp/pipe_dummy 2048
cpdnmonitor: cannot open input file C:\BOINC/projects/climateprediction.net/hadcm3n_y916_1900_40_007521892/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\BOINC/projects/climateprediction.net/hadcm3n_y916_1900_40_007521892/dataout/ocean_restart.day after 11 attempts
Model crashed: READ_FLH: I/O error tmp/pipe_dummy 2048
cpdnmonitor: cannot open input file C:\BOINC/projects/climateprediction.net/hadcm3n_y916_1900_40_007521892/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\BOINC/projects/climateprediction.net/hadcm3n_y916_1900_40_007521892/dataout/ocean_restart.day after 11 attempts
Model crashed: READ_FLH: I/O error tmp/pipe_dummy 2048
cpdnmonitor: cannot open input file C:\BOINC/projects/climateprediction.net/hadcm3n_y916_1900_40_007521892/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\BOINC/projects/climateprediction.net/hadcm3n_y916_1900_40_007521892/dataout/ocean_restart.day after 11 attempts
Model crashed: READ_FLH: I/O error tmp/pipe_dummy 2048
cpdnmonitor: cannot open input file C:\BOINC/projects/climateprediction.net/hadcm3n_y916_1900_40_007521892/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\BOINC/projects/climateprediction.net/hadcm3n_y916_1900_40_007521892/dataout/ocean_restart.day after 11 attempts
Model crashed: READ_FLH: I/O error tmp/pipe_dummy 2048
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
04 Dec 2011 02:53:32	775427	13549472	hadcm3n_y916_1900_40_007521892_0	777,600	1,513,622	1.9465
03 Dec 2011 01:24:31	775427	13549472	hadcm3n_y916_1900_40_007521892_0	751,680	1,466,196	1.9506
02 Dec 2011 02:00:09	775427	13549472	hadcm3n_y916_1900_40_007521892_0	725,760	1,415,237	1.9500
01 Dec 2011 00:32:49	775427	13549472	hadcm3n_y916_1900_40_007521892_0	699,840	1,364,033	1.9491
30 Nov 2011 00:23:59	775427	13549472	hadcm3n_y916_1900_40_007521892_0	673,920	1,312,938	1.9482
28 Nov 2011 23:17:49	775427	13549472	hadcm3n_y916_1900_40_007521892_0	648,000	1,262,736	1.9487
27 Nov 2011 22:32:15	775427	13549472	hadcm3n_y916_1900_40_007521892_0	622,080	1,210,942	1.9466
27 Nov 2011 08:40:42	775427	13549472	hadcm3n_y916_1900_40_007521892_0	596,160	1,160,407	1.9465
26 Nov 2011 07:54:21	775427	13549472	hadcm3n_y916_1900_40_007521892_0	570,240	1,108,714	1.9443
25 Nov 2011 16:47:58	775427	13549472	hadcm3n_y916_1900_40_007521892_0	544,320	1,059,219	1.9459
24 Nov 2011 18:09:13	775427	13549472	hadcm3n_y916_1900_40_007521892_0	518,400	1,009,635	1.9476
23 Nov 2011 17:44:37	775427	13549472	hadcm3n_y916_1900_40_007521892_0	492,480	960,149	1.9496
22 Nov 2011 18:26:35	775427	13549472	hadcm3n_y916_1900_40_007521892_0	466,560	910,809	1.9522
21 Nov 2011 20:09:58	775427	13549472	hadcm3n_y916_1900_40_007521892_0	440,640	860,898	1.9537
20 Nov 2011 17:04:43	775427	13549472	hadcm3n_y916_1900_40_007521892_0	414,720	811,358	1.9564
19 Nov 2011 17:10:56	775427	13549472	hadcm3n_y916_1900_40_007521892_0	388,800	761,740	1.9592
18 Nov 2011 18:42:30	775427	13549472	hadcm3n_y916_1900_40_007521892_0	362,880	712,593	1.9637
17 Nov 2011 18:42:55	775427	13549472	hadcm3n_y916_1900_40_007521892_0	336,960	661,783	1.9640
16 Nov 2011 16:57:46	775427	13549472	hadcm3n_y916_1900_40_007521892_0	311,040	610,605	1.9631
15 Nov 2011 18:13:20	775427	13549472	hadcm3n_y916_1900_40_007521892_0	285,120	560,441	1.9656
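
The Average (sec/TS) column in the trickle table appears to be the cumulative CPU time divided by the timestep count, rounded to four decimal places. The sketch below checks that reading against two rows copied from the table. The function name seconds_per_timestep is hypothetical and is not part of BOINC or the CPDN software.

```python
def seconds_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Return cumulative CPU seconds per model timestep, rounded to 4 places.

    Assumption: this mirrors how the "Average (sec/TS)" column is derived;
    the table itself does not state the formula.
    """
    if timestep <= 0:
        raise ValueError("timestep must be positive")
    return round(cpu_time_sec / timestep, 4)


# (timestep, CPU time in seconds) taken from the newest and oldest rows shown.
rows = [
    (777_600, 1_513_622),  # 04 Dec 2011 02:53:32
    (285_120, 560_441),    # 15 Nov 2011 18:13:20
]

for timestep, cpu_time in rows:
    print(timestep, seconds_per_timestep(cpu_time, timestep))
```

Both computed values match the table entries for those rows (1.9465 and 1.9656), which supports the assumed formula for this task.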