Task 13673469

Name	hadcm3n_ylze_1940_40_007547665_0
Workunit	7744897
Created	29 Nov 2011, 16:53:50 UTC
Sent	29 Nov 2011, 23:36:17 UTC
Report deadline	29 Feb 2012, 7:03:28 UTC
Received	13 Jan 2012, 14:45:38 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1080897
Run time	12 days 7 hours 32 min 15 sec
CPU time	12 days 7 hours 32 min 15 sec
Validate state	Invalid
Credit	5,598.72
Device peak FLOPS	2.34 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.56</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2388, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2896, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4088, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
08:16:36 (2908): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
08:16:37 (2908): No heartbeat from core client for 30 sec - exiting
08:16:38 (2908): No heartbeat from core client for 30 sec - exiting
08:16:39 (2908): No heartbeat from core client for 30 sec - exiting
08:16:40 (2908): No heartbeat from core client for 30 sec - exiting
08:16:41 (2908): No heartbeat from core client for 30 sec - exiting
08:16:43 (2908): No heartbeat from core client for 30 sec - exiting
08:16:44 (2908): No heartbeat from core client for 30 sec - exiting
08:16:45 (2908): No heartbeat from core client for 30 sec - exiting
08:16:46 (2908): No heartbeat from core client for 30 sec - exiting
08:16:47 (2908): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2816, iMonCtr=1
Model crash detected, will try to restart...
19:19:13 (2396): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:19:17 (2396): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3152, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2996, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3520, iMonCtr=1
Model crash detected, will try to restart...
BUFFIN: C I/O Error feof - Unit 63 - Return code = 16
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 65 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
Error converting file to netcdf: dataout/ylzeko.pjf5c10
Error converting file to netcdf: dataout/ylzeko.pif5c10
Error converting file to netcdf: dataout/ylzeko.pff5c10
Error converting file to netcdf: dataout/ylzeka.phf5c10
Error converting file to netcdf: dataout/ylzeka.pgf5c10
Error converting file to netcdf: dataout/ylzeka.pef5c10
Error converting file to netcdf: dataout/ylzeka.pdf5c10
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2752, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CSignal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2712, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2712, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2712, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2712, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2712, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2712, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>
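A stderr log this long is easier to audit when each message type is counted. The following is a minimal Python sketch that assumes the stderr_txt has been saved as a string. The message fragments are copied verbatim from the log above. `EVENTS`, `tally_events` and `hit_restart_limit` are illustrative names only and are not part of BOINC or the CPDN tools. Because it matches substrings, the "CSignal 22" line is counted together with the plain "Signal 22" lines.

```python
from collections import Counter

# Message fragments copied verbatim from the stderr_txt above.
# (EVENTS and the helper names below are hypothetical, for illustration only.)
EVENTS = {
    "crash_restart": "Model crash detected, will try to restart",
    "no_heartbeat": "No heartbeat from core client for 30 sec",
    "suspend": "Suspend request from BOINC",
    "signal_22": "Signal 22 received",
    "buffin_feof": "BUFFIN: C I/O Error feof",
    "netcdf_error": "Error converting file to netcdf",
}


def tally_events(log_text: str) -> Counter:
    """Count non-overlapping occurrences of each known message fragment.

    Substring matching means this works on the flattened one-line form of the
    log as well as on one message per line.
    """
    return Counter({name: log_text.count(fragment) for name, fragment in EVENTS.items()})


def hit_restart_limit(log_text: str) -> bool:
    """True if the controller gave up after repeated crashes."""
    return "Sorry, too many model crashes!" in log_text


sample = (
    "Model crash detected, will try to restart... "
    "CSignal 22 received, exiting... Called boinc_finish "
    "Model crash detected, will try to restart... "
    "Sorry, too many model crashes! :-( Called boinc_finish"
)
counts = tally_events(sample)
print(counts["crash_restart"], counts["signal_22"], hit_restart_limit(sample))  # 2 1 True
```

When run on the full stderr_txt, this counts the restart attempts that came before the final "Sorry, too many model crashes!" line. It does not know, and does not assume, the restart limit that the CPDN controller enforces.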

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
09 Jan 2012 23:06:04	1080897	13673469	hadcm3n_ylze_1940_40_007547665_0	466,560	1,012,349	2.1698
07 Jan 2012 23:12:50	1080897	13673469	hadcm3n_ylze_1940_40_007547665_0	440,640	957,265	2.1724
06 Jan 2012 19:13:24	1080897	13673469	hadcm3n_ylze_1940_40_007547665_0	414,720	900,192	2.1706
05 Jan 2012 15:11:15	1080897	13673469	hadcm3n_ylze_1940_40_007547665_0	388,800	842,515	2.1670
03 Jan 2012 22:03:12	1080897	13673469	hadcm3n_ylze_1940_40_007547665_0	362,880	786,312	2.1669
01 Jan 2012 20:06:05	1080897	13673469	hadcm3n_ylze_1940_40_007547665_0	336,960	728,251	2.1612
31 Dec 2011 17:41:32	1080897	13673469	hadcm3n_ylze_1940_40_007547665_0	311,040	671,998	2.1605
30 Dec 2011 02:38:10	1080897	13673469	hadcm3n_ylze_1940_40_007547665_0	285,120	616,283	2.1615
25 Dec 2011 02:12:17	1080897	13673469	hadcm3n_ylze_1940_40_007547665_0	259,200	559,000	2.1566
24 Dec 2011 00:18:18	1080897	13673469	hadcm3n_ylze_1940_40_007547665_0	233,280	503,894	2.1600
23 Dec 2011 00:49:08	1080897	13673469	hadcm3n_ylze_1940_40_007547665_0	207,360	449,480	2.1676
20 Dec 2011 21:24:45	1080897	13673469	hadcm3n_ylze_1940_40_007547665_0	181,440	394,154	2.1724
18 Dec 2011 17:09:17	1080897	13673469	hadcm3n_ylze_1940_40_007547665_0	155,520	337,215	2.1683
16 Dec 2011 23:31:24	1080897	13673469	hadcm3n_ylze_1940_40_007547665_0	129,600	279,232	2.1546
12 Dec 2011 23:17:23	1080897	13673469	hadcm3n_ylze_1940_40_007547665_0	103,680	223,306	2.1538
08 Dec 2011 02:00:59	1080897	13673469	hadcm3n_ylze_1940_40_007547665_0	77,760	167,957	2.1599
05 Dec 2011 21:10:33	1080897	13673469	hadcm3n_ylze_1940_40_007547665_0	51,840	111,914	2.1588
04 Dec 2011 02:08:23	1080897	13673469	hadcm3n_ylze_1940_40_007547665_0	25,920	56,554	2.1819
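The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep count reached. The Python check below tests that reading against the rows copied from the table. The helper names are illustrative. The 5e-5 tolerance reflects the four decimal places shown in the table. The check also converts the CPU time field above ("12 days 7 hours 32 min 15 sec") to seconds, so that it can be compared with the last trickle's CPU time.

```python
# Rows copied from the trickle table above: (timestep, CPU seconds, reported sec/TS).
TRICKLES = [
    (25_920, 56_554, 2.1819),
    (51_840, 111_914, 2.1588),
    (77_760, 167_957, 2.1599),
    (103_680, 223_306, 2.1538),
    (129_600, 279_232, 2.1546),
    (155_520, 337_215, 2.1683),
    (181_440, 394_154, 2.1724),
    (207_360, 449_480, 2.1676),
    (233_280, 503_894, 2.1600),
    (259_200, 559_000, 2.1566),
    (285_120, 616_283, 2.1615),
    (311_040, 671_998, 2.1605),
    (336_960, 728_251, 2.1612),
    (362_880, 786_312, 2.1669),
    (388_800, 842_515, 2.1670),
    (414_720, 900_192, 2.1706),
    (440_640, 957_265, 2.1724),
    (466_560, 1_012_349, 2.1698),
]


def sec_per_timestep(cpu_seconds: int, timestep: int) -> float:
    """Cumulative CPU seconds divided by cumulative timesteps."""
    return cpu_seconds / timestep


def dhms_to_seconds(days: int, hours: int, minutes: int, seconds: int) -> int:
    """Convert a 'D days H hours M min S sec' duration to seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def check_table(rows) -> list:
    """Return rows whose reported average differs from the recomputed one
    by more than 4-decimal rounding allows (half of 0.0001)."""
    return [
        (ts, cpu, reported)
        for ts, cpu, reported in rows
        if abs(sec_per_timestep(cpu, ts) - reported) > 5e-5
    ]


final_cpu = dhms_to_seconds(12, 7, 32, 15)   # "CPU time" field above
last_ts, last_cpu, _ = TRICKLES[-1]          # trickle of 09 Jan 2012
print(check_table(TRICKLES))                 # []
print(final_cpu, final_cpu - last_cpu)       # 1063935 51586
```

Every row agrees with the cumulative reading. Because each average covers the whole run so far, a slowdown in a single interval is diluted. Subtracting consecutive rows gives the cost of each 25,920-timestep segment on its own.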