Task 13415007

Name	hadcm3n_u4gt_1980_40_007460187_3
Workunit	7657690
Created	23 Sep 2011, 7:13:59 UTC
Sent	23 Sep 2011, 7:16:57 UTC
Report deadline	23 Dec 2011, 14:44:08 UTC
Received	26 Nov 2011, 0:01:30 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1164541
Run time	26 days 12 hours 49 min 52 sec
CPU time	23 days 17 hours 33 min 42 sec
Validate state	Invalid
Credit	7,464.96
Device peak FLOPS	2.67 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2092, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2092, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2608, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2576, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3712, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3712, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2628, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2076, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
BUFFIN: C I/O Error feof - Unit 63 - Return code = 16
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 65 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3540, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5688, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3908, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
06:49:19 (7108): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2416, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2348, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Called boinc_finish
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2480, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4004, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=320, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2344, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2476, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2476, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2476, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2476, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2476, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2476, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>
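The stderr interleaves several kinds of message (controller restarts, BOINC quit and suspend requests, I/O errors, signal exits), so a small script can help tally them. The sketch below is a hypothetical helper: `EVENTS`, `count_events` and `controller_pids` are names introduced here, not part of BOINC or the CPDN application, and the message strings are simply copied from the log above. It makes no claim about what the CPDN controller does internally.

```python
import re
from collections import Counter

# Message texts copied from the stderr above. This list is hand-picked for
# illustration; it is not an official catalogue of CPDN or BOINC messages.
EVENTS = {
    "crash": "Model crash detected, will try to restart",
    "quit": "CPDN Monitor - Quit request from BOINC",
    "suspend": "CPDN Monitor - Suspend request from BOINC",
    "no_heartbeat": "No heartbeat from core client",
    "signal22": "Signal 22 received, exiting",
    "boinc_finish": "Called boinc_finish",
    "buffin_eof": "BUFFIN: C I/O Error feof",
    "gave_up": "Sorry, too many model crashes!",
}


def count_events(stderr_text: str) -> Counter:
    """Count non-overlapping occurrences of each known message text."""
    return Counter({name: stderr_text.count(text) for name, text in EVENTS.items()})


def controller_pids(stderr_text: str) -> list[int]:
    """Return the selfPID values reported by 'Controller::' lines, in order."""
    return [int(pid) for pid in re.findall(r"selfPID=(\d+)", stderr_text)]
```

To use it on the full log, read the saved stderr text into a string and pass it to `count_events` and `controller_pids`; a repeated PID (such as 2476 near the end) suggests the same controller process kept restarting the model.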

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
03 Nov 2011 00:41:44	1164541	13415007	hadcm3n_u4gt_1980_40_007460187_3	622,080	1,394,231	2.2412
31 Oct 2011 19:44:16	1164541	13415007	hadcm3n_u4gt_1980_40_007460187_3	596,160	1,334,762	2.2389
31 Oct 2011 18:25:50	1164541	13415007	hadcm3n_u4gt_1980_40_007460187_3	570,240	1,278,222	2.2416
31 Oct 2011 17:11:24	1164541	13415007	hadcm3n_u4gt_1980_40_007460187_3	544,320	1,221,071	2.2433
31 Oct 2011 16:43:08	1164541	13415007	hadcm3n_u4gt_1980_40_007460187_3	518,400	1,162,863	2.2432
31 Oct 2011 14:55:30	1164541	13415007	hadcm3n_u4gt_1980_40_007460187_3	492,480	1,106,374	2.2465
31 Oct 2011 14:55:30	1164541	13415007	hadcm3n_u4gt_1980_40_007460187_3	466,560	1,048,279	2.2468
31 Oct 2011 14:55:30	1164541	13415007	hadcm3n_u4gt_1980_40_007460187_3	440,640	990,473	2.2478
19 Oct 2011 01:32:16	1164541	13415007	hadcm3n_u4gt_1980_40_007460187_3	414,720	932,432	2.2483
17 Oct 2011 12:09:44	1164541	13415007	hadcm3n_u4gt_1980_40_007460187_3	388,800	879,580	2.2623
16 Oct 2011 15:22:22	1164541	13415007	hadcm3n_u4gt_1980_40_007460187_3	362,880	822,170	2.2657
14 Oct 2011 20:19:30	1164541	13415007	hadcm3n_u4gt_1980_40_007460187_3	336,960	763,538	2.2660
13 Oct 2011 12:24:14	1164541	13415007	hadcm3n_u4gt_1980_40_007460187_3	311,040	707,453	2.2745
12 Oct 2011 07:24:01	1164541	13415007	hadcm3n_u4gt_1980_40_007460187_3	285,120	650,993	2.2832
10 Oct 2011 13:55:48	1164541	13415007	hadcm3n_u4gt_1980_40_007460187_3	259,200	596,974	2.3031
08 Oct 2011 03:22:05	1164541	13415007	hadcm3n_u4gt_1980_40_007460187_3	233,280	540,884	2.3186
07 Oct 2011 02:33:33	1164541	13415007	hadcm3n_u4gt_1980_40_007460187_3	207,360	484,020	2.3342
05 Oct 2011 10:52:17	1164541	13415007	hadcm3n_u4gt_1980_40_007460187_3	181,440	422,172	2.3268
04 Oct 2011 13:00:44	1164541	13415007	hadcm3n_u4gt_1980_40_007460187_3	155,520	358,941	2.3080
03 Oct 2011 09:45:15	1164541	13415007	hadcm3n_u4gt_1980_40_007460187_3	129,600	298,770	2.3053
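The Average column appears to be cumulative CPU time divided by the cumulative timestep count, rounded to four decimal places, and consecutive trickles appear to be 25,920 timesteps apart. The sketch below checks both readings against the rows above. The helper names (`TRICKLES`, `sec_per_timestep`, `mismatches`, `step_sizes`) are hypothetical, and the interpretation of the column is inferred from the numbers rather than taken from CPDN documentation.

```python
# Rows transcribed from the trickle table above, newest first:
# (timestep, CPU time in seconds, reported average in sec/timestep)
TRICKLES = [
    (622_080, 1_394_231, 2.2412),
    (596_160, 1_334_762, 2.2389),
    (570_240, 1_278_222, 2.2416),
    (544_320, 1_221_071, 2.2433),
    (518_400, 1_162_863, 2.2432),
    (492_480, 1_106_374, 2.2465),
    (466_560, 1_048_279, 2.2468),
    (440_640, 990_473, 2.2478),
    (414_720, 932_432, 2.2483),
    (388_800, 879_580, 2.2623),
    (362_880, 822_170, 2.2657),
    (336_960, 763_538, 2.2660),
    (311_040, 707_453, 2.2745),
    (285_120, 650_993, 2.2832),
    (259_200, 596_974, 2.3031),
    (233_280, 540_884, 2.3186),
    (207_360, 484_020, 2.3342),
    (181_440, 422_172, 2.3268),
    (155_520, 358_941, 2.3080),
    (129_600, 298_770, 2.3053),
]


def sec_per_timestep(timestep: int, cpu_sec: int) -> float:
    """Cumulative CPU seconds divided by cumulative timesteps, rounded to 4 dp."""
    return round(cpu_sec / timestep, 4)


def mismatches(rows):
    """Rows whose reported average differs from the recomputed value."""
    return [r for r in rows if sec_per_timestep(r[0], r[1]) != r[2]]


def step_sizes(rows):
    """Distinct timestep gaps between consecutive trickles (rows newest first)."""
    return sorted({newer[0] - older[0] for newer, older in zip(rows, rows[1:])})
```

With these rows, `mismatches(TRICKLES)` is empty and `step_sizes(TRICKLES)` returns a single gap, which is consistent with the reading above; the slow fall of the average from about 2.31 to 2.24 sec/timestep reflects later trickles amortising the early, slower steps.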