Task 13922727

Name	hadcm3n_yn1l_1980_40_007682058_1
Workunit	7837145
Created	15 Jan 2012, 21:08:33 UTC
Sent	15 Jan 2012, 21:08:41 UTC
Report deadline	16 Apr 2012, 4:35:52 UTC
Received	6 Mar 2012, 22:51:48 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	25 (0x00000019) Unknown error code
Computer ID	1146285
Run time	20 days 15 hours 57 min 35 sec
CPU time	20 days 11 hours 4 min 22 sec
Validate state	Invalid
Credit	6,842.88
Device peak FLOPS	2.60 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
The drive cannot locate a specific area or track on the disk. (0x19) - exit code 25 (0x19)
</message>
<stderr_txt>
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5032, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3112, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3484, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3312, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3312, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3312, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3312, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3312, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3312, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3420, iMonCtr=1
Model crash detected, will try to restart...
CCPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5028, iMonCtr=1
Model crash detected, will try to restart...
13:55:30 (5244): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1528, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1528, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1528, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4680, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4680, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1352, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
08 Feb 2012 23:20:35	1146285	13922727	hadcm3n_yn1l_1980_40_007682058_1	570,240	847,452	1.4861
06 Feb 2012 02:17:40	1146285	13922727	hadcm3n_yn1l_1980_40_007682058_1	544,320	809,080	1.4864
04 Feb 2012 03:35:55	1146285	13922727	hadcm3n_yn1l_1980_40_007682058_1	518,400	770,098	1.4855
03 Feb 2012 15:42:52	1146285	13922727	hadcm3n_yn1l_1980_40_007682058_1	492,480	731,082	1.4845
03 Feb 2012 05:34:15	1146285	13922727	hadcm3n_yn1l_1980_40_007682058_1	466,560	691,980	1.4832
02 Feb 2012 17:42:44	1146285	13922727	hadcm3n_yn1l_1980_40_007682058_1	440,640	653,129	1.4822
29 Jan 2012 21:41:27	1146285	13922727	hadcm3n_yn1l_1980_40_007682058_1	414,720	614,471	1.4817
28 Jan 2012 16:37:17	1146285	13922727	hadcm3n_yn1l_1980_40_007682058_1	388,800	575,680	1.4807
26 Jan 2012 01:29:52	1146285	13922727	hadcm3n_yn1l_1980_40_007682058_1	362,880	536,930	1.4796
23 Jan 2012 22:03:00	1146285	13922727	hadcm3n_yn1l_1980_40_007682058_1	336,960	497,753	1.4772
23 Jan 2012 11:05:48	1146285	13922727	hadcm3n_yn1l_1980_40_007682058_1	311,040	458,905	1.4754
23 Jan 2012 00:08:20	1146285	13922727	hadcm3n_yn1l_1980_40_007682058_1	285,120	419,935	1.4728
22 Jan 2012 12:57:01	1146285	13922727	hadcm3n_yn1l_1980_40_007682058_1	259,200	380,729	1.4689
22 Jan 2012 02:49:56	1146285	13922727	hadcm3n_yn1l_1980_40_007682058_1	233,280	341,648	1.4645
21 Jan 2012 15:52:17	1146285	13922727	hadcm3n_yn1l_1980_40_007682058_1	207,360	303,020	1.4613
21 Jan 2012 04:49:41	1146285	13922727	hadcm3n_yn1l_1980_40_007682058_1	181,440	264,441	1.4575
20 Jan 2012 17:52:06	1146285	13922727	hadcm3n_yn1l_1980_40_007682058_1	155,520	225,730	1.4515
20 Jan 2012 07:04:25	1146285	13922727	hadcm3n_yn1l_1980_40_007682058_1	129,600	187,703	1.4483
19 Jan 2012 20:11:16	1146285	13922727	hadcm3n_yn1l_1980_40_007682058_1	103,680	149,592	1.4428
19 Jan 2012 09:36:42	1146285	13922727	hadcm3n_yn1l_1980_40_007682058_1	77,760	112,151	1.4423