Task 15068444

Name	hadcm3n_o0eh_2100_40_008116703_0
Workunit	8271817
Created	3 Aug 2012, 20:33:17 UTC
Sent	3 Aug 2012, 20:33:27 UTC
Report deadline	3 Nov 2012, 4:00:38 UTC
Received	4 Sep 2012, 19:24:47 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	25 (0x00000019) Unknown error code
Computer ID	1079445
Run time	14 days 23 hours 5 min 6 sec
CPU time	14 days 21 hours 17 min 56 sec
Validate state	Invalid
Credit	9,642.24
Device peak FLOPS	2.64 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
The drive cannot locate a specific area or track on the disk. (0x19) - exit code 25 (0x19)
</message>
<stderr_txt>
14:45:05 (9420): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
14:45:07 (9420): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9184, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3300, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3300, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9100, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9100, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9100, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5748, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5748, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6040, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6000, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6000, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
03:01:51 (5168): No heartbeat from core client for 30 sec - exiting
03:01:52 (5168): No heartbeat from core client for 30 sec - exiting
03:01:53 (5168): No heartbeat from core client for 30 sec - exiting
03:01:54 (5168): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6044, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4488, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5840, iMonCtr=1
Model crash detected, will try to restart...
06:31:17 (4760): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
06:31:18 (4760): No heartbeat from core client for 30 sec - exiting
06:31:19 (4760): No heartbeat from core client for 30 sec - exiting
06:31:20 (4760): No heartbeat from core client for 30 sec - exiting
06:31:21 (4760): No heartbeat from core client for 30 sec - exiting
06:31:22 (4760): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
10:52:59 (2752): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
10:53:00 (2752): No heartbeat from core client for 30 sec - exiting
10:53:01 (2752): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
03:01:55 (10716): No heartbeat from core client for 30 sec - exiting
03:01:56 (10716): No heartbeat from core client for 30 sec - exiting
03:01:57 (10716): No heartbeat from core client for 30 sec - exiting
03:01:58 (10716): No heartbeat from core client for 30 sec - exiting
03:01:59 (10716): No heartbeat from core client for 30 sec - exiting
03:02:00 (10716): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
09:08:12 (4608): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8440, iMonCtr=1
Model crash detected, will try to restart...
16:40:07 (3220): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
16:40:08 (3220): No heartbeat from core client for 30 sec - exiting
08:13:42 (3856): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
19:36:42 (3416): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:36:43 (3416): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9104, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9104, iMonCtr=1
Model crash detected, will try to restart...
20:04:41 (13876): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4112, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
01 Sep 2012 10:00:40	1079445	15068444	hadcm3n_o0eh_2100_40_008116703_0	803,520	1,283,631	1.5975
31 Aug 2012 20:27:07	1079445	15068444	hadcm3n_o0eh_2100_40_008116703_0	777,600	1,242,860	1.5983
29 Aug 2012 20:12:05	1079445	15068444	hadcm3n_o0eh_2100_40_008116703_0	751,680	1,201,253	1.5981
28 Aug 2012 21:27:25	1079445	15068444	hadcm3n_o0eh_2100_40_008116703_0	725,760	1,160,017	1.5983
28 Aug 2012 00:34:00	1079445	15068444	hadcm3n_o0eh_2100_40_008116703_0	699,840	1,118,095	1.5976
27 Aug 2012 04:20:22	1079445	15068444	hadcm3n_o0eh_2100_40_008116703_0	673,920	1,076,274	1.5970
26 Aug 2012 07:50:05	1079445	15068444	hadcm3n_o0eh_2100_40_008116703_0	648,000	1,034,262	1.5961
25 Aug 2012 20:17:06	1079445	15068444	hadcm3n_o0eh_2100_40_008116703_0	622,080	992,975	1.5962
24 Aug 2012 01:44:36	1079445	15068444	hadcm3n_o0eh_2100_40_008116703_0	596,160	950,742	1.5948
23 Aug 2012 03:55:24	1079445	15068444	hadcm3n_o0eh_2100_40_008116703_0	570,240	908,389	1.5930
22 Aug 2012 07:21:02	1079445	15068444	hadcm3n_o0eh_2100_40_008116703_0	544,320	867,213	1.5932
21 Aug 2012 20:22:42	1079445	15068444	hadcm3n_o0eh_2100_40_008116703_0	518,400	826,398	1.5941
20 Aug 2012 23:42:42	1079445	15068444	hadcm3n_o0eh_2100_40_008116703_0	492,480	785,158	1.5943
20 Aug 2012 03:03:20	1079445	15068444	hadcm3n_o0eh_2100_40_008116703_0	466,560	743,965	1.5946
18 Aug 2012 01:57:53	1079445	15068444	hadcm3n_o0eh_2100_40_008116703_0	440,640	702,292	1.5938
16 Aug 2012 04:40:00	1079445	15068444	hadcm3n_o0eh_2100_40_008116703_0	414,720	660,349	1.5923
15 Aug 2012 06:29:33	1079445	15068444	hadcm3n_o0eh_2100_40_008116703_0	388,800	618,444	1.5906
14 Aug 2012 18:29:20	1079445	15068444	hadcm3n_o0eh_2100_40_008116703_0	362,880	575,321	1.5854
14 Aug 2012 01:09:53	1079445	15068444	hadcm3n_o0eh_2100_40_008116703_0	336,960	533,270	1.5826
13 Aug 2012 03:33:17	1079445	15068444	hadcm3n_o0eh_2100_40_008116703_0	311,040	490,349	1.5765
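
The Average (sec/TS) column in the table above appears to be the cumulative CPU time divided by the timestep count. The page itself does not state the formula, so treat that as an assumption. A minimal Python check against three rows copied from the table (the function name `avg_sec_per_timestep` is hypothetical, chosen for illustration):

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_avg).
ROWS = [
    (803_520, 1_283_631, 1.5975),  # 01 Sep 2012 10:00:40
    (777_600, 1_242_860, 1.5983),  # 31 Aug 2012 20:27:07
    (311_040, 490_349, 1.5765),    # 13 Aug 2012 03:33:17
]


def avg_sec_per_timestep(cpu_time_sec, timestep):
    """Assumed formula: cumulative CPU seconds divided by timesteps completed."""
    return cpu_time_sec / timestep


def matches_reported(cpu_time_sec, timestep, reported, places=4):
    """True if the computed average agrees with the reported value
    to within half a unit in the last reported decimal place."""
    tolerance = 0.5 * 10 ** (-places)
    return abs(avg_sec_per_timestep(cpu_time_sec, timestep) - reported) <= tolerance


if __name__ == "__main__":
    for timestep, cpu, reported in ROWS:
        computed = avg_sec_per_timestep(cpu, timestep)
        status = "ok" if matches_reported(cpu, timestep, reported) else "MISMATCH"
        print(f"{timestep:>9,} TS  computed={computed:.4f}  reported={reported:.4f}  {status}")
```

The same check can be run on any other row by adding its timestep, CPU time, and reported average to `ROWS`.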