Task 13347365

Name	hadcm3n_t0xa_1940_40_007442800_1
Workunit	7640303
Created	8 Sep 2011, 22:16:10 UTC
Sent	8 Sep 2011, 22:20:30 UTC
Report deadline	9 Dec 2011, 5:47:41 UTC
Received	9 Oct 2011, 19:36:15 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	25 (0x00000019) Unknown error code
Computer ID	1079445
Run time	14 days 8 hours 17 min 39 sec
CPU time	14 days 4 hours 21 min 1 sec
Validate state	Invalid
Credit	8,709.12
Device peak FLOPS	2.65 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
The drive cannot locate a specific area or track on the disk. (0x19) - exit code 25 (0x19)
</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
12:32:56 (8076): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5272, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5052, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7380, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7956, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8048, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3868, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2880, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=10676, iMonCtr=1
Model crash detected, will try to restart...
09:21:51 (5612): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9560, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
19:39:49 (7684): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
BUFFIN: C I/O Error feof - Unit 63 - Return code = 16
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 65 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CCPDN Monitor - Quit request from BOINC...
COcean Restart file copy failed on t0xako.dag95c0
CPDN Monitor - Quit request from BOINC...
13:18:10 (8596): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
13:18:13 (8596): No heartbeat from core client for 30 sec - exiting
13:18:14 (8596): No heartbeat from core client for 30 sec - exiting
13:18:15 (8596): No heartbeat from core client for 30 sec - exiting
13:18:16 (8596): No heartbeat from core client for 30 sec - exiting
13:18:17 (8596): No heartbeat from core client for 30 sec - exiting
13:18:18 (8596): No heartbeat from core client for 30 sec - exiting
13:18:19 (8596): No heartbeat from core client for 30 sec - exiting
13:18:20 (8596): No heartbeat from core client for 30 sec - exiting
Called boinc_finish
</stderr_txt>
]]>
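
The stderr output above interleaves several kinds of events (suspend and quit requests, missed heartbeats, model crashes, BUFFIN I/O errors). A minimal Python sketch for tallying them might look like the following. The category names, the `PATTERNS` table, and the `tally_events` helper are hypothetical and not part of BOINC or CPDN; the excerpt is copied from the start of the log.

```python
import re

# Hypothetical category names mapped to substrings that appear in the stderr text.
PATTERNS = {
    "suspend_request": r"Suspend request from BOINC",
    "quit_request": r"Quit request from BOINC",
    "no_heartbeat": r"No heartbeat from core client",
    "model_crash": r"Model crash detected",
    "buffin_io_error": r"BUFFIN: C I/O Error",
}


def tally_events(stderr_text):
    """Count how often each pattern occurs in the given stderr text."""
    return {
        name: len(re.findall(pattern, stderr_text))
        for name, pattern in PATTERNS.items()
    }


# Excerpt copied from the beginning of the stderr text of this task.
excerpt = (
    "Suspended CPDN Monitor - Suspend request from BOINC... "
    "12:32:56 (8076): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=5272, iMonCtr=1 "
    "Model crash detected, will try to restart... "
)

counts = tally_events(excerpt)
for name, count in counts.items():
    print(f"{name}: {count}")
```

Substring counting works whether the log is flattened onto one line or split into one message per line, which is why the sketch does not parse line by line.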

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
07 Oct 2011 09:41:37	1079445	13347365	hadcm3n_t0xa_1940_40_007442800_1	725,760	1,184,919	1.6327
06 Oct 2011 20:46:31	1079445	13347365	hadcm3n_t0xa_1940_40_007442800_1	699,840	1,142,299	1.6322
05 Oct 2011 22:48:46	1079445	13347365	hadcm3n_t0xa_1940_40_007442800_1	673,920	1,100,939	1.6336
05 Oct 2011 01:07:30	1079445	13347365	hadcm3n_t0xa_1940_40_007442800_1	648,000	1,057,692	1.6322
04 Oct 2011 03:21:26	1079445	13347365	hadcm3n_t0xa_1940_40_007442800_1	622,080	1,016,287	1.6337
02 Oct 2011 22:07:35	1079445	13347365	hadcm3n_t0xa_1940_40_007442800_1	596,160	974,743	1.6350
01 Oct 2011 00:05:01	1079445	13347365	hadcm3n_t0xa_1940_40_007442800_1	570,240	933,239	1.6366
30 Sep 2011 04:22:03	1079445	13347365	hadcm3n_t0xa_1940_40_007442800_1	544,320	890,949	1.6368
29 Sep 2011 07:57:29	1079445	13347365	hadcm3n_t0xa_1940_40_007442800_1	518,400	849,572	1.6388
28 Sep 2011 09:44:23	1079445	13347365	hadcm3n_t0xa_1940_40_007442800_1	492,480	805,104	1.6348
27 Sep 2011 22:16:27	1079445	13347365	hadcm3n_t0xa_1940_40_007442800_1	466,560	766,187	1.6422
27 Sep 2011 01:53:28	1079445	13347365	hadcm3n_t0xa_1940_40_007442800_1	440,640	723,880	1.6428
26 Sep 2011 05:07:30	1079445	13347365	hadcm3n_t0xa_1940_40_007442800_1	414,720	681,756	1.6439
24 Sep 2011 08:01:03	1079445	13347365	hadcm3n_t0xa_1940_40_007442800_1	388,800	638,478	1.6422
23 Sep 2011 20:11:45	1079445	13347365	hadcm3n_t0xa_1940_40_007442800_1	362,880	595,979	1.6424
23 Sep 2011 00:49:33	1079445	13347365	hadcm3n_t0xa_1940_40_007442800_1	336,960	554,182	1.6447
22 Sep 2011 05:05:28	1079445	13347365	hadcm3n_t0xa_1940_40_007442800_1	311,040	512,334	1.6472
21 Sep 2011 07:55:50	1079445	13347365	hadcm3n_t0xa_1940_40_007442800_1	285,120	470,068	1.6487
20 Sep 2011 10:29:25	1079445	13347365	hadcm3n_t0xa_1940_40_007442800_1	259,200	425,714	1.6424
19 Sep 2011 22:45:33	1079445	13347365	hadcm3n_t0xa_1940_40_007442800_1	233,280	383,857	1.6455
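
The Average (sec/TS) column in the trickle table appears to be the cumulative CPU time divided by the cumulative timestep count, rounded to four decimal places. The Python sketch below checks that reading against the three most recent rows; the function name `average_sec_per_timestep` is hypothetical, and the row values are copied from the table.

```python
# (timestep, CPU time in seconds, reported average) copied from the three
# most recent trickle rows in the table above.
rows = [
    (725_760, 1_184_919, 1.6327),
    (699_840, 1_142_299, 1.6322),
    (673_920, 1_100_939, 1.6336),
]


def average_sec_per_timestep(timestep, cpu_seconds):
    """Cumulative CPU seconds per model timestep, rounded to 4 decimals."""
    return round(cpu_seconds / timestep, 4)


# Compare the recomputed value with the value reported on the page.
all_match = True
for timestep, cpu_seconds, reported in rows:
    computed = average_sec_per_timestep(timestep, cpu_seconds)
    if abs(computed - reported) > 1e-9:
        all_match = False
    print(f"{timestep:>9,}  computed={computed:.4f}  reported={reported:.4f}")

# Consecutive trickles in these rows are spaced the same number of timesteps apart.
step_gaps = {rows[i][0] - rows[i + 1][0] for i in range(len(rows) - 1)}
print("timesteps between trickles:", step_gaps)
```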