Task 15445052

Name	hadcm3n_zbgw_1880_40_008247602_0
Workunit	8402726
Created	21 Nov 2012, 8:40:31 UTC
Sent	21 Nov 2012, 8:40:39 UTC
Report deadline	20 Feb 2013, 16:07:50 UTC
Received	22 Feb 2013, 8:44:16 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	25 (0x00000019) Unknown error code
Computer ID	1229738
Run time	30 days 16 hours 10 min 12 sec
CPU time	19 days 18 hours 11 min 6 sec
Validate state	Invalid
Credit	10,575.36
Device peak FLOPS	2.37 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
The drive cannot locate a specific area or track on the disk. (0x19) - exit code 25 (0x19)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2712, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3124, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3504, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4660, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3580, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3416, iMonCtr=1
Model crash detected, will try to restart...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3696, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3988, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3356, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3400, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2148, iMonCtr=1
Model crash detected, will try to restart...
18:00:02 (3312): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5840, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3940, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3816, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3724, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3892, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3768, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2932, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3680, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3880, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3348, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3748, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3876, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3648, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3784, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3168, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CCalled boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
20 Feb 2013 11:03:31	1229738	15445052	hadcm3n_zbgw_1880_40_008247602_0	881,280	1,676,749	1.9026
17 Feb 2013 11:12:25	1229738	15445052	hadcm3n_zbgw_1880_40_008247602_0	855,360	1,627,477	1.9027
16 Feb 2013 07:23:39	1229738	15445052	hadcm3n_zbgw_1880_40_008247602_0	829,440	1,578,272	1.9028
14 Feb 2013 14:35:28	1229738	15445052	hadcm3n_zbgw_1880_40_008247602_0	803,520	1,529,337	1.9033
12 Feb 2013 12:10:43	1229738	15445052	hadcm3n_zbgw_1880_40_008247602_0	777,600	1,480,639	1.9041
10 Feb 2013 08:26:51	1229738	15445052	hadcm3n_zbgw_1880_40_008247602_0	751,680	1,431,893	1.9049
08 Feb 2013 15:35:47	1229738	15445052	hadcm3n_zbgw_1880_40_008247602_0	725,760	1,384,954	1.9083
05 Feb 2013 13:25:19	1229738	15445052	hadcm3n_zbgw_1880_40_008247602_0	699,840	1,332,488	1.9040
03 Feb 2013 05:36:57	1229738	15445052	hadcm3n_zbgw_1880_40_008247602_0	673,920	1,283,329	1.9043
01 Feb 2013 09:26:58	1229738	15445052	hadcm3n_zbgw_1880_40_008247602_0	648,000	1,233,974	1.9043
31 Jan 2013 08:40:38	1229738	15445052	hadcm3n_zbgw_1880_40_008247602_0	622,080	1,185,426	1.9056
27 Jan 2013 12:41:46	1229738	15445052	hadcm3n_zbgw_1880_40_008247602_0	596,160	1,135,933	1.9054
24 Jan 2013 10:01:20	1229738	15445052	hadcm3n_zbgw_1880_40_008247602_0	570,240	1,088,300	1.9085
20 Jan 2013 08:39:37	1229738	15445052	hadcm3n_zbgw_1880_40_008247602_0	544,320	1,038,776	1.9084
19 Jan 2013 03:34:02	1229738	15445052	hadcm3n_zbgw_1880_40_008247602_0	518,400	990,145	1.9100
15 Jan 2013 08:53:38	1229738	15445052	hadcm3n_zbgw_1880_40_008247602_0	492,480	941,619	1.9120
13 Jan 2013 01:58:24	1229738	15445052	hadcm3n_zbgw_1880_40_008247602_0	466,560	891,923	1.9117
11 Jan 2013 07:54:08	1229738	15445052	hadcm3n_zbgw_1880_40_008247602_0	440,640	841,279	1.9092
07 Jan 2013 11:21:33	1229738	15445052	hadcm3n_zbgw_1880_40_008247602_0	414,720	791,214	1.9078
05 Jan 2013 22:43:48	1229738	15445052	hadcm3n_zbgw_1880_40_008247602_0	388,800	743,084	1.9112
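
The Average (sec/TS) column in the trickle table appears to be the cumulative CPU time divided by the cumulative timestep count. This is inferred from the figures themselves and is not stated on the page. A minimal sketch that checks a few rows under that assumption follows; the helper name `sec_per_timestep` is hypothetical and the values are copied from the table above.

```python
def sec_per_timestep(cpu_time_sec: str, timestep: str) -> float:
    """Return CPU seconds per timestep, rounded to 4 decimals.

    Both arguments are strings as shown in the table, with thousands
    separators (e.g. "1,676,749"). Assumption: the reported average is
    cumulative CPU time / cumulative timesteps.
    """
    cpu = int(cpu_time_sec.replace(",", ""))
    ts = int(timestep.replace(",", ""))
    return round(cpu / ts, 4)


# (timestep, CPU time in seconds, reported average) from three table rows
rows = [
    ("881,280", "1,676,749", 1.9026),
    ("855,360", "1,627,477", 1.9027),
    ("388,800", "743,084", 1.9112),
]

for timestep, cpu, reported in rows:
    computed = sec_per_timestep(cpu, timestep)
    print(f"{timestep:>8} timesteps: computed {computed:.4f}, reported {reported:.4f}")
```

For the three rows checked, the computed value matches the reported average to four decimal places.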