Task 15886071

Name	hadcm3n_4gzi_1980_40_008399134_0
Workunit	8549990
Created	8 Jul 2013, 19:48:16 UTC
Sent	11 Jul 2013, 5:01:05 UTC
Report deadline	10 Oct 2013, 12:28:16 UTC
Received	3 Jan 2014, 4:35:15 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	25 (0x00000019) Unknown error code
Computer ID	1218845
Run time	20 days 0 hours 17 min 20 sec
CPU time	19 days 14 hours 27 min 36 sec
Validate state	Invalid
Credit	8,398.08
Device peak FLOPS	2.66 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.2.33</core_client_version>
<![CDATA[
<message>
The drive cannot locate a specific area or track on the disk. (0x19) - exit code 25 (0x19)
</message>
<stderr_txt>
C17:30:50 (4804): No heartbeat from core client for 30 sec - exiting
17:30:51 (4804): No heartbeat from core client for 30 sec - exiting
17:30:52 (4804): No heartbeat from core client for 30 sec - exiting
17:30:53 (4804): No heartbeat from core client for 30 sec - exiting
17:30:54 (4804): No heartbeat from core client for 30 sec - exiting
17:30:55 (4804): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2172, iMonCtr=1
Model crash detected, will try to restart...
11:21:13 (5596): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2952, iMonCtr=1
Model crash detected, will try to restart...
15:12:50 (4428): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5544, iMonCtr=1
Model crash detected, will try to restart...
11:09:33 (1528): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4980, iMonCtr=1
Model crash detected, will try to restart...
19:46:11 (5504): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5000, iMonCtr=1
Model crash detected, will try to restart...
09:30:56 (4172): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
09:49:30 (5012): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
09:56:49 (4624): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4976, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4976, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5516, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5252, iMonCtr=1
Model crash detected, will try to restart...
22:39:28 (5316): No heartbeat from core client for 30 sec - exiting
22:39:29 (5316): No heartbeat from core client for 30 sec - exiting
22:39:30 (5316): No heartbeat from core client for 30 sec - exiting
22:39:31 (5316): No heartbeat from core client for 30 sec - exiting
22:39:32 (5316): No heartbeat from core client for 30 sec - exiting
22:39:33 (5316): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
08:38:39 (3084): No heartbeat from core client for 30 sec - exiting
08:38:40 (3084): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5300, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6000, iMonCtr=1
Model crash detected, will try to restart...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5960, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
09:09:07 (4824): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5892, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3764, iMonCtr=1
Model crash detected, will try to restart...
22:21:51 (868): No heartbeat from core client for 30 sec - exiting
22:21:52 (868): No heartbeat from core client for 30 sec - exiting
22:21:53 (868): No heartbeat from core client for 30 sec - exiting
22:21:54 (868): No heartbeat from core client for 30 sec - exiting
22:21:55 (868): No heartbeat from core client for 30 sec - exiting
22:21:56 (868): No heartbeat from core client for 30 sec - exiting
22:21:57 (868): No heartbeat from core client for 30 sec - exiting
22:21:58 (868): No heartbeat from core client for 30 sec - exiting
22:21:59 (868): No heartbeat from core client for 30 sec - exiting
22:22:01 (868): No heartbeat from core client for 30 sec - exiting
22:22:02 (868): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
BUFFIN: C I/O Error feof - Unit 65 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
Error converting file to netcdf: dataout/4gziko.pjk1c10
Error converting file to netcdf: dataout/4gziko.pik1c10
Error converting file to netcdf: dataout/4gziko.pfk1c10
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
C14:16:30 (3352): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
14:25:40 (6096): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
14:25:41 (6096): No heartbeat from core client for 30 sec - exiting
14:25:42 (6096): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1620, iMonCtr=1
Model crash detected, will try to restart...
22:14:52 (5524): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
22:53:42 (4840): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
09:34:03 (5892): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4336, iMonCtr=1
Model crash detected, will try to restart...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4120, iMonCtr=1
Model crash detected, will try to restart...
C13:00:12 (4136): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
08:55:39 (4240): No heartbeat from core client for 30 sec - exiting
08:55:41 (4240): No heartbeat from core client for 30 sec - exiting
08:55:42 (4240): No heartbeat from core client for 30 sec - exiting
08:55:43 (4240): No heartbeat from core client for 30 sec - exiting
08:55:44 (4240): No heartbeat from core client for 30 sec - exiting
08:55:45 (4240): No heartbeat from core client for 30 sec - exiting
08:55:46 (4240): No heartbeat from core client for 30 sec - exiting
08:55:47 (4240): No heartbeat from core client for 30 sec - exiting
08:55:48 (4240): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4876, iMonCtr=1
Model crash detected, will try to restart...
Atmos Hold Restart file rename failed on atmos_restart.hold
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5748, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5304, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4408, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4852, iMonCtr=1
Model crash detected, will try to restart...
08:05:40 (5960): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Called boinc_finish
</stderr_txt>
]]>
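The stderr output above repeats a handful of message types many times, so a tally per type is easier to read than the raw text. The following is a minimal sketch, not part of BOINC or CPDN: the `PATTERNS` table and the `count_events` helper are hypothetical names chosen for illustration, and the `sample` string is a short excerpt copied from the end of the log.

```python
import re

# Hypothetical patterns for the message types that appear in the stderr text.
PATTERNS = {
    "no_heartbeat": re.compile(
        r"\d{2}:\d{2}:\d{2} \(\d+\): No heartbeat from core client"
    ),
    "model_restart": re.compile(r"Model crash detected, will try to restart"),
    "suspend_request": re.compile(r"Suspend request from BOINC"),
    "quit_request": re.compile(r"Quit request from BOINC"),
}


def count_events(stderr_text: str) -> dict:
    """Return how many times each known message type occurs in the text."""
    return {name: len(p.findall(stderr_text)) for name, p in PATTERNS.items()}


# Excerpt taken from the last entries of the log above.
sample = (
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=4852, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "08:05:40 (5960): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "Suspended CPDN Monitor - Suspend request from BOINC..."
)

counts = count_events(sample)
for name, n in counts.items():
    print(f"{name}: {n}")
```

Because the patterns match on message text rather than on line boundaries, the same function should work whether the log is stored one message per line or as a single run of text.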

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
23 Dec 2013 06:30:40	1218845	15886071	hadcm3n_4gzi_1980_40_008399134_0	699,840	1,680,244	2.4009
18 Dec 2013 02:45:07	1218845	15886071	hadcm3n_4gzi_1980_40_008399134_0	673,920	1,588,329	2.3569
29 Nov 2013 06:13:39	1218845	15886071	hadcm3n_4gzi_1980_40_008399134_0	648,000	1,503,495	2.3202
25 Nov 2013 06:05:10	1218845	15886071	hadcm3n_4gzi_1980_40_008399134_0	622,080	1,410,627	2.2676
19 Nov 2013 08:53:46	1218845	15886071	hadcm3n_4gzi_1980_40_008399134_0	596,160	1,320,898	2.2157
14 Nov 2013 13:29:36	1218845	15886071	hadcm3n_4gzi_1980_40_008399134_0	570,240	1,283,210	2.2503
28 Sep 2013 01:46:41	1218845	15886071	hadcm3n_4gzi_1980_40_008399134_0	544,320	1,202,641	2.2094
16 Aug 2013 12:00:17	1218845	15886071	hadcm3n_4gzi_1980_40_008399134_0	518,400	1,113,699	2.1483
15 Aug 2013 07:25:10	1218845	15886071	hadcm3n_4gzi_1980_40_008399134_0	492,480	1,072,797	2.1784
14 Aug 2013 16:21:22	1218845	15886071	hadcm3n_4gzi_1980_40_008399134_0	466,560	1,002,645	2.1490
14 Aug 2013 16:21:22	1218845	15886071	hadcm3n_4gzi_1980_40_008399134_0	440,640	913,699	2.0736
14 Aug 2013 16:21:22	1218845	15886071	hadcm3n_4gzi_1980_40_008399134_0	414,720	828,194	1.9970
14 Aug 2013 16:21:22	1218845	15886071	hadcm3n_4gzi_1980_40_008399134_0	388,800	789,753	2.0313
14 Aug 2013 16:21:22	1218845	15886071	hadcm3n_4gzi_1980_40_008399134_0	362,880	764,736	2.1074
14 Aug 2013 16:21:22	1218845	15886071	hadcm3n_4gzi_1980_40_008399134_0	336,960	738,380	2.1913
14 Aug 2013 16:21:22	1218845	15886071	hadcm3n_4gzi_1980_40_008399134_0	311,040	709,478	2.2810
14 Aug 2013 16:21:22	1218845	15886071	hadcm3n_4gzi_1980_40_008399134_0	285,120	683,523	2.3973
14 Aug 2013 16:21:22	1218845	15886071	hadcm3n_4gzi_1980_40_008399134_0	259,200	655,582	2.5293
14 Aug 2013 16:21:22	1218845	15886071	hadcm3n_4gzi_1980_40_008399134_0	233,280	628,847	2.6957
29 Jul 2013 12:46:00	1218845	15886071	hadcm3n_4gzi_1980_40_008399134_0	207,360	601,408	2.9003
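
The Average (sec/TS) column is consistent with cumulative CPU time divided by the timestep reached, rounded to four decimal places. The sketch below checks that reading against three rows from the table; the function name `seconds_per_timestep` is an assumption made for illustration, not something the project defines.

```python
def seconds_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Cumulative CPU seconds divided by timesteps completed, to 4 decimals."""
    return round(cpu_time_sec / timestep, 4)


# (timestep, CPU time in seconds, average shown in the table)
rows = [
    (699_840, 1_680_244, 2.4009),
    (673_920, 1_588_329, 2.3569),
    (207_360, 601_408, 2.9003),
]

for timestep, cpu_time, reported in rows:
    computed = seconds_per_timestep(cpu_time, timestep)
    print(timestep, computed, reported)
```

Since the average is cumulative, its rise from mid-August onward means the most recent timesteps cost more CPU time each than the running average suggests.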