Task 15584912

Name	hadcm3n_o6el_2140_40_008270186_3
Workunit	8425310
Created	5 Feb 2013, 18:41:58 UTC
Sent	5 Feb 2013, 18:42:04 UTC
Report deadline	8 May 2013, 2:09:15 UTC
Received	23 Apr 2013, 21:33:41 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	25 (0x00000019) Unknown error code
Computer ID	1019939
Run time	12 days 19 hours 44 min 47 sec
CPU time	11 days 11 hours 8 min 11 sec
Validate state	Invalid
Credit	5,287.68
Device peak FLOPS	2.89 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
The drive cannot locate a specific area or track on the disk. (0x19) - exit code 25 (0x19)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4612, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
00:46:25 (4700): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
00:46:27 (4700): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4356, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4572, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4704, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4852, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
BUFFIN: C I/O Error feof - Unit 63 - Return code = 16
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 65 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
Error converting file to netcdf: dataout/o6elko.pjy4c10
Error converting file to netcdf: dataout/o6elko.piy4c10
Error converting file to netcdf: dataout/o6elko.pfy4c10
Error converting file to netcdf: dataout/o6elka.phy4c10
Error converting file to netcdf: dataout/o6elka.pgy4c10
Error converting file to netcdf: dataout/o6elka.pey4c10
Error converting file to netcdf: dataout/o6elka.pdy4c10
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3908, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4672, iMonCtr=1
Model crash detected, will try to restart...
CSuspended CPDN Monitor - Suspend request from BOINC...
00:46:45 (1576): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
00:46:46 (1576): No heartbeat from core client for 30 sec - exiting
18:07:48 (4660): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:25:33 (4624): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4360, iMonCtr=1
Model crash detected, will try to restart...
19:13:38 (4276): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:14:24 (5732): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=976, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5060, iMonCtr=1
Model crash detected, will try to restart...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4176, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4648, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4284, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4424, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4032, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4648, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3452, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4040, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4112, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4224, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4800, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4476, iMonCtr=1
Model crash detected, will try to restart...
10:47:45 (1680): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
C19:42:36 (2856): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:43:50 (4808): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
19:54:37 (4384): No heartbeat from core client for 30 sec - exiting
19:54:38 (4384): No heartbeat from core client for 30 sec - exiting
19:54:39 (4384): No heartbeat from core client for 30 sec - exiting
19:54:40 (4384): No heartbeat from core client for 30 sec - exiting
19:54:41 (4384): No heartbeat from core client for 30 sec - exiting
19:54:42 (4384): No heartbeat from core client for 30 sec - exiting
19:54:44 (4384): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:54:45 (4384): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - Quit request from BOINC...
20:18:26 (4312): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
07:33:53 (4596): No heartbeat from core client for 30 sec - exiting
07:33:54 (4596): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
07:33:55 (4596): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4180, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4300, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4444, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4840, iMonCtr=1
Model crash detected, will try to restart...
19:11:56 (5052): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Called boinc_finish
</stderr_txt>
]]>
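The stderr dump is long enough that tallying its message types by eye is error-prone. The following is a minimal sketch, assuming the stderr text has been copied into a Python string; the function names `summarize_stderr` and `extract_self_pids` are hypothetical helpers, and the patterns are simply the literal message fragments visible in the log above, not an official CPDN or BOINC log format.

```python
import re
from collections import Counter

# Literal message fragments copied from the stderr output above (assumed stable).
PATTERNS = {
    "model_crash": re.compile(r"Model crash detected, will try to restart"),
    "no_heartbeat": re.compile(r"No heartbeat from core client for 30 sec"),
    "suspend": re.compile(r"Suspend request from BOINC"),
    "quit": re.compile(r"Quit request from BOINC"),
    "buffin_io_error": re.compile(r"BUFFIN: C I/O Error feof"),
    "netcdf_error": re.compile(r"Error converting file to netcdf"),
}

SELF_PID = re.compile(r"selfPID=(\d+)")


def summarize_stderr(text: str) -> Counter:
    """Count how often each known message fragment appears in the text."""
    counts = Counter()
    for name, pattern in PATTERNS.items():
        counts[name] = len(pattern.findall(text))
    return counts


def extract_self_pids(text: str) -> list[int]:
    """Return the selfPID values reported by 'Controller::' lines, in order."""
    return [int(pid) for pid in SELF_PID.findall(text)]


# A short excerpt of the log above, flattened the way the task page shows it.
sample = (
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=4612, iMonCtr=1 Model crash detected, will try to restart... "
    "Suspended CPDN Monitor - Suspend request from BOINC... "
    "00:46:25 (4700): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "BUFFIN: C I/O Error feof - Unit 63 - Return code = 16 "
    "Error converting file to netcdf: dataout/o6elko.pjy4c10 "
    "CPDN Monitor - Quit request from BOINC..."
)

if __name__ == "__main__":
    print(summarize_stderr(sample))
    print(extract_self_pids(sample))
```

Because the patterns are plain substrings, they still match the lines that carry a stray leading "C" (for example "CController::"), and the "No 'heartbeat' from BOINC" monitor lines are deliberately not counted as core-client heartbeat timeouts.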

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
21 Apr 2013 18:04:33	1019939	15584912	hadcm3n_o6el_2140_40_008270186_3	440,640	980,866	2.2260
09 Apr 2013 18:37:40	1019939	15584912	hadcm3n_o6el_2140_40_008270186_3	414,720	906,469	2.1857
06 Apr 2013 12:40:03	1019939	15584912	hadcm3n_o6el_2140_40_008270186_3	388,800	849,433	2.1848
01 Apr 2013 14:35:38	1019939	15584912	hadcm3n_o6el_2140_40_008270186_3	362,880	785,125	2.1636
28 Mar 2013 21:08:22	1019939	15584912	hadcm3n_o6el_2140_40_008270186_3	336,960	730,835	2.1689
21 Mar 2013 21:39:43	1019939	15584912	hadcm3n_o6el_2140_40_008270186_3	311,040	667,153	2.1449
12 Mar 2013 19:07:08	1019939	15584912	hadcm3n_o6el_2140_40_008270186_3	285,120	581,318	2.0389
05 Mar 2013 19:40:25	1019939	15584912	hadcm3n_o6el_2140_40_008270186_3	259,200	509,516	1.9657
02 Mar 2013 18:58:38	1019939	15584912	hadcm3n_o6el_2140_40_008270186_3	233,280	451,001	1.9333
28 Feb 2013 20:07:26	1019939	15584912	hadcm3n_o6el_2140_40_008270186_3	207,360	393,110	1.8958
24 Feb 2013 08:40:17	1019939	15584912	hadcm3n_o6el_2140_40_008270186_3	181,440	342,417	1.8872
23 Feb 2013 22:15:51	1019939	15584912	hadcm3n_o6el_2140_40_008270186_3	155,520	305,433	1.9639
21 Feb 2013 21:54:19	1019939	15584912	hadcm3n_o6el_2140_40_008270186_3	129,600	258,568	1.9951
16 Feb 2013 18:01:38	1019939	15584912	hadcm3n_o6el_2140_40_008270186_3	103,680	198,365	1.9132
11 Feb 2013 22:11:52	1019939	15584912	hadcm3n_o6el_2140_40_008270186_3	77,760	130,606	1.6796
09 Feb 2013 19:14:54	1019939	15584912	hadcm3n_o6el_2140_40_008270186_3	51,840	86,107	1.6610
07 Feb 2013 21:26:43	1019939	15584912	hadcm3n_o6el_2140_40_008270186_3	25,920	42,819	1.6520
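The "Average (sec/TS)" column appears to be cumulative CPU time divided by the cumulative timestep count at each trickle. The sketch below checks that reading against a few rows copied from the table and also computes the per-interval rate between consecutive trickles, which are 25,920 timesteps apart. The relationship is inferred from the numbers, not from CPDN documentation, and the helper names are hypothetical.

```python
# (timestep, cumulative CPU seconds, reported average) copied from the table above.
TRICKLES = [
    (25_920, 42_819, 1.6520),
    (51_840, 86_107, 1.6610),
    (77_760, 130_606, 1.6796),
    (414_720, 906_469, 2.1857),
    (440_640, 980_866, 2.2260),
]


def cumulative_average(timestep: int, cpu_seconds: int) -> float:
    """Cumulative CPU seconds per model timestep, rounded like the table."""
    return round(cpu_seconds / timestep, 4)


def interval_rates(rows):
    """CPU seconds per timestep between consecutive trickles (assumes sorted rows)."""
    rates = []
    for (ts0, cpu0, _), (ts1, cpu1, _) in zip(rows, rows[1:]):
        rates.append((ts1, (cpu1 - cpu0) / (ts1 - ts0)))
    return rates


if __name__ == "__main__":
    for ts, cpu, reported in TRICKLES:
        print(ts, cumulative_average(ts, cpu), reported)
    for ts, rate in interval_rates(TRICKLES):
        print(ts, round(rate, 4))
```

The interval rate isolates the cost of each stretch of the run, whereas the cumulative average in the table smooths it over the whole run. Note that the rows above skip some trickles, so not every interval spans exactly 25,920 timesteps.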