Task 13146990

Name	hadcm3n_yag9_1900_40_007346403_2
Workunit	7543833
Created	19 Jul 2011, 19:35:22 UTC
Sent	19 Jul 2011, 19:39:15 UTC
Report deadline	19 Oct 2011, 3:06:26 UTC
Received	15 Nov 2011, 16:55:50 UTC
Server state	Over
Outcome	Computation error
Client state	Done
Exit status	22 (0x00000016) Unknown error code
Computer ID	857934
Run time	21 days 7 hours 21 min 39 sec
CPU time	21 days 7 hours 21 min 39 sec
Validate state	Invalid
Credit	4,354.56
Device peak FLOPS	1.82 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
00:57:30 (5128): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
01:03:44 (4736): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
01:03:45 (4736): No heartbeat from core client for 30 sec - exiting
01:03:46 (4736): No heartbeat from core client for 30 sec - exiting
01:03:47 (4736): No heartbeat from core client for 30 sec - exiting
01:03:48 (4736): No heartbeat from core client for 30 sec - exiting
01:03:49 (4736): No heartbeat from core client for 30 sec - exiting
01:03:50 (4736): No heartbeat from core client for 30 sec - exiting
01:03:52 (4736): No heartbeat from core client for 30 sec - exiting
01:03:53 (4736): No heartbeat from core client for 30 sec - exiting
01:03:54 (4736): No heartbeat from core client for 30 sec - exiting
01:03:55 (4736): No heartbeat from core client for 30 sec - exiting
03:40:21 (4212): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
17:14:22 (3572): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
17:15:36 (5356): No heartbeat from core client for 30 sec - exiting
17:15:38 (5356): No heartbeat from core client for 30 sec - exiting
17:15:39 (5356): No heartbeat from core client for 30 sec - exiting
17:15:40 (5356): No heartbeat from core client for 30 sec - exiting
17:15:41 (5356): No heartbeat from core client for 30 sec - exiting
17:15:42 (5356): No heartbeat from core client for 30 sec - exiting
17:15:43 (5356): No heartbeat from core client for 30 sec - exiting
17:15:44 (5356): No heartbeat from core client for 30 sec - exiting
17:15:45 (5356): No heartbeat from core client for 30 sec - exiting
17:15:46 (5356): No heartbeat from core client for 30 sec - exiting
17:15:47 (5356): No heartbeat from core client for 30 sec - exiting
17:15:48 (5356): No heartbeat from core client for 30 sec - exiting
17:15:50 (5356): No heartbeat from core client for 30 sec - exiting
17:15:51 (5356): No heartbeat from core client for 30 sec - exiting
17:15:52 (5356): No heartbeat from core client for 30 sec - exiting
17:15:53 (5356): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:05:20 (3532): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:05:21 (3532): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6124, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6124, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6124, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6124, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6124, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6124, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
18:26:16 (4136): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:27:05 (1772): No heartbeat from core client for 30 sec - exiting
18:27:06 (1772): No heartbeat from core client for 30 sec - exiting
18:27:07 (1772): No heartbeat from core client for 30 sec - exiting
18:27:08 (1772): No heartbeat from core client for 30 sec - exiting
18:27:09 (1772): No heartbeat from core client for 30 sec - exiting
18:27:10 (1772): No heartbeat from core client for 30 sec - exiting
18:27:11 (1772): No heartbeat from core client for 30 sec - exiting
18:27:12 (1772): No heartbeat from core client for 30 sec - exiting
18:27:14 (1772): No heartbeat from core client for 30 sec - exiting
18:27:15 (1772): No heartbeat from core client for 30 sec - exiting
18:27:16 (1772): No heartbeat from core client for 30 sec - exiting
18:27:17 (1772): No heartbeat from core client for 30 sec - exiting
18:27:18 (1772): No heartbeat from core client for 30 sec - exiting
18:27:19 (1772): No heartbeat from core client for 30 sec - exiting
18:27:20 (1772): No heartbeat from core client for 30 sec - exiting
18:27:21 (1772): No heartbeat from core client for 30 sec - exiting
18:27:22 (1772): No heartbeat from core client for 30 sec - exiting
18:27:23 (1772): No heartbeat from core client for 30 sec - exiting
18:27:24 (1772): No heartbeat from core client for 30 sec - exiting
18:27:26 (1772): No heartbeat from core client for 30 sec - exiting
18:27:27 (1772): No heartbeat from core client for 30 sec - exiting
18:27:28 (1772): No heartbeat from core client for 30 sec - exiting
18:27:29 (1772): No heartbeat from core client for 30 sec - exiting
18:27:30 (1772): No heartbeat from core client for 30 sec - exiting
18:27:31 (1772): No heartbeat from core client for 30 sec - exiting
18:27:32 (1772): No heartbeat from core client for 30 sec - exiting
18:27:33 (1772): No heartbeat from core client for 30 sec - exiting
18:27:34 (1772): No heartbeat from core client for 30 sec - exiting
18:27:35 (1772): No heartbeat from core client for 30 sec - exiting
18:27:36 (1772): No heartbeat from core client for 30 sec - exiting
18:27:38 (1772): No heartbeat from core client for 30 sec - exiting
18:27:39 (1772): No heartbeat from core client for 30 sec - exiting
18:27:40 (1772): No heartbeat from core client for 30 sec - exiting
18:27:41 (1772): No heartbeat from core client for 30 sec - exiting
18:27:42 (1772): No heartbeat from core client for 30 sec - exiting
18:27:43 (1772): No heartbeat from core client for 30 sec - exiting
18:27:44 (1772): No heartbeat from core client for 30 sec - exiting
18:27:45 (1772): No heartbeat from core client for 30 sec - exiting
18:27:46 (1772): No heartbeat from core client for 30 sec - exiting
18:27:47 (1772): No heartbeat from core client for 30 sec - exiting
18:27:49 (1772): No heartbeat from core client for 30 sec - exiting
18:27:50 (1772): No heartbeat from core client for 30 sec - exiting
18:27:51 (1772): No heartbeat from core client for 30 sec - exiting
18:27:52 (1772): No heartbeat from core client for 30 sec - exiting
18:27:53 (1772): No heartbeat from core client for 30 sec - exiting
18:27:54 (1772): No heartbeat from core client for 30 sec - exiting
18:27:55 (1772): No heartbeat from core client for 30 sec - exiting
18:27:56 (1772): No heartbeat from core client for 30 sec - exiting
18:27:57 (1772): No heartbeat from core client for 30 sec - exiting
18:27:58 (1772): No heartbeat from core client for 30 sec - exiting
18:27:59 (1772): No heartbeat from core client for 30 sec - exiting
18:28:01 (1772): No heartbeat from core client for 30 sec - exiting
18:28:02 (1772): No heartbeat from core client for 30 sec - exiting
18:28:03 (1772): No heartbeat from core client for 30 sec - exiting
18:28:04 (1772): No heartbeat from core client for 30 sec - exiting
18:28:05 (1772): No heartbeat from core client for 30 sec - exiting
18:28:06 (1772): No heartbeat from core client for 30 sec - exiting
18:28:07 (1772): No heartbeat from core client for 30 sec - exiting
18:28:08 (1772): No heartbeat from core client for 30 sec - exiting
18:28:09 (1772): No heartbeat from core client for 30 sec - exiting
18:28:10 (1772): No heartbeat from core client for 30 sec - exiting
18:28:12 (1772): No heartbeat from core client for 30 sec - exiting
18:28:13 (1772): No heartbeat from core client for 30 sec - exiting
18:28:14 (1772): No heartbeat from core client for 30 sec - exiting
18:28:15 (1772): No heartbeat from core client for 30 sec - exiting
18:28:16 (1772): No heartbeat from core client for 30 sec - exiting
18:28:17 (1772): No heartbeat from core client for 30 sec - exiting
18:28:18 (1772): No heartbeat from core client for 30 sec - exiting
18:28:19 (1772): No heartbeat from core client for 30 sec - exiting
18:28:20 (1772): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
22:36:11 (2844): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
20:31:47 (3640): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
01:24:29 (4164): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
01:33:18 (3784): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
01:47:35 (4300): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Ocean Restart file copy failed on yag9ko.dab42f0
01:26:28 (1988): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
01:26:56 (1988): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - Quit request from BOINC...
11:22:35 (2836): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
forrtl: Access is denied.
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4388, iMonCtr=1
Model crash detected, will try to restart...
forrtl: Access is denied.
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4388, iMonCtr=1
Model crash detected, will try to restart...
forrtl: Access is denied.
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4388, iMonCtr=1
Model crash detected, will try to restart...
forrtl: Access is denied.
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4388, iMonCtr=1
Model crash detected, will try to restart...
forrtl: Access is denied.
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4388, iMonCtr=1
Model crash detected, will try to restart...
forrtl: Access is denied.
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4388, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
No Process Handle
Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1212, selfPID=1212, iMonCtr=1
cpdnmonitor: cannot open input file C:\Program Files\BOINC/projects/climateprediction.net/hadcm3n_yag9_1900_40_007346403/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\Program Files\BOINC/projects/climateprediction.net/hadcm3n_yag9_1900_40_007346403/dataout/ocean_restart.day after 11 attempts
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2852, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file C:\Program Files\BOINC/projects/climateprediction.net/hadcm3n_yag9_1900_40_007346403/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\Program Files\BOINC/projects/climateprediction.net/hadcm3n_yag9_1900_40_007346403/dataout/ocean_restart.day after 11 attempts
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2852, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file C:\Program Files\BOINC/projects/climateprediction.net/hadcm3n_yag9_1900_40_007346403/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\Program Files\BOINC/projects/climateprediction.net/hadcm3n_yag9_1900_40_007346403/dataout/ocean_restart.day after 11 attempts
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2852, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file C:\Program Files\BOINC/projects/climateprediction.net/hadcm3n_yag9_1900_40_007346403/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\Program Files\BOINC/projects/climateprediction.net/hadcm3n_yag9_1900_40_007346403/dataout/ocean_restart.day after 11 attempts
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2852, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file C:\Program Files\BOINC/projects/climateprediction.net/hadcm3n_yag9_1900_40_007346403/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\Program Files\BOINC/projects/climateprediction.net/hadcm3n_yag9_1900_40_007346403/dataout/ocean_restart.day after 11 attempts
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2852, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file C:\Program Files\BOINC/projects/climateprediction.net/hadcm3n_yag9_1900_40_007346403/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\Program Files\BOINC/projects/climateprediction.net/hadcm3n_yag9_1900_40_007346403/dataout/ocean_restart.day after 11 attempts
CPDN Monitor - Quit request from BOINC...
cpdnmonitor: cannot open input file C:\Program Files\BOINC/projects/climateprediction.net/hadcm3n_yag9_1900_40_007346403/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\Program Files\BOINC/projects/climateprediction.net/hadcm3n_yag9_1900_40_007346403/dataout/ocean_restart.day after 11 attempts
CPDN Monitor - Quit request from BOINC...
cpdnmonitor: cannot open input file C:\Program Files\BOINC/projects/climateprediction.net/hadcm3n_yag9_1900_40_007346403/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\Program Files\BOINC/projects/climateprediction.net/hadcm3n_yag9_1900_40_007346403/dataout/ocean_restart.day after 11 attempts
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1280, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>
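A stderr dump this long is easier to read if you tally the recurring messages. The following is a minimal Python sketch that assumes one message per line. The function name `tally_stderr`, the category labels, and the regular expression are illustrative assumptions based on the message text in this log. They are not part of any CPDN or BOINC tool.

```python
import re
from collections import Counter

# Assumed heartbeat line format, copied from the messages in this log:
#   "HH:MM:SS (PID): No heartbeat from core client for 30 sec - exiting"
HEARTBEAT_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}) \((\d+)\): No heartbeat from core client"
)

def tally_stderr(stderr_text: str) -> dict:
    """Count a few recurring event types in a stderr_txt dump.

    Hypothetical helper: the categories below are illustrative choices,
    not an official CPDN/BOINC classification.
    """
    heartbeat_by_pid = Counter()  # heartbeat timeouts grouped by process ID
    events = Counter()            # other message types of interest
    for raw in stderr_text.splitlines():
        line = raw.strip()
        m = HEARTBEAT_RE.match(line)
        if m:
            heartbeat_by_pid[int(m.group(2))] += 1
        elif line.startswith("CPDN Monitor - Quit request"):
            events["quit_request"] += 1
        elif line.startswith("Model crash detected"):
            events["model_crash"] += 1
        elif line.startswith("Sorry, too many model crashes"):
            events["gave_up"] += 1
        elif line.startswith("cpdnmonitor: cannot open input file"):
            events["restart_file_missing"] += 1
    return {"heartbeat_by_pid": dict(heartbeat_by_pid), "events": dict(events)}

# A short excerpt of the log above, one message per line.
excerpt = """\
01:03:44 (4736): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
01:03:45 (4736): No heartbeat from core client for 30 sec - exiting
03:40:21 (4212): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6124, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
"""

result = tally_stderr(excerpt)
print(result)
```

Grouping heartbeat timeouts by PID separates the brief bursts from a single process (for example the run of 1772 entries) from isolated timeouts. The "Model crash detected" and "Sorry, too many model crashes" counts show how many restart cycles preceded each give-up.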

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
15 Nov 2011 17:24:48	857934	13146990	hadcm3n_yag9_1900_40_007346403_2	362,880	1,773,105	4.8862
15 Nov 2011 17:24:48	857934	13146990	hadcm3n_yag9_1900_40_007346403_2	336,960	1,697,411	5.0374
15 Nov 2011 17:24:47	857934	13146990	hadcm3n_yag9_1900_40_007346403_2	311,040	1,621,107	5.2119
15 Nov 2011 17:24:47	857934	13146990	hadcm3n_yag9_1900_40_007346403_2	285,120	1,544,739	5.4179
15 Nov 2011 17:24:47	857934	13146990	hadcm3n_yag9_1900_40_007346403_2	259,200	1,468,934	5.6672
16 Sep 2011 03:59:05	857934	13146990	hadcm3n_yag9_1900_40_007346403_2	233,280	670,369	2.8737
12 Sep 2011 17:59:42	857934	13146990	hadcm3n_yag9_1900_40_007346403_2	207,360	595,615	2.8724
07 Sep 2011 07:50:17	857934	13146990	hadcm3n_yag9_1900_40_007346403_2	181,440	523,583	2.8857
06 Sep 2011 10:54:49	857934	13146990	hadcm3n_yag9_1900_40_007346403_2	155,520	451,774	2.9049
05 Sep 2011 13:28:52	857934	13146990	hadcm3n_yag9_1900_40_007346403_2	129,600	377,455	2.9125
29 Aug 2011 21:01:34	857934	13146990	hadcm3n_yag9_1900_40_007346403_2	103,680	303,256	2.9249
04 Aug 2011 00:04:17	857934	13146990	hadcm3n_yag9_1900_40_007346403_2	77,760	229,134	2.9467
28 Jul 2011 18:07:07	857934	13146990	hadcm3n_yag9_1900_40_007346403_2	51,840	150,575	2.9046
25 Jul 2011 23:02:15	857934	13146990	hadcm3n_yag9_1900_40_007346403_2	25,920	74,988	2.8931
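The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep, rounded to four decimal places. The Python sketch below checks that against every row of the table. It also computes the CPU time spent per timestep between consecutive trickles. The names `TRICKLES`, `average_sec_per_ts`, and `interval_rates` are hypothetical helpers introduced for illustration, and the rows are copied from the table.

```python
# Rows copied from the "Latest Trickles Received" table, oldest first:
# (timestep, cumulative CPU time in seconds, reported average sec/TS)
TRICKLES = [
    (25_920, 74_988, 2.8931),
    (51_840, 150_575, 2.9046),
    (77_760, 229_134, 2.9467),
    (103_680, 303_256, 2.9249),
    (129_600, 377_455, 2.9125),
    (155_520, 451_774, 2.9049),
    (181_440, 523_583, 2.8857),
    (207_360, 595_615, 2.8724),
    (233_280, 670_369, 2.8737),
    (259_200, 1_468_934, 5.6672),
    (285_120, 1_544_739, 5.4179),
    (311_040, 1_621_107, 5.2119),
    (336_960, 1_697_411, 5.0374),
    (362_880, 1_773_105, 4.8862),
]

def average_sec_per_ts(cpu_sec: int, timestep: int) -> float:
    """Cumulative CPU seconds per timestep, rounded like the table (4 d.p.)."""
    return round(cpu_sec / timestep, 4)

def interval_rates(rows):
    """CPU seconds per timestep between consecutive trickles.

    Returns (ending timestep, rate) pairs.
    """
    return [
        (ts1, (cpu1 - cpu0) / (ts1 - ts0))
        for (ts0, cpu0, _), (ts1, cpu1, _) in zip(rows, rows[1:])
    ]

# Every reported average should be reproduced by the formula.
for ts, cpu, reported in TRICKLES:
    assert average_sec_per_ts(cpu, ts) == reported, ts

for ts, rate in interval_rates(TRICKLES):
    print(f"{ts:>7,} TS: {rate:6.2f} s/TS")
```

For these rows, every reported average matches. The interval ending at timestep 259,200 works out to about 30.8 s/TS (798,565 s of CPU time over 25,920 timesteps). Every other interval falls between roughly 2.77 and 3.03 s/TS. The rise in the Average column from 2.87 to 5.67 therefore comes from that single interval, not from a general slowdown.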