Task 16375077

Name	hadcm3n_n0os_1880_40_008403159_4
Workunit	8554015
Created	18 Mar 2014, 1:58:03 UTC
Sent	18 Mar 2014, 1:58:07 UTC
Report deadline	17 Jun 2014, 9:25:18 UTC
Received	10 May 2014, 0:38:33 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1122348
Run time	12 days 11 hours 50 min 20 sec
CPU time	12 days 10 hours 48 min 57 sec
Validate state	Invalid
Credit	5,909.76
Device peak FLOPS	2.33 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6432, iMonCtr=1
Model crash detected, will try to restart...
18:08:19 (2844): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:12:35 (836): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:17:03 (3364): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:21:03 (968): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:24:46 (8100): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:28:51 (4432): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:31:26 (6380): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:35:04 (4516): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:37:02 (2920): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:39:39 (6528): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:42:19 (5944): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:07:42 (6428): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:19:28 (8144): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:21:22 (7428): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:28:20 (5244): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:38:03 (2368): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:42:57 (7844): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:48:00 (6324): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:53:03 (4324): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:58:09 (7372): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:03:20 (8144): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:05:36 (7756): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:08:31 (7348): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:10:40 (2272): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:12:37 (4312): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:14:46 (4516): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:16:26 (6264): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:18:51 (3324): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:20:48 (4372): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:26:15 (4972): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:28:14 (6364): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:31:47 (7300): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:34:22 (4676): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:36:29 (7560): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:38:27 (5012): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:43:48 (6020): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:47:14 (6052): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:49:41 (4916): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:52:21 (6564): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:54:36 (3684): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:56:40 (7060): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:59:28 (5752): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
22:02:14 (3908): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
22:05:54 (6752): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
22:10:40 (1700): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
22:18:41 (8076): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
22:31:29 (5944): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
22:43:30 (6552): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6704, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5308, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6360, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6360, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6360, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6360, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7124, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7124, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7124, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6176, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6176, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6176, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6176, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6176, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6176, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
cpdnmonitor: cannot open input file G:\BOINC Data/projects/climateprediction.net/hadcm3n_n0os_1880_40_008403159/dataout/atmos_restart.day after 11 attempts
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6532, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file G:\BOINC Data/projects/climateprediction.net/hadcm3n_n0os_1880_40_008403159/dataout/atmos_restart.day after 11 attempts
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6532, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file G:\BOINC Data/projects/climateprediction.net/hadcm3n_n0os_1880_40_008403159/dataout/atmos_restart.day after 11 attempts
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6532, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file G:\BOINC Data/projects/climateprediction.net/hadcm3n_n0os_1880_40_008403159/dataout/atmos_restart.day after 11 attempts
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6532, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file G:\BOINC Data/projects/climateprediction.net/hadcm3n_n0os_1880_40_008403159/dataout/atmos_restart.day after 11 attempts
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6532, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file G:\BOINC Data/projects/climateprediction.net/hadcm3n_n0os_1880_40_008403159/dataout/atmos_restart.day after 11 attempts
08:25:37 (6532): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
cpdnmonitor: cannot open input file G:\BOINC Data/projects/climateprediction.net/hadcm3n_n0os_1880_40_008403159/dataout/atmos_restart.day after 11 attempts
19:41:02 (6560): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
cpdnmonitor: cannot open input file G:\BOINC Data/projects/climateprediction.net/hadcm3n_n0os_1880_40_008403159/dataout/atmos_restart.day after 11 attempts
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5224, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
07 May 2014 23:54:51	1122348	16375077	hadcm3n_n0os_1880_40_008403159_4	492,480	1,051,513	2.1351
29 Apr 2014 01:19:26	1122348	16375077	hadcm3n_n0os_1880_40_008403159_4	466,560	995,009	2.1326
25 Apr 2014 00:54:08	1122348	16375077	hadcm3n_n0os_1880_40_008403159_4	440,640	939,775	2.1328
17 Apr 2014 05:48:44	1122348	16375077	hadcm3n_n0os_1880_40_008403159_4	414,720	882,858	2.1288
15 Apr 2014 10:29:19	1122348	16375077	hadcm3n_n0os_1880_40_008403159_4	388,800	825,432	2.1230
11 Apr 2014 09:53:22	1122348	16375077	hadcm3n_n0os_1880_40_008403159_4	362,880	769,120	2.1195
09 Apr 2014 02:33:33	1122348	16375077	hadcm3n_n0os_1880_40_008403159_4	336,960	712,807	2.1154
04 Apr 2014 01:33:32	1122348	16375077	hadcm3n_n0os_1880_40_008403159_4	311,040	656,835	2.1117
03 Apr 2014 11:16:33	1122348	16375077	hadcm3n_n0os_1880_40_008403159_4	285,120	601,758	2.1105
28 Mar 2014 08:02:04	1122348	16375077	hadcm3n_n0os_1880_40_008403159_4	259,200	549,168	2.1187
27 Mar 2014 16:10:24	1122348	16375077	hadcm3n_n0os_1880_40_008403159_4	233,280	492,041	2.1092
27 Mar 2014 00:33:00	1122348	16375077	hadcm3n_n0os_1880_40_008403159_4	207,360	436,084	2.1030
26 Mar 2014 09:25:11	1122348	16375077	hadcm3n_n0os_1880_40_008403159_4	181,440	380,978	2.0997
25 Mar 2014 17:52:24	1122348	16375077	hadcm3n_n0os_1880_40_008403159_4	155,520	325,590	2.0936
25 Mar 2014 02:35:24	1122348	16375077	hadcm3n_n0os_1880_40_008403159_4	129,600	270,649	2.0883
21 Mar 2014 03:24:32	1122348	16375077	hadcm3n_n0os_1880_40_008403159_4	103,680	217,278	2.0957
20 Mar 2014 12:31:34	1122348	16375077	hadcm3n_n0os_1880_40_008403159_4	77,760	163,049	2.0968
19 Mar 2014 18:48:15	1122348	16375077	hadcm3n_n0os_1880_40_008403159_4	51,840	108,838	2.0995
18 Mar 2014 21:15:04	1122348	16375077	hadcm3n_n0os_1880_40_008403159_4	25,920	54,785	2.1136
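
The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count. The sketch below checks that reading against three rows copied from the table above. It is an interpretation of the figures, not a statement of how the server computes them, and the function name `sec_per_timestep` is made up for illustration.

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_average).
ROWS = [
    (492_480, 1_051_513, 2.1351),  # 07 May 2014 trickle
    (259_200, 549_168, 2.1187),    # 28 Mar 2014 trickle
    (25_920, 54_785, 2.1136),      # 18 Mar 2014 trickle
]


def sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Cumulative CPU seconds per model timestep, rounded to 4 decimals.

    Assumption: the table's average is a plain ratio of the two columns.
    """
    return round(cpu_time_sec / timestep, 4)


if __name__ == "__main__":
    for timestep, cpu_time_sec, reported in ROWS:
        computed = sec_per_timestep(cpu_time_sec, timestep)
        print(f"{timestep:>8} timesteps: computed {computed:.4f}, reported {reported:.4f}")
```

Under that assumption, the three sampled rows reproduce the reported averages to four decimal places.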