Task 16295525

Name	hadcm3n_86rl_1980_40_008515272_0
Workunit	8662784
Created	26 Feb 2014, 16:10:04 UTC
Sent	26 Feb 2014, 16:58:23 UTC
Report deadline	29 May 2014, 0:25:34 UTC
Received	17 Apr 2014, 18:44:17 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1294309
Run time	19 days 9 hours 15 min 6 sec
CPU time	17 days 22 hours 11 min
Validate state	Invalid
Credit	12,130.56
Device peak FLOPS	3.04 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.2.42</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4936, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4936, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4936, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4936, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4936, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3544, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3544, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3756, iMonCtr=1 Model crash detected, will try to restart... 
07:34:37 (3088): No heartbeat from core client for 30 sec - exiting 07:34:38 (3088): No heartbeat from core client for 30 sec - exiting 07:34:39 (3088): No heartbeat from core client for 30 sec - exiting 07:34:40 (3088): No heartbeat from core client for 30 sec - exiting 07:34:41 (3088): No heartbeat from core client for 30 sec - exiting 07:34:42 (3088): No heartbeat from core client for 30 sec - exiting 07:34:43 (3088): No heartbeat from core client for 30 sec - exiting 07:34:44 (3088): No heartbeat from core client for 30 sec - exiting 07:34:45 (3088): No heartbeat from core client for 30 sec - exiting 07:34:46 (3088): No heartbeat from core client for 30 sec - exiting 07:34:47 (3088): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 07:34:34 (3200): No heartbeat from core client for 30 sec - exiting 07:34:35 (3200): No heartbeat from core client for 30 sec - exiting 07:34:36 (3200): No heartbeat from core client for 30 sec - exiting 07:34:37 (3200): No heartbeat from core client for 30 sec - exiting 07:34:38 (3200): No heartbeat from core client for 30 sec - exiting 07:34:39 (3200): No heartbeat from core client for 30 sec - exiting 07:34:40 (3200): No heartbeat from core client for 30 sec - exiting 07:34:41 (3200): No heartbeat from core client for 30 sec - exiting 07:34:42 (3200): No heartbeat from core client for 30 sec - exiting 07:34:43 (3200): No heartbeat from core client for 30 sec - exiting 07:34:44 (3200): No heartbeat from core client for 30 sec - exiting 07:34:45 (3200): No heartbeat from core client for 30 sec - exiting 07:34:46 (3200): No heartbeat from core client for 30 sec - exiting 07:34:47 (3200): No heartbeat from core client for 30 sec - exiting 07:34:48 (3200): No heartbeat from core client for 30 sec - exiting 07:34:49 (3200): No heartbeat from core client for 30 sec - exiting 07:34:50 (3200): No heartbeat from core client for 30 sec - exiting 07:34:51 (3200): No heartbeat from core client for 
30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2396, iMonCtr=1 Model crash detected, will try to restart... 09:10:29 (5464): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4488, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4488, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5200, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3884, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3884, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3804, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3804, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3676, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3676, iMonCtr=1 Model crash detected, will try to restart... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3676, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3676, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3676, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4672, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3972, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3972, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6464, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6464, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4308, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4308, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7444, iMonCtr=1 Model crash detected, will try to restart... 08:00:23 (7380): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6824, iMonCtr=1 Model crash detected, will try to restart... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6824, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6824, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6824, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6824, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... 18:15:42 (5400): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4932, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4932, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3544, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3544, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3544, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3544, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3544, iMonCtr=1 Model crash detected, will try to restart... 16:18:50 (1512): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4992, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4992, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2632, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3492, iMonCtr=1 Model crash detected, will try to restart... 17:45:07 (5800): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 17:43:17 (6812): No heartbeat from core client for 30 sec - exiting 17:43:18 (6812): No heartbeat from core client for 30 sec - exiting 17:43:19 (6812): No heartbeat from core client for 30 sec - exiting 17:43:20 (6812): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 20:57:25 (4276): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 16:34:18 (5088): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1168, iMonCtr=1 Model crash detected, will try to restart... 20:17:57 (1456): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:23:48 (4896): No heartbeat from core client for 30 sec - exiting 14:23:50 (4896): No heartbeat from core client for 30 sec - exiting 14:23:51 (4896): No heartbeat from core client for 30 sec - exiting 14:23:52 (4896): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
14:27:28 (7720): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 16:34:02 (4912): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 16:34:08 (4652): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2656, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2656, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2656, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2656, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2656, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2656, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]>
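
The stderr text above is easier to read once its repeated messages are tallied. The following is a minimal Python sketch, not part of BOINC or CPDN: the names `PATTERNS`, `tally_stderr`, and `parse_exit_status` are hypothetical, and the substrings are simply copied from the log above. It counts each message type by substring and checks that the decimal and hexadecimal forms in the "Exit status" field agree.

```python
import re
from collections import Counter

# Hypothetical labels mapped to substrings that appear verbatim in the stderr above.
PATTERNS = {
    "model_crash": "Model crash detected, will try to restart",
    "no_heartbeat": "CPDN Monitor - No 'heartbeat' from BOINC",
    "quit_request": "CPDN Monitor - Quit request from BOINC",
    "suspend_request": "CPDN Monitor - Suspend request from BOINC",
    "too_many_crashes": "Sorry, too many model crashes!",
}


def tally_stderr(text: str) -> Counter:
    """Count non-overlapping occurrences of each known message in the stderr text."""
    counts = Counter()
    for label, needle in PATTERNS.items():
        counts[label] = text.count(needle)
    return counts


def parse_exit_status(field: str) -> int:
    """Parse a field such as '22 (0x00000016) Unknown error code'.

    Returns the decimal exit code after checking that it matches the
    hexadecimal form (compared as an unsigned 32-bit value).
    """
    match = re.match(r"\s*(-?\d+)\s+\(0x([0-9A-Fa-f]+)\)", field)
    if match is None:
        raise ValueError(f"unrecognised exit status field: {field!r}")
    decimal = int(match.group(1))
    from_hex = int(match.group(2), 16)
    if (decimal & 0xFFFFFFFF) != from_hex:
        raise ValueError("decimal and hexadecimal exit codes disagree")
    return decimal


if __name__ == "__main__":
    # Paste the contents of <stderr_txt> here; a short excerpt is used for illustration.
    excerpt = (
        "Model crash detected, will try to restart... "
        "Suspended CPDN Monitor - Suspend request from BOINC... "
        "Sorry, too many model crashes! :-( Called boinc_finish"
    )
    print(dict(tally_stderr(excerpt)))
    print(parse_exit_status("22 (0x00000016) Unknown error code"))
```

Run against the full stderr, the tally would show how often the model restarted before the "too many model crashes" exit, but the counts are left for the reader to compute rather than stated here.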

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
16 Apr 2014 17:00:15	1294309	16295525	hadcm3n_86rl_1980_40_008515272_0	1,010,880	1,527,950	1.5115
12 Apr 2014 09:00:42	1294309	16295525	hadcm3n_86rl_1980_40_008515272_0	984,960	1,484,652	1.5073
09 Apr 2014 16:31:24	1294309	16295525	hadcm3n_86rl_1980_40_008515272_0	959,040	1,443,619	1.5053
04 Apr 2014 20:02:25	1294309	16295525	hadcm3n_86rl_1980_40_008515272_0	933,120	1,405,192	1.5059
03 Apr 2014 16:30:47	1294309	16295525	hadcm3n_86rl_1980_40_008515272_0	907,200	1,367,701	1.5076
02 Apr 2014 15:59:45	1294309	16295525	hadcm3n_86rl_1980_40_008515272_0	881,280	1,330,644	1.5099
30 Mar 2014 19:43:14	1294309	16295525	hadcm3n_86rl_1980_40_008515272_0	855,360	1,292,425	1.5110
29 Mar 2014 13:01:42	1294309	16295525	hadcm3n_86rl_1980_40_008515272_0	829,440	1,254,396	1.5123
28 Mar 2014 14:33:01	1294309	16295525	hadcm3n_86rl_1980_40_008515272_0	803,520	1,215,785	1.5131
27 Mar 2014 12:09:45	1294309	16295525	hadcm3n_86rl_1980_40_008515272_0	777,600	1,178,264	1.5153
26 Mar 2014 13:50:54	1294309	16295525	hadcm3n_86rl_1980_40_008515272_0	751,680	1,140,450	1.5172
25 Mar 2014 15:32:05	1294309	16295525	hadcm3n_86rl_1980_40_008515272_0	725,760	1,099,911	1.5155
24 Mar 2014 13:33:43	1294309	16295525	hadcm3n_86rl_1980_40_008515272_0	699,840	1,058,361	1.5123
23 Mar 2014 13:29:35	1294309	16295525	hadcm3n_86rl_1980_40_008515272_0	673,920	1,019,828	1.5133
22 Mar 2014 15:36:10	1294309	16295525	hadcm3n_86rl_1980_40_008515272_0	648,000	981,975	1.5154
21 Mar 2014 15:21:45	1294309	16295525	hadcm3n_86rl_1980_40_008515272_0	622,080	944,268	1.5179
20 Mar 2014 18:07:39	1294309	16295525	hadcm3n_86rl_1980_40_008515272_0	596,160	904,657	1.5175
19 Mar 2014 16:42:48	1294309	16295525	hadcm3n_86rl_1980_40_008515272_0	570,240	861,700	1.5111
18 Mar 2014 19:09:47	1294309	16295525	hadcm3n_86rl_1980_40_008515272_0	544,320	820,716	1.5078
17 Mar 2014 16:46:38	1294309	16295525	hadcm3n_86rl_1980_40_008515272_0	518,400	777,069	1.4990
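
The "Average (sec/TS)" column in the table above appears to be the cumulative CPU time divided by the cumulative timestep count. A minimal Python sketch of that check follows, assuming tab-separated rows in the column order shown; `parse_trickle_row` and `sec_per_timestep` are hypothetical helper names, not part of any CPDN tool.

```python
def parse_trickle_row(line: str) -> dict:
    """Split one tab-separated trickle row into typed fields.

    Assumes the seven columns shown in the table header, with
    thousands separators in the Timestep and CPU Time columns.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 tab-separated fields, got {len(fields)}")
    time_sent, host_id, result_id, name, timestep, cpu_sec, average = fields
    return {
        "time_sent": time_sent,
        "host_id": int(host_id),
        "result_id": int(result_id),
        "result_name": name,
        "timestep": int(timestep.replace(",", "")),
        "cpu_sec": int(cpu_sec.replace(",", "")),
        "average": float(average),
    }


def sec_per_timestep(timestep: int, cpu_sec: int) -> float:
    """Cumulative CPU seconds per model timestep."""
    return cpu_sec / timestep


if __name__ == "__main__":
    row = parse_trickle_row(
        "16 Apr 2014 17:00:15\t1294309\t16295525\t"
        "hadcm3n_86rl_1980_40_008515272_0\t1,010,880\t1,527,950\t1.5115"
    )
    computed = sec_per_timestep(row["timestep"], row["cpu_sec"])
    # Compare the recomputed value with the reported column at 4 decimal places.
    print(round(computed, 4), row["average"])
```

For the most recent row, 1,527,950 s / 1,010,880 timesteps rounds to 1.5115, matching the reported value. Consecutive rows in the table are 25,920 timesteps apart.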