Task 16036228

Name	hadcm3n_o82e_1900_40_008465577_0
Workunit	8616416
Created	27 Sep 2013, 9:16:39 UTC
Sent	9 Oct 2013, 19:26:18 UTC
Report deadline	9 Jan 2014, 2:53:29 UTC
Received	15 Oct 2013, 17:03:05 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1263617
Run time	3 days 12 hours 10 min 38 sec
CPU time	3 days 4 hours 18 min 39 sec
Validate state	Invalid
Credit	3,421.44
Device peak FLOPS	3.21 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8724, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8724, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8724, iMonCtr=1
Model crash detected, will try to restart...
15:35:27 (8280): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:12:04 (8900): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:13:53 (20580): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:17:30 (22040): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Atmos Hold Restart file rename failed on atmos_restart.hold
21:20:13 (21868): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:20:14 (21868): No heartbeat from core client for 30 sec - exiting
Atmos Hold Restart file rename failed on atmos_restart.hold
21:47:06 (22032): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:47:07 (22032): No heartbeat from core client for 30 sec - exiting
22:21:10 (23236): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
22:21:11 (23236): No heartbeat from core client for 30 sec - exiting
22:21:12 (23236): No heartbeat from core client for 30 sec - exiting
22:21:13 (23236): No heartbeat from core client for 30 sec - exiting
22:21:14 (23236): No heartbeat from core client for 30 sec - exiting
22:21:15 (23236): No heartbeat from core client for 30 sec - exiting
22:21:16 (23236): No heartbeat from core client for 30 sec - exiting
22:21:19 (23236): No heartbeat from core client for 30 sec - exiting
22:21:20 (23236): No heartbeat from core client for 30 sec - exiting
22:25:11 (25008): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
22:25:12 (25008): No heartbeat from core client for 30 sec - exiting
22:25:13 (25008): No heartbeat from core client for 30 sec - exiting
22:25:14 (25008): No heartbeat from core client for 30 sec - exiting
22:25:15 (25008): No heartbeat from core client for 30 sec - exiting
22:25:16 (25008): No heartbeat from core client for 30 sec - exiting
22:25:19 (25008): No heartbeat from core client for 30 sec - exiting
22:25:20 (25008): No heartbeat from core client for 30 sec - exiting
22:40:10 (25220): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
22:47:54 (24776): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
22:53:20 (25460): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
22:53:21 (25460): No heartbeat from core client for 30 sec - exiting
22:53:22 (25460): No heartbeat from core client for 30 sec - exiting
22:53:24 (25460): No heartbeat from core client for 30 sec - exiting
22:53:27 (25460): No heartbeat from core client for 30 sec - exiting
22:53:28 (25460): No heartbeat from core client for 30 sec - exiting
22:53:29 (25460): No heartbeat from core client for 30 sec - exiting
22:53:30 (25460): No heartbeat from core client for 30 sec - exiting
22:58:06 (25712): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
22:58:07 (25712): No heartbeat from core client for 30 sec - exiting
22:58:08 (25712): No heartbeat from core client for 30 sec - exiting
22:58:09 (25712): No heartbeat from core client for 30 sec - exiting
22:58:11 (25712): No heartbeat from core client for 30 sec - exiting
19:04:50 (7712): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8180, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8180, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8180, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8180, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8180, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8180, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>
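The stderr output above repeats a small number of message types many times. A minimal Python sketch for tallying them is shown here; the labels in `PATTERNS` and the function name `summarize` are hypothetical, and only the matched phrases are taken from the log itself. It searches the raw text, so it does not depend on where the line breaks fall.

```python
import re
from collections import Counter

# Hypothetical labels; the phrases are copied from the stderr text above.
PATTERNS = {
    "heartbeat_lost": re.compile(r"No heartbeat from core client"),
    "model_crash": re.compile(r"Model crash detected"),
    "restart_rename_failed": re.compile(r"rename failed on atmos_restart\.hold"),
}

def summarize(stderr_text):
    """Count how often each known message type occurs in the stderr text."""
    counts = Counter()
    for label, pattern in PATTERNS.items():
        counts[label] = len(pattern.findall(stderr_text))
    return counts

# Short excerpt of the log, used only to exercise the function.
sample = (
    "Model crash detected, will try to restart... "
    "15:35:27 (8280): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "Atmos Hold Restart file rename failed on atmos_restart.hold "
    "21:20:13 (21868): No heartbeat from core client for 30 sec - exiting"
)

for label, n in summarize(sample).items():
    print(label, n)
```

Note that the "CPDN Monitor - No 'heartbeat' from BOINC..." lines are not counted as `heartbeat_lost`, because they quote the word heartbeat and so do not match that phrase.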

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
14 Oct 2013 19:27:48	1263617	16036228	hadcm3n_o82e_1900_40_008465577_0	285,120	265,535	0.9313
14 Oct 2013 01:58:51	1263617	16036228	hadcm3n_o82e_1900_40_008465577_0	259,200	241,766	0.9327
13 Oct 2013 18:39:33	1263617	16036228	hadcm3n_o82e_1900_40_008465577_0	233,280	216,904	0.9298
13 Oct 2013 12:01:11	1263617	16036228	hadcm3n_o82e_1900_40_008465577_0	207,360	193,487	0.9331
13 Oct 2013 05:03:00	1263617	16036228	hadcm3n_o82e_1900_40_008465577_0	181,440	169,350	0.9334
12 Oct 2013 22:14:43	1263617	16036228	hadcm3n_o82e_1900_40_008465577_0	155,520	145,701	0.9369
12 Oct 2013 15:19:17	1263617	16036228	hadcm3n_o82e_1900_40_008465577_0	129,600	121,921	0.9407
12 Oct 2013 08:09:31	1263617	16036228	hadcm3n_o82e_1900_40_008465577_0	103,680	98,152	0.9467
12 Oct 2013 01:21:22	1263617	16036228	hadcm3n_o82e_1900_40_008465577_0	77,760	74,216	0.9544
11 Oct 2013 16:35:18	1263617	16036228	hadcm3n_o82e_1900_40_008465577_0	51,840	48,726	0.9399
10 Oct 2013 03:33:17	1263617	16036228	hadcm3n_o82e_1900_40_008465577_0	25,920	24,946	0.9624
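Judging from the numbers in the table, the Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count. That reading is an inference from the values, not something the page states. The sketch below checks it against three rows; the helper names `parse_int` and `seconds_per_timestep` are hypothetical.

```python
def parse_int(field):
    """Convert a table field such as '285,120' to an integer."""
    return int(field.replace(",", ""))

def seconds_per_timestep(timestep, cpu_time):
    """Cumulative CPU seconds divided by cumulative timesteps, to 4 decimals."""
    return round(parse_int(cpu_time) / parse_int(timestep), 4)

# (Timestep, CPU Time (sec), Average (sec/TS)) copied from the table above.
rows = [
    ("285,120", "265,535", 0.9313),
    ("259,200", "241,766", 0.9327),
    ("25,920", "24,946", 0.9624),
]

for timestep, cpu_time, reported in rows:
    computed = seconds_per_timestep(timestep, cpu_time)
    print(timestep, cpu_time, computed, reported)
```

The same table also shows consecutive trickles spaced 25,920 timesteps apart, for example 285,120 - 259,200 = 25,920.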