Task 15586325

Name	hadcm3n_z7yy_1920_40_008281785_1
Workunit	8432920
Created	6 Feb 2013, 15:11:18 UTC
Sent	6 Feb 2013, 15:11:45 UTC
Report deadline	8 May 2013, 22:38:56 UTC
Received	15 Jun 2013, 8:25:08 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	-226 (0xFFFFFF1E) ERR_TOO_MANY_EXITS
Computer ID	1204261
Run time	81 days 14 hours 23 min 57 sec
CPU time	71 days 17 hours 0 min 13 sec
Validate state	Invalid
Credit	9,331.20
Device peak FLOPS	1.08 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.28</core_client_version> <![CDATA[ <message> too many exit(0)s </message> <stderr_txt> 03:55:33 (5852): No heartbeat from core client for 30 sec - exiting 03:55:34 (5852): No heartbeat from core client for 30 sec - exiting 03:55:35 (5852): No heartbeat from core client for 30 sec - exiting 03:55:36 (5852): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 13:21:37 (5568): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 22:34:36 (5948): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3920, iMonCtr=1 Model crash detected, will try to restart... 11:01:03 (5176): No heartbeat from core client for 30 sec - exiting 11:01:04 (5176): No heartbeat from core client for 30 sec - exiting 11:01:05 (5176): No heartbeat from core client for 30 sec - exiting 11:01:06 (5176): No heartbeat from core client for 30 sec - exiting 11:01:07 (5176): No heartbeat from core client for 30 sec - exiting 11:01:08 (5176): No heartbeat from core client for 30 sec - exiting 11:01:09 (5176): No heartbeat from core client for 30 sec - exiting 11:01:10 (5176): No heartbeat from core client for 30 sec - exiting 11:01:11 (5176): No heartbeat from core client for 30 sec - exiting 11:01:12 (5176): No heartbeat from core client for 30 sec - exiting 11:01:13 (5176): No heartbeat from core client for 30 sec - exiting 11:01:15 (5176): No heartbeat from core client for 30 sec - exiting 11:01:16 (5176): No heartbeat from core client for 30 sec - exiting 11:01:17 (5176): No heartbeat from core client for 30 sec - exiting 11:01:18 (5176): No heartbeat from core client for 30 sec - exiting 11:01:19 (5176): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 03:45:49 (3052): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 21:32:40 (4700): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 10:41:39 (6876): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 15:25:25 (11748): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8156, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... 11:19:31 (5088): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5012, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5012, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... 03:55:21 (6764): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 07:53:51 (2644): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Signal 22 received, exiting... 
Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5688, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5688, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5688, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5688, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish 12:37:28 (3944): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 07:35:39 (3108): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 07:36:40 (6080): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 09:10:14 (4596): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Quit request from BOINC... 02:19:25 (4328): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 02:20:21 (6068): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 02:21:18 (2844): No heartbeat from core client for 30 sec - exiting 02:21:20 (2844): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
02:21:22 (2844): No heartbeat from core client for 30 sec - exiting 02:21:23 (2844): No heartbeat from core client for 30 sec - exiting 02:21:24 (2844): No heartbeat from core client for 30 sec - exiting 02:21:25 (2844): No heartbeat from core client for 30 sec - exiting 02:21:26 (2844): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 03:39:39 (6360): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 03:31:48 (6084): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:52:36 (1428): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:53:51 (9996): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... </stderr_txt> ]]>
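
The exit status above is shown both as a signed decimal (-226) and as a 32-bit hexadecimal value (0xFFFFFF1E). These are the same number under two's complement. A minimal sketch of the conversion follows; the helper names `to_unsigned32` and `to_signed32` are hypothetical and not part of BOINC.

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed exit code."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the value is negative in two's complement.
    return value - 0x100000000 if value & 0x80000000 else value


print(f"0x{to_unsigned32(-226):08X}")  # prints 0xFFFFFF1E
print(to_signed32(0xFFFFFF1E))         # prints -226
```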

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
24 Apr 2013 07:59:53	1204261	15586325	hadcm3n_z7yy_1920_40_008281785_1	777,600	5,851,083	7.5245
19 Apr 2013 14:55:36	1204261	15586325	hadcm3n_z7yy_1920_40_008281785_1	751,680	5,510,108	7.3304
15 Apr 2013 07:55:00	1204261	15586325	hadcm3n_z7yy_1920_40_008281785_1	725,760	5,195,218	7.1583
11 Apr 2013 17:59:16	1204261	15586325	hadcm3n_z7yy_1920_40_008281785_1	699,840	4,930,131	7.0447
08 Apr 2013 09:42:27	1204261	15586325	hadcm3n_z7yy_1920_40_008281785_1	673,920	4,675,988	6.9385
05 Apr 2013 07:08:52	1204261	15586325	hadcm3n_z7yy_1920_40_008281785_1	648,000	4,431,780	6.8392
02 Apr 2013 07:12:59	1204261	15586325	hadcm3n_z7yy_1920_40_008281785_1	622,080	4,204,110	6.7582
30 Mar 2013 07:13:54	1204261	15586325	hadcm3n_z7yy_1920_40_008281785_1	596,160	3,978,581	6.6737
27 Mar 2013 15:05:33	1204261	15586325	hadcm3n_z7yy_1920_40_008281785_1	570,240	3,769,657	6.6106
24 Mar 2013 21:08:10	1204261	15586325	hadcm3n_z7yy_1920_40_008281785_1	544,320	3,557,604	6.5359
22 Mar 2013 10:39:30	1204261	15586325	hadcm3n_z7yy_1920_40_008281785_1	518,400	3,363,468	6.4882
20 Mar 2013 10:47:29	1204261	15586325	hadcm3n_z7yy_1920_40_008281785_1	492,480	3,192,824	6.4832
18 Mar 2013 09:23:52	1204261	15586325	hadcm3n_z7yy_1920_40_008281785_1	466,560	3,022,909	6.4791
15 Mar 2013 21:38:37	1204261	15586325	hadcm3n_z7yy_1920_40_008281785_1	440,640	2,829,739	6.4219
13 Mar 2013 07:39:36	1204261	15586325	hadcm3n_z7yy_1920_40_008281785_1	414,720	2,631,781	6.3459
10 Mar 2013 19:51:55	1204261	15586325	hadcm3n_z7yy_1920_40_008281785_1	388,800	2,450,234	6.3020
08 Mar 2013 13:50:17	1204261	15586325	hadcm3n_z7yy_1920_40_008281785_1	362,880	2,271,168	6.2587
06 Mar 2013 06:50:21	1204261	15586325	hadcm3n_z7yy_1920_40_008281785_1	336,960	2,101,059	6.2353
03 Mar 2013 21:59:56	1204261	15586325	hadcm3n_z7yy_1920_40_008281785_1	311,040	1,927,128	6.1958
01 Mar 2013 10:32:53	1204261	15586325	hadcm3n_z7yy_1920_40_008281785_1	285,120	1,749,233	6.1351
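
The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the timestep reached; this is inferred from the numbers in the table, not stated by the page. The sketch below recomputes it for the two most recent trickles and also derives the rate over the interval between them, which shows how far the most recent stretch ran above the cumulative average. The function names are hypothetical.

```python
# Two most recent rows from the table: (timestep, cumulative CPU seconds).
trickles = [
    (777_600, 5_851_083),  # 24 Apr 2013
    (751_680, 5_510_108),  # 19 Apr 2013
]


def cumulative_average(timestep: int, cpu_seconds: int) -> float:
    """CPU seconds per timestep since the start of the run."""
    return cpu_seconds / timestep


def interval_average(newer: tuple, older: tuple) -> float:
    """CPU seconds per timestep between two consecutive trickles."""
    return (newer[1] - older[1]) / (newer[0] - older[0])


for timestep, cpu_seconds in trickles:
    print(timestep, round(cumulative_average(timestep, cpu_seconds), 4))

print(round(interval_average(trickles[0], trickles[1]), 4))
```

The cumulative values round to 7.5245 and 7.3304, matching the table. The rate over the last interval of 25,920 timesteps is considerably higher than either, which is consistent with the steady rise in the average column.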