Task 13023207

Name	hadcm3n_t3g9_1940_40_007315160_0
Workunit	7512590
Created	28 Jun 2011, 19:57:22 UTC
Sent	28 Jun 2011, 20:01:06 UTC
Report deadline	28 Sep 2011, 3:28:17 UTC
Received	29 Aug 2011, 20:16:57 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	1115875
Run time	12 days 3 hours 46 min 52 sec
CPU time	10 days 19 hours 10 min 59 sec
Validate state	Invalid
Credit	6,220.80
Device peak FLOPS	2.44 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
- exit code 193 (0xc1)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5196, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4124, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3288, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6728, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4296, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4320, iMonCtr=1
Model crash detected, will try to restart...
05:48:24 (5912): No heartbeat from core client for 30 sec - exiting
05:48:25 (5912): No heartbeat from core client for 30 sec - exiting
05:48:26 (5912): No heartbeat from core client for 30 sec - exiting
05:48:27 (5912): No heartbeat from core client for 30 sec - exiting
05:48:28 (5912): No heartbeat from core client for 30 sec - exiting
05:48:29 (5912): No heartbeat from core client for 30 sec - exiting
05:48:30 (5912): No heartbeat from core client for 30 sec - exiting
05:48:31 (5912): No heartbeat from core client for 30 sec - exiting
05:48:32 (5912): No heartbeat from core client for 30 sec - exiting
05:48:33 (5912): No heartbeat from core client for 30 sec - exiting
05:48:34 (5912): No heartbeat from core client for 30 sec - exiting
05:48:35 (5912): No heartbeat from core client for 30 sec - exiting
05:48:36 (5912): No heartbeat from core client for 30 sec - exiting
05:48:37 (5912): No heartbeat from core client for 30 sec - exiting
05:48:38 (5912): No heartbeat from core client for 30 sec - exiting
05:48:39 (5912): No heartbeat from core client for 30 sec - exiting
05:48:40 (5912): No heartbeat from core client for 30 sec - exiting
05:48:41 (5912): No heartbeat from core client for 30 sec - exiting
05:48:42 (5912): No heartbeat from core client for 30 sec - exiting
05:48:43 (5912): No heartbeat from core client for 30 sec - exiting
05:48:44 (5912): No heartbeat from core client for 30 sec - exiting
05:48:45 (5912): No heartbeat from core client for 30 sec - exiting
05:48:46 (5912): No heartbeat from core client for 30 sec - exiting
05:48:47 (5912): No heartbeat from core client for 30 sec - exiting
05:48:48 (5912): No heartbeat from core client for 30 sec - exiting
05:48:49 (5912): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2504, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7116, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6032, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5492, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5172, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5164, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4148, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4548, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3728, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3364, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5292, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4588, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4300, iMonCtr=1
Model crash detected, will try to restart...
05:56:31 (5724): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4500, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2008, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5108, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1884, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5852, iMonCtr=1
Model crash detected, will try to restart...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5144, iMonCtr=1
Model crash detected, will try to restart...
CSignal 11 received, exiting...
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
29 Aug 2011 20:15:33	1115875	13023207	hadcm3n_t3g9_1940_40_007315160_0	518,400	933,053	1.7999
28 Aug 2011 09:47:51	1115875	13023207	hadcm3n_t3g9_1940_40_007315160_0	492,480	887,331	1.8018
26 Aug 2011 17:33:31	1115875	13023207	hadcm3n_t3g9_1940_40_007315160_0	466,560	841,887	1.8045
24 Aug 2011 04:52:44	1115875	13023207	hadcm3n_t3g9_1940_40_007315160_0	440,640	796,477	1.8075
22 Aug 2011 10:17:50	1115875	13023207	hadcm3n_t3g9_1940_40_007315160_0	414,720	751,085	1.8111
21 Aug 2011 03:48:59	1115875	13023207	hadcm3n_t3g9_1940_40_007315160_0	388,800	705,890	1.8156
03 Aug 2011 19:11:28	1115875	13023207	hadcm3n_t3g9_1940_40_007315160_0	362,880	660,022	1.8188
31 Jul 2011 20:37:28	1115875	13023207	hadcm3n_t3g9_1940_40_007315160_0	336,960	614,946	1.8250
31 Jul 2011 05:33:01	1115875	13023207	hadcm3n_t3g9_1940_40_007315160_0	311,040	569,775	1.8318
29 Jul 2011 17:20:58	1115875	13023207	hadcm3n_t3g9_1940_40_007315160_0	285,120	523,987	1.8378
26 Jul 2011 18:49:49	1115875	13023207	hadcm3n_t3g9_1940_40_007315160_0	259,200	479,058	1.8482
25 Jul 2011 21:46:33	1115875	13023207	hadcm3n_t3g9_1940_40_007315160_0	233,280	433,931	1.8601
25 Jul 2011 20:30:42	1115875	13023207	hadcm3n_t3g9_1940_40_007315160_0	207,360	385,507	1.8591
25 Jul 2011 18:56:20	1115875	13023207	hadcm3n_t3g9_1940_40_007315160_0	181,440	337,049	1.8576
25 Jul 2011 16:00:36	1115875	13023207	hadcm3n_t3g9_1940_40_007315160_0	155,520	288,873	1.8575
25 Jul 2011 14:57:04	1115875	13023207	hadcm3n_t3g9_1940_40_007315160_0	129,600	240,791	1.8580
10 Jul 2011 19:23:03	1115875	13023207	hadcm3n_t3g9_1940_40_007315160_0	103,680	193,330	1.8647
08 Jul 2011 03:31:49	1115875	13023207	hadcm3n_t3g9_1940_40_007315160_0	77,760	145,654	1.8731
05 Jul 2011 18:55:20	1115875	13023207	hadcm3n_t3g9_1940_40_007315160_0	51,840	96,676	1.8649
03 Jul 2011 04:14:30	1115875	13023207	hadcm3n_t3g9_1940_40_007315160_0	25,920	48,468	1.8699
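
The Average (sec/TS) column in the trickle table appears to be the cumulative CPU time divided by the timestep count, rounded to four decimal places. This is inferred from the numbers in the table, not stated on the page. A minimal Python sketch that checks the relationship against three rows copied from the table (the function name `sec_per_timestep` is hypothetical):

```python
# Rows copied from the trickle table above:
# (timestep, cumulative CPU time in seconds, reported Average sec/TS)
rows = [
    (518_400, 933_053, 1.7999),
    (259_200, 479_058, 1.8482),
    (25_920, 48_468, 1.8699),
]


def sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Assumed formula: cumulative CPU seconds per model timestep, 4 decimals."""
    return round(cpu_time_sec / timestep, 4)


for timestep, cpu_time_sec, reported in rows:
    computed = sec_per_timestep(cpu_time_sec, timestep)
    # Print the computed value next to the reported one for comparison.
    print(f"{timestep:>8} {computed:.4f} {reported:.4f}")
```

Under this assumption the three computed values match the reported ones, which suggests the average is cumulative over the whole run rather than per trickle interval.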