Task 13344629

Name	hadcm3n_o086_1900_40_007438514_2
Workunit	7636017
Created	7 Sep 2011, 8:31:31 UTC
Sent	7 Sep 2011, 8:35:49 UTC
Report deadline	7 Dec 2011, 16:03:00 UTC
Received	13 Oct 2011, 9:49:57 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	1067203
Run time	7 days 2 hours 35 min 45 sec
CPU time	6 days 14 hours 21 min 54 sec
Validate state	Invalid
Credit	3,110.40
Device peak FLOPS	2.22 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
 - exit code 193 (0xc1)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4356, iMonCtr=1
Model crash detected, will try to restart...
07:08:53 (3184): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4684, iMonCtr=1
Model crash detected, will try to restart...
07:09:38 (5208): No heartbeat from core client for 30 sec - exiting
07:09:39 (5208): No heartbeat from core client for 30 sec - exiting
07:09:40 (5208): No heartbeat from core client for 30 sec - exiting
07:09:41 (5208): No heartbeat from core client for 30 sec - exiting
07:09:42 (5208): No heartbeat from core client for 30 sec - exiting
07:09:43 (5208): No heartbeat from core client for 30 sec - exiting
07:09:44 (5208): No heartbeat from core client for 30 sec - exiting
07:09:45 (5208): No heartbeat from core client for 30 sec - exiting
07:09:46 (5208): No heartbeat from core client for 30 sec - exiting
07:09:47 (5208): No heartbeat from core client for 30 sec - exiting
07:09:48 (5208): No heartbeat from core client for 30 sec - exiting
07:09:49 (5208): No heartbeat from core client for 30 sec - exiting
07:09:50 (5208): No heartbeat from core client for 30 sec - exiting
07:09:51 (5208): No heartbeat from core client for 30 sec - exiting
07:09:52 (5208): No heartbeat from core client for 30 sec - exiting
07:09:53 (5208): No heartbeat from core client for 30 sec - exiting
07:09:54 (5208): No heartbeat from core client for 30 sec - exiting
07:09:55 (5208): No heartbeat from core client for 30 sec - exiting
07:09:56 (5208): No heartbeat from core client for 30 sec - exiting
07:09:57 (5208): No heartbeat from core client for 30 sec - exiting
07:09:58 (5208): No heartbeat from core client for 30 sec - exiting
07:09:59 (5208): No heartbeat from core client for 30 sec - exiting
07:10:00 (5208): No heartbeat from core client for 30 sec - exiting
07:10:01 (5208): No heartbeat from core client for 30 sec - exiting
07:10:02 (5208): No heartbeat from core client for 30 sec - exiting
07:10:03 (5208): No heartbeat from core client for 30 sec - exiting
07:10:04 (5208): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
07:15:07 (5676): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=732, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4396, iMonCtr=1
Model crash detected, will try to restart...
08:52:35 (5100): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2360, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
09:09:58 (788): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5880, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4332, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5696, iMonCtr=1
Model crash detected, will try to restart...
14:49:50 (4952): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
11:05:00 (1260): No heartbeat from core client for 30 sec - exiting
11:05:01 (1260): No heartbeat from core client for 30 sec - exiting
11:05:02 (1260): No heartbeat from core client for 30 sec - exiting
11:05:03 (1260): No heartbeat from core client for 30 sec - exiting
11:05:04 (1260): No heartbeat from core client for 30 sec - exiting
11:05:05 (1260): No heartbeat from core client for 30 sec - exiting
11:05:06 (1260): No heartbeat from core client for 30 sec - exiting
11:05:07 (1260): No heartbeat from core client for 30 sec - exiting
11:05:08 (1260): No heartbeat from core client for 30 sec - exiting
11:05:09 (1260): No heartbeat from core client for 30 sec - exiting
11:05:10 (1260): No heartbeat from core client for 30 sec - exiting
11:05:11 (1260): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
08:51:44 (2652): No heartbeat from core client for 30 sec - exiting
08:51:45 (2652): No heartbeat from core client for 30 sec - exiting
08:51:46 (2652): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4304, iMonCtr=1
Model crash detected, will try to restart...
Signal 11 received, exiting...
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
13 Oct 2011 09:51:02	1067203	13344629	hadcm3n_o086_1900_40_007438514_2	259,200	570,111	2.1995
11 Oct 2011 10:44:50	1067203	13344629	hadcm3n_o086_1900_40_007438514_2	233,280	520,595	2.2316
07 Oct 2011 08:35:37	1067203	13344629	hadcm3n_o086_1900_40_007438514_2	207,360	471,106	2.2719
05 Oct 2011 06:53:05	1067203	13344629	hadcm3n_o086_1900_40_007438514_2	181,440	417,789	2.3026
04 Oct 2011 15:12:47	1067203	13344629	hadcm3n_o086_1900_40_007438514_2	155,520	363,523	2.3375
27 Sep 2011 10:44:26	1067203	13344629	hadcm3n_o086_1900_40_007438514_2	129,600	310,020	2.3921
21 Sep 2011 10:42:48	1067203	13344629	hadcm3n_o086_1900_40_007438514_2	103,680	244,605	2.3592
16 Sep 2011 11:12:10	1067203	13344629	hadcm3n_o086_1900_40_007438514_2	77,760	183,137	2.3552
13 Sep 2011 12:06:47	1067203	13344629	hadcm3n_o086_1900_40_007438514_2	51,840	123,579	2.3839
09 Sep 2011 08:00:40	1067203	13344629	hadcm3n_o086_1900_40_007438514_2	25,920	63,390	2.4456
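
The "Average (sec/TS)" column in the trickle table appears to be the cumulative CPU time divided by the cumulative timestep count, rounded to four decimal places. The following is a minimal sketch of that arithmetic, assuming this interpretation is correct; the function name `average_sec_per_timestep` is hypothetical and the two sample rows are copied from the table above.

```python
def average_sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Cumulative CPU seconds per model timestep, rounded to 4 decimals.

    Assumption: the table's "Average (sec/TS)" is simply CPU time / timestep.
    """
    return round(cpu_time_sec / timestep, 4)


# (timestep, CPU time in seconds) for the latest and earliest trickles listed.
trickles = [
    (259_200, 570_111),
    (25_920, 63_390),
]

for timestep, cpu_time_sec in trickles:
    print(timestep, average_sec_per_timestep(cpu_time_sec, timestep))
```

Under that assumption the two rows reproduce the listed values of 2.1995 and 2.4456, which suggests the average fell slightly over the course of the run.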