Task 13338124

Name	hadcm3n_o6rg_1900_40_007440641_0
Workunit	7638144
Created	5 Sep 2011, 18:20:59 UTC
Sent	5 Sep 2011, 18:53:25 UTC
Report deadline	6 Dec 2011, 2:20:36 UTC
Received	25 Oct 2011, 15:38:26 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	1115875
Run time	12 days 14 hours 0 min 49 sec
CPU time	11 days 1 hours 29 min 19 sec
Validate state	Invalid
Credit	6,220.80
Device peak FLOPS	2.44 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
- exit code 193 (0xc1)
</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
06:45:08 (4476): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5340, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4732, iMonCtr=1
Model crash detected, will try to restart...
05:35:10 (4836): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6100, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3736, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4676, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4608, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4812, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4880, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4832, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=124, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3172, iMonCtr=1
Model crash detected, will try to restart...
07:48:18 (1448): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5808, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6016, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3936, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1608, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5684, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6044, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5904, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4752, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5424, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1032, iMonCtr=1
Model crash detected, will try to restart...
C04:46:00 (5616): No heartbeat from core client for 30 sec - exiting
04:46:02 (5616): No heartbeat from core client for 30 sec - exiting
04:46:03 (5616): No heartbeat from core client for 30 sec - exiting
04:46:04 (5616): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
04:46:05 (5616): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6560, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6468, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5488, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1388, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5264, iMonCtr=1
Model crash detected, will try to restart...
06:52:50 (4332): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Signal 11 received, exiting...
Called boinc_finish
</stderr_txt>
]]>
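Note: the stderr output above repeats a small set of messages (suspend requests, missed heartbeats, model restarts, and the final signal). A minimal Python sketch for tallying them from a saved copy of the log might look like the following. The event labels, the `tally_events` helper, and the `sample` excerpt are illustrative assumptions, not part of BOINC or CPDN; only the matched message fragments come from the log itself.

```python
import re
from collections import Counter

# Hypothetical event labels; the patterns are fragments of messages seen in this log.
PATTERNS = {
    "suspend": re.compile(r"Suspend request from BOINC"),
    "no_heartbeat": re.compile(r"No heartbeat from core client"),
    "model_restart": re.compile(r"Model crash detected, will try to restart"),
    "signal": re.compile(r"Signal \d+ received"),
}

def tally_events(stderr_text: str) -> Counter:
    """Count how often each known message fragment occurs in the stderr text."""
    counts = Counter()
    for label, pattern in PATTERNS.items():
        counts[label] = len(pattern.findall(stderr_text))
    return counts

# Short excerpt assembled from messages in the log above (not the full output).
sample = (
    "Suspended CPDN Monitor - Suspend request from BOINC... "
    "06:45:08 (4476): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=5340, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "Signal 11 received, exiting..."
)

if __name__ == "__main__":
    for label, count in tally_events(sample).items():
        print(f"{label}: {count}")
```

Because the patterns use `findall` over the whole text, the tally works whether the log is stored one message per line or as a single run of text.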

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
31 Oct 2011 13:45:00	1115875	13338124	hadcm3n_o6rg_1900_40_007440641_0	518,400	955,752	1.8437
31 Oct 2011 13:44:59	1115875	13338124	hadcm3n_o6rg_1900_40_007440641_0	492,480	911,362	1.8506
17 Oct 2011 20:16:35	1115875	13338124	hadcm3n_o6rg_1900_40_007440641_0	466,560	863,396	1.8506
15 Oct 2011 07:00:37	1115875	13338124	hadcm3n_o6rg_1900_40_007440641_0	440,640	814,948	1.8495
11 Oct 2011 13:55:24	1115875	13338124	hadcm3n_o6rg_1900_40_007440641_0	414,720	766,410	1.8480
09 Oct 2011 08:41:40	1115875	13338124	hadcm3n_o6rg_1900_40_007440641_0	388,800	717,729	1.8460
07 Oct 2011 21:37:07	1115875	13338124	hadcm3n_o6rg_1900_40_007440641_0	362,880	668,178	1.8413
05 Oct 2011 06:07:24	1115875	13338124	hadcm3n_o6rg_1900_40_007440641_0	336,960	617,000	1.8311
02 Oct 2011 14:20:45	1115875	13338124	hadcm3n_o6rg_1900_40_007440641_0	311,040	565,961	1.8196
01 Oct 2011 05:50:50	1115875	13338124	hadcm3n_o6rg_1900_40_007440641_0	285,120	513,792	1.8020
28 Sep 2011 16:32:06	1115875	13338124	hadcm3n_o6rg_1900_40_007440641_0	259,200	461,633	1.7810
26 Sep 2011 06:48:34	1115875	13338124	hadcm3n_o6rg_1900_40_007440641_0	233,280	412,323	1.7675
24 Sep 2011 17:36:24	1115875	13338124	hadcm3n_o6rg_1900_40_007440641_0	207,360	359,832	1.7353
22 Sep 2011 05:25:38	1115875	13338124	hadcm3n_o6rg_1900_40_007440641_0	181,440	313,647	1.7287
20 Sep 2011 06:21:22	1115875	13338124	hadcm3n_o6rg_1900_40_007440641_0	155,520	269,284	1.7315
18 Sep 2011 13:51:47	1115875	13338124	hadcm3n_o6rg_1900_40_007440641_0	129,600	224,694	1.7338
17 Sep 2011 06:24:09	1115875	13338124	hadcm3n_o6rg_1900_40_007440641_0	103,680	179,421	1.7305
14 Sep 2011 05:16:35	1115875	13338124	hadcm3n_o6rg_1900_40_007440641_0	77,760	134,353	1.7278
13 Sep 2011 02:20:45	1115875	13338124	hadcm3n_o6rg_1900_40_007440641_0	51,840	89,309	1.7228
11 Sep 2011 08:15:55	1115875	13338124	hadcm3n_o6rg_1900_40_007440641_0	25,920	44,542	1.7184
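
Note: the Average (sec/TS) column appears to be cumulative CPU Time (sec) divided by Timestep, rounded to four decimal places. The sketch below checks that reading against three rows copied from the table; the function name `seconds_per_timestep` is an assumption made for illustration, not a CPDN identifier.

```python
def seconds_per_timestep(cpu_seconds: int, timesteps: int) -> float:
    """Average CPU seconds per model timestep, assuming a plain ratio."""
    return cpu_seconds / timesteps

# (Timestep, CPU Time (sec), Average (sec/TS)) copied from the table above.
rows = [
    (518_400, 955_752, 1.8437),
    (492_480, 911_362, 1.8506),
    (25_920, 44_542, 1.7184),
]

if __name__ == "__main__":
    for timesteps, cpu_seconds, reported in rows:
        computed = seconds_per_timestep(cpu_seconds, timesteps)
        # Allow half a unit in the fourth decimal place for rounding.
        matches = abs(computed - reported) < 5e-5
        print(f"{timesteps}: computed {computed:.4f}, reported {reported:.4f}, match={matches}")
```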