Task 15434886

Name	hadcm3n_zai1_1920_40_008244441_0
Workunit	8399565
Created	14 Nov 2012, 18:22:05 UTC
Sent	14 Nov 2012, 18:22:13 UTC
Report deadline	14 Feb 2013, 1:49:24 UTC
Received	9 Jan 2013, 13:51:06 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	1206904
Run time	20 days 2 hours 45 min 20 sec
CPU time	15 days 13 hours 21 min 20 sec
Validate state	Invalid
Credit	6,220.80
Device peak FLOPS	2.39 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.25</core_client_version>
<![CDATA[
<message>
 - exit code 193 (0xc1)
</message>
<stderr_txt>
15:08:05 (4768): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6004, iMonCtr=1
Model crash detected, will try to restart...
15:40:34 (3432): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5464, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4664, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4956, iMonCtr=1
Model crash detected, will try to restart...
11:12:20 (6736): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
11:13:29 (2128): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
C15:03:01 (2144): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:03:03 (2144): No heartbeat from core client for 30 sec - exiting
15:04:34 (1840): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:04:35 (1840): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4484, iMonCtr=1
Model crash detected, will try to restart...
15:00:43 (6344): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
C11:04:23 (5076): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5428, iMonCtr=1
Model crash detected, will try to restart...
12:10:31 (4892): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3396, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5968, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4164, iMonCtr=1
Model crash detected, will try to restart...
16:08:22 (5844): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
C09:19:48 (4888): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6076, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1796, iMonCtr=1
Model crash detected, will try to restart...
15:35:43 (4532): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2156, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4208, iMonCtr=1
Model crash detected, will try to restart...
09:02:44 (6240): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
C10:30:02 (6008): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7264, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5640, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4228, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4240, iMonCtr=1
Model crash detected, will try to restart...
08:56:48 (4384): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5216, iMonCtr=1
Model crash detected, will try to restart...
08:44:21 (6344): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
14:19:55 (4328): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6344, iMonCtr=1
Model crash detected, will try to restart...
09:23:42 (4700): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6468, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5964, iMonCtr=1
Model crash detected, will try to restart...
15:54:52 (4868): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2680, iMonCtr=1
Model crash detected, will try to restart...
16:51:54 (6244): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
C17:43:57 (4820): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6780, iMonCtr=1
Model crash detected, will try to restart...
19:35:56 (4204): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6896, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5656, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3872, iMonCtr=1
Model crash detected, will try to restart...
14:51:00 (4160): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6832, iMonCtr=1
Model crash detected, will try to restart...
09:15:44 (6620): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1580, iMonCtr=1
Model crash detected, will try to restart...
14:49:50 (5492): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4912, iMonCtr=1
Model crash detected, will try to restart...
10:15:16 (5868): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
13:29:39 (2452): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
13:29:40 (2452): No heartbeat from core client for 30 sec - exiting
11:33:16 (4920): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6512, iMonCtr=1
Model crash detected, will try to restart...
11:06:37 (5732): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
11:08:00 (756): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5540, iMonCtr=1
Model crash detected, will try to restart...
12:01:26 (3364): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Signal 11 received, exiting...
Called boinc_finish
</stderr_txt>
]]>
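To get a rough picture of how often this task was interrupted before the final "Signal 11 received", the message types in stderr_txt can be counted. The following is a minimal Python sketch, assuming the log text is available as a plain string. The category names (heartbeat_lost, model_restart, suspend, signal_11) are hypothetical labels, not CPDN or BOINC terminology, and the counts are of messages rather than distinct episodes: some processes (for example PIDs 2144 and 1840) log the heartbeat message twice, and Windows reuses PIDs (6344 appears in more than one episode).

```python
import re

# Hypothetical labels -> patterns copied from the message text in this log.
# The patterns are not anchored to the start of a line, so entries with a
# stray "C" prefix (e.g. "C15:03:01", "CController::") are still counted.
PATTERNS = {
    "heartbeat_lost": re.compile(r"\d{2}:\d{2}:\d{2} \(\d+\): No heartbeat from core client"),
    "model_restart": re.compile(r"Model crash detected, will try to restart"),
    "suspend": re.compile(r"Suspend request from BOINC"),
    "signal_11": re.compile(r"Signal 11 received"),
}


def tally_events(stderr_txt: str) -> dict:
    """Count messages (not distinct episodes) of each type in a stderr dump."""
    return {name: len(p.findall(stderr_txt)) for name, p in PATTERNS.items()}


# A short excerpt copied verbatim from the log above.
sample = (
    "15:08:05 (4768): No heartbeat from core client for 30 sec - exiting\n"
    "CPDN Monitor - No 'heartbeat' from BOINC...\n"
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=6004, iMonCtr=1\n"
    "Model crash detected, will try to restart...\n"
    "Suspended CPDN Monitor - Suspend request from BOINC...\n"
    "12:01:26 (3364): No heartbeat from core client for 30 sec - exiting\n"
    "CPDN Monitor - No 'heartbeat' from BOINC...\n"
    "Signal 11 received, exiting...\n"
    "Called boinc_finish\n"
)

print(tally_events(sample))
```

The "CPDN Monitor - No 'heartbeat' from BOINC..." lines are deliberately not matched by heartbeat_lost, since they accompany the timestamped message and would double-count it.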

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
09 Jan 2013 12:55:09	1206904	15434886	hadcm3n_zai1_1920_40_008244441_0	518,400	1,344,065	2.5927
06 Jan 2013 20:26:53	1206904	15434886	hadcm3n_zai1_1920_40_008244441_0	492,480	1,278,031	2.5951
03 Jan 2013 19:34:57	1206904	15434886	hadcm3n_zai1_1920_40_008244441_0	466,560	1,216,764	2.6079
01 Jan 2013 20:45:45	1206904	15434886	hadcm3n_zai1_1920_40_008244441_0	440,640	1,165,794	2.6457
31 Dec 2012 12:30:11	1206904	15434886	hadcm3n_zai1_1920_40_008244441_0	414,720	1,109,710	2.6758
22 Dec 2012 17:41:26	1206904	15434886	hadcm3n_zai1_1920_40_008244441_0	388,800	1,041,312	2.6783
17 Dec 2012 19:47:44	1206904	15434886	hadcm3n_zai1_1920_40_008244441_0	362,880	972,879	2.6810
15 Dec 2012 13:35:23	1206904	15434886	hadcm3n_zai1_1920_40_008244441_0	336,960	904,662	2.6848
14 Dec 2012 16:16:06	1206904	15434886	hadcm3n_zai1_1920_40_008244441_0	311,040	834,593	2.6832
14 Dec 2012 16:16:06	1206904	15434886	hadcm3n_zai1_1920_40_008244441_0	285,120	764,975	2.6830
14 Dec 2012 16:16:06	1206904	15434886	hadcm3n_zai1_1920_40_008244441_0	259,200	696,176	2.6859
06 Dec 2012 12:49:16	1206904	15434886	hadcm3n_zai1_1920_40_008244441_0	233,280	626,399	2.6852
03 Dec 2012 14:25:38	1206904	15434886	hadcm3n_zai1_1920_40_008244441_0	207,360	556,486	2.6837
01 Dec 2012 15:57:51	1206904	15434886	hadcm3n_zai1_1920_40_008244441_0	181,440	485,157	2.6739
29 Nov 2012 17:52:20	1206904	15434886	hadcm3n_zai1_1920_40_008244441_0	155,520	414,901	2.6678
28 Nov 2012 07:11:21	1206904	15434886	hadcm3n_zai1_1920_40_008244441_0	129,600	345,826	2.6684
26 Nov 2012 13:36:48	1206904	15434886	hadcm3n_zai1_1920_40_008244441_0	103,680	277,279	2.6744
23 Nov 2012 20:18:26	1206904	15434886	hadcm3n_zai1_1920_40_008244441_0	77,760	208,321	2.6790
20 Nov 2012 20:31:25	1206904	15434886	hadcm3n_zai1_1920_40_008244441_0	51,840	138,642	2.6744
17 Nov 2012 21:53:08	1206904	15434886	hadcm3n_zai1_1920_40_008244441_0	25,920	68,620	2.6474
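The Average column is consistent with cumulative CPU time divided by timestep (for example 1,344,065 / 518,400 rounds to 2.5927), and consecutive trickles are 25,920 timesteps apart. That reading is an inference from the numbers, not something the page states. The Python sketch below checks it on the three most recent rows, adds a per-interval rate (a derived quantity of our own, not a column on the page), and converts the Run time and CPU time fields to seconds. Those come out to 1,737,920 s and 1,344,080 s, a CPU-to-wall-clock ratio of about 0.773. The helper names are hypothetical.

```python
import re

# (timestep, cumulative CPU seconds, reported average) for the three most
# recent trickles, copied from the table above (oldest first).
TRICKLES = [
    (466_560, 1_216_764, 2.6079),
    (492_480, 1_278_031, 2.5951),
    (518_400, 1_344_065, 2.5927),
]

# Multipliers for the "N days N hours N min N sec" strings in the task summary.
UNIT_SECONDS = {"day": 86_400, "hour": 3_600, "min": 60, "sec": 1}


def to_seconds(text: str) -> int:
    """Convert e.g. '15 days 13 hours 21 min 20 sec' to seconds."""
    return sum(int(n) * UNIT_SECONDS[u]
               for n, u in re.findall(r"(\d+) (day|hour|min|sec)s?", text))


def cumulative_rate(timestep: int, cpu_seconds: int) -> float:
    """Average CPU seconds per timestep since the start of the run."""
    return cpu_seconds / timestep


def interval_rate(prev: tuple, curr: tuple) -> float:
    """CPU seconds per timestep between two consecutive trickles."""
    return (curr[1] - prev[1]) / (curr[0] - prev[0])


for ts, cpu_s, reported in TRICKLES:
    print(ts, reported, round(cumulative_rate(ts, cpu_s), 4))

for prev, curr in zip(TRICKLES, TRICKLES[1:]):
    print(curr[0] - prev[0], round(interval_rate(prev, curr), 4))

run = to_seconds("20 days 2 hours 45 min 20 sec")
cpu = to_seconds("15 days 13 hours 21 min 20 sec")
print(run, cpu, round(cpu / run, 3))
```

The last trickle (sent 09 Jan 2013 12:55:09 UTC) reports 1,344,065 CPU seconds, 15 s short of the task's final CPU time of 1,344,080 s.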