Task 14366128

Name	hadcm3n_yl9o_1980_40_007858318_0
Workunit	8013430
Created	5 Apr 2012, 18:19:38 UTC
Sent	5 Apr 2012, 18:27:35 UTC
Report deadline	6 Jul 2012, 1:54:46 UTC
Received	2 Jul 2012, 22:24:02 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	1183189
Run time	16 days 20 hours 46 min 5 sec
CPU time	16 days 4 hours 37 min 11 sec
Validate state	Invalid
Credit	9,331.20
Device peak FLOPS	2.31 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.25</core_client_version> <![CDATA[ <message> - exit code 193 (0xc1) </message> <stderr_txt> 16:27:01 (19328): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 23:52:39 (5236): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 02:18:20 (18480): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 15:51:50 (5780): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 23:50:41 (7884): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 05:49:32 (15360): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 16:48:33 (20948): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 22:26:23 (2504): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 03:25:22 (10068): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 09:24:11 (13936): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 15:23:02 (16976): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 21:21:52 (18264): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 
03:20:50 (8680): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 03:05:55 (5816): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3056, iMonCtr=1 Model crash detected, will try to restart... 13:07:08 (5912): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 19:06:06 (7608): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:50:44 (6016): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 02:22:56 (5896): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:21:53 (14700): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=23468, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... 12:38:25 (844): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 19:37:18 (8120): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=14264, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... 18:37:21 (1152): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 22:20:05 (1336): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5504, iMonCtr=1 Model crash detected, will try to restart... 
02:03:19 (2740): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 06:02:18 (6748): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4988, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4988, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5188, iMonCtr=1 Model crash detected, will try to restart... 11:44:05 (4288): No heartbeat from core client for 30 sec - exiting 11:44:06 (4288): No heartbeat from core client for 30 sec - exiting 11:44:07 (4288): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1996, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4688, iMonCtr=1 Model crash detected, will try to restart... 18:24:02 (5216): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 00:07:21 (4588): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 00:07:22 (4588): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - Suspend request from BOINC... 15:16:34 (5300): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 19:15:23 (11008): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5336, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5548, iMonCtr=1 Model crash detected, will try to restart... CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5628, iMonCtr=1 Model crash detected, will try to restart... 02:53:01 (4960): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9304, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5420, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4536, iMonCtr=1 Model crash detected, will try to restart... 00:42:09 (852): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 06:41:06 (9200): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=17152, iMonCtr=1 Model crash detected, will try to restart... 18:30:50 (4616): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 03:25:26 (3340): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Atmos Hold Restart file rename failed on atmos_restart.hold Atmos Hold Restart file rename failed on atmos_restart.hold Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4904, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish </stderr_txt> ]]>
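
The stderr text above repeats a small set of messages (heartbeat timeouts, suspend requests, model crash restarts, and a final signal). A minimal sketch of how such a dump could be tallied is given here. The pattern names and the `tally_events` helper are hypothetical and not part of BOINC or the CPDN application; only the matched message substrings and the exit status value are taken from the task record.

```python
import re
from collections import Counter

# Hypothetical labels; the matched substrings are copied from the stderr text above.
PATTERNS = {
    "heartbeat_timeout": re.compile(r"No heartbeat from core client for \d+ sec - exiting"),
    "suspend_request": re.compile(r"Suspend request from BOINC"),
    "model_crash": re.compile(r"Model crash detected, will try to restart"),
    "signal": re.compile(r"Signal \d+ received, exiting"),
}


def tally_events(stderr_text: str) -> Counter:
    """Count how often each known message occurs in a flattened stderr dump."""
    counts = Counter()
    for name, pattern in PATTERNS.items():
        counts[name] = len(pattern.findall(stderr_text))
    return counts


# A short excerpt assembled from messages in the log above, for illustration only.
sample = (
    "16:27:01 (19328): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "Suspended CPDN Monitor - Suspend request from BOINC... "
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=3056, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "Signal 11 received, exiting... Called boinc_finish"
)
counts = tally_events(sample)

# The Exit status field reports the same value in decimal and hexadecimal.
exit_status = 193
exit_status_hex = hex(exit_status)  # '0xc1'
```

Running `tally_events` over the full stderr text instead of `sample` would give per-message totals for the whole run.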

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
02 Jul 2012 22:26:08	1183189	14366128	hadcm3n_yl9o_1980_40_007858318_0	777,600	1,399,024	1.7992
02 Jul 2012 19:25:07	1183189	14366128	hadcm3n_yl9o_1980_40_007858318_0	751,680	1,357,492	1.8059
27 Jun 2012 08:32:47	1183189	14366128	hadcm3n_yl9o_1980_40_007858318_0	725,760	1,311,112	1.8065
25 Jun 2012 07:43:41	1183189	14366128	hadcm3n_yl9o_1980_40_007858318_0	699,840	1,260,037	1.8005
23 Jun 2012 22:54:17	1183189	14366128	hadcm3n_yl9o_1980_40_007858318_0	673,920	1,207,870	1.7923
20 Jun 2012 09:03:55	1183189	14366128	hadcm3n_yl9o_1980_40_007858318_0	648,000	1,159,547	1.7894
15 Jun 2012 22:09:29	1183189	14366128	hadcm3n_yl9o_1980_40_007858318_0	622,080	1,112,423	1.7882
04 Jun 2012 23:08:48	1183189	14366128	hadcm3n_yl9o_1980_40_007858318_0	596,160	1,069,152	1.7934
03 Jun 2012 16:24:41	1183189	14366128	hadcm3n_yl9o_1980_40_007858318_0	570,240	1,027,930	1.8026
01 Jun 2012 01:17:28	1183189	14366128	hadcm3n_yl9o_1980_40_007858318_0	544,320	984,247	1.8082
29 May 2012 02:26:14	1183189	14366128	hadcm3n_yl9o_1980_40_007858318_0	518,400	939,449	1.8122
26 May 2012 00:18:19	1183189	14366128	hadcm3n_yl9o_1980_40_007858318_0	492,480	893,365	1.8140
17 May 2012 09:41:23	1183189	14366128	hadcm3n_yl9o_1980_40_007858318_0	466,560	846,402	1.8141
16 May 2012 09:07:41	1183189	14366128	hadcm3n_yl9o_1980_40_007858318_0	440,640	800,383	1.8164
14 May 2012 00:43:12	1183189	14366128	hadcm3n_yl9o_1980_40_007858318_0	414,720	752,080	1.8135
12 May 2012 18:38:00	1183189	14366128	hadcm3n_yl9o_1980_40_007858318_0	388,800	704,328	1.8115
06 May 2012 02:20:10	1183189	14366128	hadcm3n_yl9o_1980_40_007858318_0	362,880	656,459	1.8090
03 May 2012 00:36:28	1183189	14366128	hadcm3n_yl9o_1980_40_007858318_0	336,960	609,537	1.8089
02 May 2012 02:48:50	1183189	14366128	hadcm3n_yl9o_1980_40_007858318_0	311,040	561,271	1.8045
29 Apr 2012 16:27:05	1183189	14366128	hadcm3n_yl9o_1980_40_007858318_0	285,120	514,193	1.8034
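
The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count. The sketch below checks that assumption against the three most recent rows of the table. The helper names are hypothetical, and the row values are copied from the table above.

```python
# (time sent, timestep, CPU time in seconds, reported average sec/TS),
# copied from the three most recent trickle rows.
ROWS = [
    ("02 Jul 2012 22:26:08", "777,600", "1,399,024", "1.7992"),
    ("02 Jul 2012 19:25:07", "751,680", "1,357,492", "1.8059"),
    ("27 Jun 2012 08:32:47", "725,760", "1,311,112", "1.8065"),
]


def to_int(text: str) -> int:
    """Parse an integer written with thousands separators, e.g. '777,600'."""
    return int(text.replace(",", ""))


def seconds_per_timestep(timestep: int, cpu_seconds: int) -> float:
    """Cumulative CPU seconds divided by cumulative timesteps."""
    return cpu_seconds / timestep


computed = [
    round(seconds_per_timestep(to_int(ts), to_int(cpu)), 4)
    for _, ts, cpu, _ in ROWS
]
reported = [float(avg) for *_, avg in ROWS]

# Timesteps completed between consecutive trickles (rows are newest first).
steps = [to_int(ts) for _, ts, _, _ in ROWS]
increments = [newer - older for newer, older in zip(steps, steps[1:])]

for row, value in zip(ROWS, computed):
    print(row[0], value)
```

For these rows the computed values agree with the reported averages to four decimal places, and each trickle is 25,920 timesteps after the previous one.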