Task 15850300

Name	hadcm3n_4kqa_1940_40_008309941_3
Workunit	8461076
Created	19 Jun 2013, 17:16:38 UTC
Sent	19 Jun 2013, 18:21:12 UTC
Report deadline	19 Sep 2013, 1:48:23 UTC
Received	3 Dec 2013, 12:28:12 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	1254204
Run time	18 days 7 hours 46 min 47 sec
CPU time	16 days 11 hours 57 min 29 sec
Validate state	Invalid
Credit	9,331.20
Device peak FLOPS	2.45 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
- exit code 193 (0xc1)
</message>
<stderr_txt>
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6012, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Ocean Restart file copy failed on 4kqako.dae6cc0
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5420, iMonCtr=1
Model crash detected, will try to restart...
12:19:30 (4016): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4272, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2976, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5620, iMonCtr=1
Model crash detected, will try to restart.Ocean Restart file copy failed on 4kqako.daf01f0
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Ocean Restart file copy failed on 4kqako.daf16q0
18:55:12 (4068): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1840, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5848, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5932, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4796, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3976, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3132, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5164, iMonCtr=1
Model crash detected, will try to restart...
13:52:21 (5132): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Ocean Restart file copy failed on 4kqako.daf73d0
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1080, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4748, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3328, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3848, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6080, iMonCtr=1
Model crash detected, will try to restart...
00:39:26 (5008): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5476, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5564, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3768, iMonCtr=1
Model crash detected, will try to restart...
Atmos Hold Restart file rename failed on atmos_restart.hold
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4436, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5704, iMonCtr=1
Model crash detected, will try to restart...
14:03:02 (5664): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5076, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3428, iMonCtr=1
Model crash detected, will try to restart...
Ocean Restart file copy failed on 4kqako.dag23c0
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4188, iMonCtr=1
Model crash detected, will try to restart...
Atmos Hold Restart file rename failed on atmos_restart.hold
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2688, iMonCtr=1
Model crash detected, will try to restart...
C13:24:12 (1348): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5760, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5588, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2024, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4228, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4228, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4228, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5564, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6140, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5696, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5580, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5012, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 15:24:40 (5920): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Ocean Restart file copy failed on 4kqako.dag92g0
Ocean Restart file copy failed on 4kqako.dag98i0
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2664, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5960, iMonCtr=1
Model crash detected, will try to restart...
02:22:34 (3668): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Signal 11 received, exiting...
Called boinc_finish
</stderr_txt>
]]>
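
The stderr output above repeats a small set of message types, so a tally makes the failure pattern easier to read. The following is a minimal sketch, not part of BOINC or CPDN tooling: the labels, the regular expressions, and the `tally_events` helper are assumptions made for illustration, and only the matched message texts come from the log. The excerpt is the last few messages of the stderr, copied verbatim.

```python
import re
from collections import Counter

# Labels and regexes are assumptions of this sketch; only the message
# texts they match are taken from the stderr output above.
PATTERNS = {
    "quit_request": re.compile(r"CPDN Monitor - Quit request from BOINC"),
    "suspend_request": re.compile(r"CPDN Monitor - Suspend request from BOINC"),
    "model_crash": re.compile(r"Model crash detected"),
    "no_heartbeat": re.compile(r"No heartbeat from core client"),
    "restart_file_failure": re.compile(r"Restart file (?:copy|rename) failed"),
    "signal": re.compile(r"Signal \d+ received"),
}


def tally_events(stderr_text: str) -> Counter:
    """Count how often each known message type occurs in the text."""
    counts = Counter()
    for label, pattern in PATTERNS.items():
        counts[label] = len(pattern.findall(stderr_text))
    return counts


# Final messages of the stderr above, copied verbatim.
EXCERPT = """\
Ocean Restart file copy failed on 4kqako.dag92g0
Ocean Restart file copy failed on 4kqako.dag98i0
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2664, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5960, iMonCtr=1
Model crash detected, will try to restart...
02:22:34 (3668): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Signal 11 received, exiting...
Called boinc_finish
"""

if __name__ == "__main__":
    for label, n in tally_events(EXCERPT).items():
        print(f"{label}: {n}")
```

Because the patterns search the whole text rather than individual lines, the same function works whether the log is stored one message per line or as a single run of text. The `CPDN Monitor - No 'heartbeat' from BOINC...` line is deliberately not counted separately, since in this log it always follows the `No heartbeat from core client` message.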

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
03 Dec 2013 11:28:55	1254204	15850300	hadcm3n_4kqa_1940_40_008309941_3	777,600	1,425,189	1.8328
27 Nov 2013 13:29:42	1254204	15850300	hadcm3n_4kqa_1940_40_008309941_3	751,680	1,382,015	1.8386
22 Nov 2013 11:51:21	1254204	15850300	hadcm3n_4kqa_1940_40_008309941_3	725,760	1,339,269	1.8453
20 Nov 2013 11:17:39	1254204	15850300	hadcm3n_4kqa_1940_40_008309941_3	699,840	1,294,960	1.8504
11 Nov 2013 14:23:59	1254204	15850300	hadcm3n_4kqa_1940_40_008309941_3	673,920	1,249,128	1.8535
07 Nov 2013 07:20:24	1254204	15850300	hadcm3n_4kqa_1940_40_008309941_3	648,000	1,202,310	1.8554
05 Nov 2013 16:10:45	1254204	15850300	hadcm3n_4kqa_1940_40_008309941_3	622,080	1,153,001	1.8535
03 Nov 2013 11:13:36	1254204	15850300	hadcm3n_4kqa_1940_40_008309941_3	596,160	1,103,962	1.8518
31 Oct 2013 12:40:03	1254204	15850300	hadcm3n_4kqa_1940_40_008309941_3	570,240	1,056,135	1.8521
24 Oct 2013 09:34:39	1254204	15850300	hadcm3n_4kqa_1940_40_008309941_3	544,320	1,006,326	1.8488
18 Oct 2013 08:45:21	1254204	15850300	hadcm3n_4kqa_1940_40_008309941_3	518,400	958,291	1.8486
10 Oct 2013 17:47:32	1254204	15850300	hadcm3n_4kqa_1940_40_008309941_3	492,480	909,573	1.8469
02 Oct 2013 17:22:56	1254204	15850300	hadcm3n_4kqa_1940_40_008309941_3	466,560	861,386	1.8462
01 Oct 2013 10:56:54	1254204	15850300	hadcm3n_4kqa_1940_40_008309941_3	440,640	812,624	1.8442
11 Sep 2013 15:52:55	1254204	15850300	hadcm3n_4kqa_1940_40_008309941_3	414,720	765,281	1.8453
09 Sep 2013 13:10:50	1254204	15850300	hadcm3n_4kqa_1940_40_008309941_3	388,800	721,100	1.8547
29 Aug 2013 10:44:58	1254204	15850300	hadcm3n_4kqa_1940_40_008309941_3	362,880	674,237	1.8580
21 Aug 2013 14:43:07	1254204	15850300	hadcm3n_4kqa_1940_40_008309941_3	336,960	625,412	1.8560
16 Aug 2013 12:35:24	1254204	15850300	hadcm3n_4kqa_1940_40_008309941_3	311,040	574,121	1.8458
16 Aug 2013 08:34:35	1254204	15850300	hadcm3n_4kqa_1940_40_008309941_3	285,120	526,516	1.8466
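
The Average (sec/TS) column in the table above appears to be the cumulative CPU time divided by the timestep count. The sketch below checks that reading against the three most recent rows and also confirms that those rows are evenly spaced in timesteps. The row values are copied from the table; the function name and the rounding tolerance are assumptions made for illustration.

```python
# (timestep, cpu_time_sec, reported_average) copied from the three most
# recent rows of the trickle table above.
ROWS = [
    (777_600, 1_425_189, 1.8328),
    (751_680, 1_382_015, 1.8386),
    (725_760, 1_339_269, 1.8453),
]

# Assumed tolerance: the table shows four decimal places.
TOLERANCE = 1e-4


def average_sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Cumulative CPU seconds divided by the number of timesteps completed."""
    return cpu_time_sec / timestep


def averages_match(rows) -> bool:
    """True if every reported average agrees with cpu_time / timestep."""
    return all(
        abs(average_sec_per_timestep(cpu, ts) - reported) < TOLERANCE
        for ts, cpu, reported in rows
    )


def timestep_gaps(rows) -> set:
    """Set of timestep differences between consecutive rows (newest first)."""
    steps = [ts for ts, _, _ in rows]
    return {newer - older for newer, older in zip(steps, steps[1:])}


if __name__ == "__main__":
    print("averages consistent:", averages_match(ROWS))
    print("timestep gaps:", timestep_gaps(ROWS))
```

For these rows the reported averages agree with the recomputed values to the displayed precision, and each trickle is 25,920 timesteps after the previous one.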