Task 15427607

Name	hadcm3n_yjv0_1940_40_008239239_4
Workunit	8394363
Created	5 Nov 2012, 20:48:59 UTC
Sent	5 Nov 2012, 20:49:11 UTC
Report deadline	5 Feb 2013, 4:16:22 UTC
Received	23 Jan 2013, 14:33:26 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	1236037
Run time	16 days 5 hours 29 min 8 sec
CPU time	15 days 18 hours 32 min 16 sec
Validate state	Invalid
Credit	9,331.20
Device peak FLOPS	2.55 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.42</core_client_version>
<![CDATA[
<message>
 - exit code 193 (0xc1)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6080, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6052, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5952, iMonCtr=1
Model crash detected, will try to restart...
Atmos Hold Restart file rename failed on atmos_restart.hold
Atmos Hold Restart file rename failed on atmos_restart.hold
Atmos Hold Restart file rename failed on atmos_restart.hold
Atmos Hold Restart file rename failed on atmos_restart.hold
Atmos Hold Restart file rename failed on atmos_restart.hold
Ocean Restart file copy failed on yjv0ko.dae8470
Ocean Restart file copy failed on yjv0ko.dae8480
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6124, iMonCtr=1
Model crash detected, will try to restart...
09:43:40 (5780): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5908, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1244, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5764, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6072, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5952, iMonCtr=1
Model crash detected, will try to restart...
21:15:12 (5732): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1316, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4556, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5800, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5688, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5776, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6064, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5744, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5708, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
16:01:04 (5984): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
23:45:29 (6004): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5644, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5644, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5492, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6024, iMonCtr=1
Model crash detected, will try to restart...
Ocean Restart file copy failed on yjv0ko.dag65c0
Ocean Restart file copy failed on yjv0ko.dag65d0
Ocean Restart file copy failed on yjv0ko.dag65e0
Ocean Restart file copy failed on yjv0ko.dag65f0
Ocean Restart file copy failed on yjv0ko.dag65g0
Ocean Restart file copy failed on yjv0ko.dag65h0
Ocean Restart file copy failed on yjv0ko.dag65i0
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5804, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5908, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6028, iMonCtr=1
Model crash detected, will try to restart...
10:02:56 (6072): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:15:26 (7108): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6372, iMonCtr=1
Model crash detected, will try to restart...
Signal 11 received, exiting...
Called boinc_finish
</stderr_txt>
]]>
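
The stderr output above repeats a handful of message types (model crash restarts, missed heartbeats, suspend requests, restart-file failures, and the final signal). A minimal sketch of how such a log could be tallied is shown here. The marker substrings are copied from the messages above; the category labels, the `count_events` helper, and the short `sample` excerpt are illustrative assumptions, not part of BOINC or CPDN.

```python
from collections import Counter

# Marker substrings taken from the stderr text; the labels are hypothetical.
MARKERS = {
    "model_crash": "Model crash detected",
    "no_heartbeat": "No heartbeat from core client",
    "suspend": "Suspend request from BOINC",
    "atmos_rename_failed": "Atmos Hold Restart file rename failed",
    "ocean_copy_failed": "Ocean Restart file copy failed",
    "signal_11": "Signal 11 received",
}


def count_events(stderr_text: str) -> Counter:
    """Count how often each marker substring occurs in the stderr text.

    Plain substring counting is used, so it works whether the log has
    one message per line or has been flattened onto a single line.
    """
    return Counter(
        {label: stderr_text.count(marker) for label, marker in MARKERS.items()}
    )


# A short excerpt from the log above, used only to exercise the function.
sample = (
    "Model crash detected, will try to restart... "
    "Ocean Restart file copy failed on yjv0ko.dae8470 "
    "Ocean Restart file copy failed on yjv0ko.dae8480 "
    "Model crash detected, will try to restart... "
    "09:43:40 (5780): No heartbeat from core client for 30 sec - exiting "
    "Signal 11 received, exiting... Called boinc_finish"
)

counts = count_events(sample)
for label, n in counts.items():
    print(f"{label}: {n}")
```

Passing the full `<stderr_txt>` body instead of `sample` would give per-category totals for the whole task.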

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
22 Jan 2013 14:35:33	1236037	15427607	hadcm3n_yjv0_1940_40_008239239_4	777,600	1,362,730	1.7525
18 Jan 2013 18:35:14	1236037	15427607	hadcm3n_yjv0_1940_40_008239239_4	751,680	1,315,758	1.7504
17 Jan 2013 16:45:38	1236037	15427607	hadcm3n_yjv0_1940_40_008239239_4	725,760	1,268,315	1.7476
15 Jan 2013 20:27:22	1236037	15427607	hadcm3n_yjv0_1940_40_008239239_4	699,840	1,221,156	1.7449
14 Jan 2013 18:47:37	1236037	15427607	hadcm3n_yjv0_1940_40_008239239_4	673,920	1,175,412	1.7441
11 Jan 2013 17:16:06	1236037	15427607	hadcm3n_yjv0_1940_40_008239239_4	648,000	1,129,046	1.7424
04 Jan 2013 22:36:14	1236037	15427607	hadcm3n_yjv0_1940_40_008239239_4	622,080	1,083,109	1.7411
03 Jan 2013 16:46:09	1236037	15427607	hadcm3n_yjv0_1940_40_008239239_4	596,160	1,038,003	1.7411
31 Dec 2012 19:51:22	1236037	15427607	hadcm3n_yjv0_1940_40_008239239_4	570,240	992,830	1.7411
27 Dec 2012 21:46:35	1236037	15427607	hadcm3n_yjv0_1940_40_008239239_4	544,320	947,934	1.7415
21 Dec 2012 17:57:24	1236037	15427607	hadcm3n_yjv0_1940_40_008239239_4	518,400	902,588	1.7411
19 Dec 2012 20:21:55	1236037	15427607	hadcm3n_yjv0_1940_40_008239239_4	492,480	857,293	1.7408
17 Dec 2012 20:23:28	1236037	15427607	hadcm3n_yjv0_1940_40_008239239_4	466,560	812,009	1.7404
13 Dec 2012 22:07:22	1236037	15427607	hadcm3n_yjv0_1940_40_008239239_4	440,640	766,808	1.7402
13 Dec 2012 17:46:23	1236037	15427607	hadcm3n_yjv0_1940_40_008239239_4	414,720	721,537	1.7398
07 Dec 2012 19:51:57	1236037	15427607	hadcm3n_yjv0_1940_40_008239239_4	388,800	676,207	1.7392
04 Dec 2012 20:34:45	1236037	15427607	hadcm3n_yjv0_1940_40_008239239_4	362,880	631,245	1.7395
03 Dec 2012 18:35:28	1236037	15427607	hadcm3n_yjv0_1940_40_008239239_4	336,960	585,984	1.7390
29 Nov 2012 22:14:23	1236037	15427607	hadcm3n_yjv0_1940_40_008239239_4	311,040	540,644	1.7382
28 Nov 2012 18:56:15	1236037	15427607	hadcm3n_yjv0_1940_40_008239239_4	285,120	495,528	1.7380
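
The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the timestep count reached at that trickle. That reading is an assumption, but it matches the rows above to four decimal places. A minimal sketch that checks the first and last rows of the table (the function name `seconds_per_timestep` is hypothetical):

```python
def seconds_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Average CPU seconds per model timestep, rounded to 4 places.

    Assumes the table's "Average (sec/TS)" is cumulative CPU time
    divided by the cumulative timestep count.
    """
    return round(cpu_time_sec / timestep, 4)


# (timestep, CPU time in seconds, reported average) from the table above.
rows = [
    (777_600, 1_362_730, 1.7525),  # 22 Jan 2013 trickle
    (285_120, 495_528, 1.7380),    # 28 Nov 2012 trickle
]

for timestep, cpu_time, reported in rows:
    computed = seconds_per_timestep(cpu_time, timestep)
    print(f"timestep={timestep}: computed={computed:.4f}, reported={reported:.4f}")
```

The slow rise of the average from 1.7380 to 1.7525 sec/TS means later timesteps cost slightly more CPU time than earlier ones.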