Task 13555808

Name	hadcm3n_yhg7_1900_40_007525036_1
Workunit	7722511
Created	28 Oct 2011, 13:37:08 UTC
Sent	30 Oct 2011, 11:20:54 UTC
Report deadline	29 Jan 2012, 18:48:05 UTC
Received	5 Dec 2011, 20:38:53 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	859264
Run time	10 days 6 hours 2 min 12 sec
CPU time	8 days 17 hours 45 min 53 sec
Validate state	Invalid
Credit	6,220.80
Device peak FLOPS	2.72 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
 - exit code 193 (0xc1)
</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5176, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5176, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5176, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5176, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4900, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
11:06:00 (5320): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
11:06:14 (5320): No heartbeat from core client for 30 sec - exiting
11:06:15 (5320): No heartbeat from core client for 30 sec - exiting
11:06:16 (5320): No heartbeat from core client for 30 sec - exiting
11:06:17 (5320): No heartbeat from core client for 30 sec - exiting
11:06:18 (5320): No heartbeat from core client for 30 sec - exiting
11:06:19 (5320): No heartbeat from core client for 30 sec - exiting
11:06:20 (5320): No heartbeat from core client for 30 sec - exiting
11:06:21 (5320): No heartbeat from core client for 30 sec - exiting
11:06:22 (5320): No heartbeat from core client for 30 sec - exiting
11:06:23 (5320): No heartbeat from core client for 30 sec - exiting
11:06:24 (5320): No heartbeat from core client for 30 sec - exiting
11:06:25 (5320): No heartbeat from core client for 30 sec - exiting
11:06:26 (5320): No heartbeat from core client for 30 sec - exiting
11:06:27 (5320): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3820, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5316, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4108, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5348, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5348, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4188, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5268, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5312, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5312, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5088, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5088, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5088, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4768, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5924, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5924, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5020, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4764, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4764, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4764, iMonCtr=1
Model crash detected, will try to restart...
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
05 Dec 2011 19:39:44	859264	13555808	hadcm3n_yhg7_1900_40_007525036_1	518,400	755,136	1.4567
04 Dec 2011 16:11:45	859264	13555808	hadcm3n_yhg7_1900_40_007525036_1	492,480	718,419	1.4588
03 Dec 2011 15:22:56	859264	13555808	hadcm3n_yhg7_1900_40_007525036_1	466,560	680,182	1.4579
02 Dec 2011 17:19:21	859264	13555808	hadcm3n_yhg7_1900_40_007525036_1	440,640	642,362	1.4578
28 Nov 2011 14:32:08	859264	13555808	hadcm3n_yhg7_1900_40_007525036_1	414,720	604,230	1.4570
22 Nov 2011 12:24:50	859264	13555808	hadcm3n_yhg7_1900_40_007525036_1	388,800	566,096	1.4560
21 Nov 2011 14:50:06	859264	13555808	hadcm3n_yhg7_1900_40_007525036_1	362,880	527,200	1.4528
16 Nov 2011 22:05:28	859264	13555808	hadcm3n_yhg7_1900_40_007525036_1	336,960	489,003	1.4512
15 Nov 2011 17:40:52	859264	13555808	hadcm3n_yhg7_1900_40_007525036_1	311,040	451,420	1.4513
15 Nov 2011 17:40:51	859264	13555808	hadcm3n_yhg7_1900_40_007525036_1	285,120	413,500	1.4503
15 Nov 2011 17:40:51	859264	13555808	hadcm3n_yhg7_1900_40_007525036_1	259,200	376,038	1.4508
15 Nov 2011 17:40:51	859264	13555808	hadcm3n_yhg7_1900_40_007525036_1	233,280	338,130	1.4495
09 Nov 2011 22:43:30	859264	13555808	hadcm3n_yhg7_1900_40_007525036_1	207,360	300,124	1.4474
09 Nov 2011 10:21:43	859264	13555808	hadcm3n_yhg7_1900_40_007525036_1	181,440	262,699	1.4479
08 Nov 2011 12:28:19	859264	13555808	hadcm3n_yhg7_1900_40_007525036_1	155,520	225,225	1.4482
07 Nov 2011 14:33:09	859264	13555808	hadcm3n_yhg7_1900_40_007525036_1	129,600	188,001	1.4506
03 Nov 2011 23:24:52	859264	13555808	hadcm3n_yhg7_1900_40_007525036_1	103,680	150,615	1.4527
03 Nov 2011 11:46:06	859264	13555808	hadcm3n_yhg7_1900_40_007525036_1	77,760	113,614	1.4611
01 Nov 2011 14:36:04	859264	13555808	hadcm3n_yhg7_1900_40_007525036_1	51,840	76,306	1.4720
31 Oct 2011 20:01:46	859264	13555808	hadcm3n_yhg7_1900_40_007525036_1	25,920	38,618	1.4899
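
The "Average (sec/TS)" column in the trickle table above appears to be the cumulative CPU time divided by the cumulative timestep count, rounded to four decimal places. The sketch below checks that reading against a table row. The helper names `parse_trickle_row` and `seconds_per_timestep` are hypothetical and not part of BOINC or CPDN, and the tab-separated column order is assumed from the header row.

```python
def parse_trickle_row(line: str) -> tuple[int, int, float]:
    """Split one tab-separated trickle row into (timestep, cpu_seconds, reported_avg).

    Assumes the column order shown in the table header:
    time sent, host ID, result ID, result name, timestep, CPU time, average.
    """
    fields = line.rstrip("\n").split("\t")
    timestep = int(fields[4].replace(",", ""))      # e.g. "518,400" -> 518400
    cpu_seconds = int(fields[5].replace(",", ""))   # e.g. "755,136" -> 755136
    reported_avg = float(fields[6])
    return timestep, cpu_seconds, reported_avg


def seconds_per_timestep(cpu_seconds: int, timesteps: int) -> float:
    """Cumulative CPU seconds per model timestep, rounded like the table."""
    return round(cpu_seconds / timesteps, 4)


# Most recent trickle row, copied from the table above.
latest = (
    "05 Dec 2011 19:39:44\t859264\t13555808\t"
    "hadcm3n_yhg7_1900_40_007525036_1\t518,400\t755,136\t1.4567"
)

timestep, cpu_seconds, reported = parse_trickle_row(latest)
computed = seconds_per_timestep(cpu_seconds, timestep)
print(f"computed={computed} reported={reported}")
```

The same check holds for the earliest row (25,920 timesteps, 38,618 CPU seconds, reported 1.4899). Each successive trickle in the table advances the timestep count by 25,920.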