Task 12929490

Name	hadcm3n_o3hb_1940_40_007265630_2
Workunit	7463870
Created	3 Jun 2011, 8:15:27 UTC
Sent	3 Jun 2011, 8:15:33 UTC
Report deadline	2 Sep 2011, 15:42:44 UTC
Received	28 Aug 2011, 10:59:23 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	984314
Run time	13 days 6 hours 57 min 1 sec
CPU time	13 days 6 hours 57 min 1 sec
Validate state	Invalid
Credit	6,220.80
Device peak FLOPS	2.12 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.2.28</core_client_version>
<![CDATA[
<message>
 - exit code 193 (0xc1)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5244, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3516, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3148, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5532, iMonCtr=1
Model crash detected, will try to restart...
10:21:11 (1712): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5628, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1292, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3796, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4740, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5644, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4176, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5428, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2132, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3044, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4576, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1464, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5172, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3640, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4168, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2112, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2112, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3952, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3836, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3500, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3500, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4708, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5316, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4836, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1028, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5992, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3888, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2176, iMonCtr=1
Model crash detected, will try to restart...
Signal 11 received, exiting...
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
28 Jul 2011 20:04:37	984314	12929490	hadcm3n_o3hb_1940_40_007265630_2	518,400	1,148,208	2.2149
26 Jul 2011 19:26:26	984314	12929490	hadcm3n_o3hb_1940_40_007265630_2	492,480	1,091,306	2.2159
25 Jul 2011 22:18:38	984314	12929490	hadcm3n_o3hb_1940_40_007265630_2	466,560	1,033,744	2.2157
25 Jul 2011 18:59:27	984314	12929490	hadcm3n_o3hb_1940_40_007265630_2	440,640	977,838	2.2191
25 Jul 2011 17:33:39	984314	12929490	hadcm3n_o3hb_1940_40_007265630_2	414,720	921,050	2.2209
25 Jul 2011 16:07:49	984314	12929490	hadcm3n_o3hb_1940_40_007265630_2	388,800	863,376	2.2206
25 Jul 2011 13:14:25	984314	12929490	hadcm3n_o3hb_1940_40_007265630_2	362,880	806,564	2.2227
10 Jul 2011 17:42:36	984314	12929490	hadcm3n_o3hb_1940_40_007265630_2	336,960	744,317	2.2089
09 Jul 2011 12:51:44	984314	12929490	hadcm3n_o3hb_1940_40_007265630_2	311,040	684,106	2.1994
05 Jul 2011 15:06:52	984314	12929490	hadcm3n_o3hb_1940_40_007265630_2	285,120	624,334	2.1897
01 Jul 2011 13:41:13	984314	12929490	hadcm3n_o3hb_1940_40_007265630_2	259,200	565,071	2.1801
29 Jun 2011 18:48:27	984314	12929490	hadcm3n_o3hb_1940_40_007265630_2	233,280	505,079	2.1651
26 Jun 2011 13:56:18	984314	12929490	hadcm3n_o3hb_1940_40_007265630_2	207,360	446,201	2.1518
25 Jun 2011 07:42:10	984314	12929490	hadcm3n_o3hb_1940_40_007265630_2	181,440	386,771	2.1317
23 Jun 2011 17:29:44	984314	12929490	hadcm3n_o3hb_1940_40_007265630_2	155,520	324,752	2.0882
20 Jun 2011 19:00:02	984314	12929490	hadcm3n_o3hb_1940_40_007265630_2	129,600	267,185	2.0616
20 Jun 2011 11:36:16	984314	12929490	hadcm3n_o3hb_1940_40_007265630_2	103,680	212,602	2.0506
19 Jun 2011 22:02:18	984314	12929490	hadcm3n_o3hb_1940_40_007265630_2	77,760	158,019	2.0321
16 Jun 2011 19:38:30	984314	12929490	hadcm3n_o3hb_1940_40_007265630_2	51,840	104,922	2.0240
14 Jun 2011 17:34:25	984314	12929490	hadcm3n_o3hb_1940_40_007265630_2	25,920	52,422	2.0225
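
The Average (sec/TS) column in the table above appears to be the cumulative CPU time divided by the timestep count. That relationship is inferred from the numbers and is not stated on this page. The sketch below checks it for three rows copied from the table, and converts the reported run time to seconds for comparison with the last trickle. The helper names `sec_per_timestep` and `run_time_seconds` are hypothetical and used only for illustration.

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_avg).
TRICKLES = [
    (518_400, 1_148_208, 2.2149),  # 28 Jul 2011 (latest trickle)
    (259_200, 565_071, 2.1801),    # 01 Jul 2011
    (25_920, 52_422, 2.0225),      # 14 Jun 2011 (first trickle)
]


def sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Assumed definition of the Average column: CPU seconds per timestep."""
    return cpu_time_sec / timestep


def run_time_seconds(days: int, hours: int, minutes: int, seconds: int) -> int:
    """Convert a 'D days H hours M min S sec' duration to whole seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


if __name__ == "__main__":
    for timestep, cpu_time, reported in TRICKLES:
        computed = sec_per_timestep(cpu_time, timestep)
        print(f"{timestep:>9,} TS: computed {computed:.4f}, reported {reported:.4f}")

    # Run time from the task summary: 13 days 6 hours 57 min 1 sec.
    total = run_time_seconds(13, 6, 57, 1)
    print(f"run time = {total:,} s, last trickle CPU time = {TRICKLES[0][1]:,} s")
```

Under this assumption the computed values agree with the reported column to four decimal places, and the reported run time works out to 1,148,221 seconds, which is 13 seconds more than the CPU time in the latest trickle.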