Task 13138910

Name	hadcm3n_yhe1_1900_40_007355395_2
Workunit	7552825
Created	15 Jul 2011, 15:24:33 UTC
Sent	15 Jul 2011, 15:25:37 UTC
Report deadline	14 Oct 2011, 22:52:48 UTC
Received	28 Jul 2011, 22:52:03 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1294024
Run time	12 days 1 hours 31 min 51 sec
CPU time	11 days 13 hours 27 min 18 sec
Validate state	Invalid
Credit	4,976.64
Device peak FLOPS	2.43 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 i686-pc-linux-gnu
Stderr	<core_client_version>6.10.17</core_client_version> <![CDATA[ <message> process exited with code 22 (0x16, -234) </message> <stderr_txt> Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 18:29:42 (2709): No heartbeat from core client for 30 sec - exiting 18:29:44 (2709): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 18:29:45 (2709): No heartbeat from core client for 30 sec - exiting 18:32:04 (18814): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=19535, iMonCtr=1 Model crash detected, will try to restart... 18:34:24 (19535): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=19667, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=19667, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=19667, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=19667, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=19667, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=19667, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=19667, iMonCtr=1 Model crash detected, will try to restart... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=19667, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=19667, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=19667, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=19667, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
28 Jul 2011 22:55:39	982003	13138910	hadcm3n_yhe1_1900_40_007355395_2	414,720	972,035	2.3438
28 Jul 2011 22:55:41	982003	13138910	hadcm3n_yhe1_1900_40_007355395_2	388,800	914,051	2.3510
28 Jul 2011 22:55:43	982003	13138910	hadcm3n_yhe1_1900_40_007355395_2	362,880	852,867	2.3503
28 Jul 2011 22:55:41	982003	13138910	hadcm3n_yhe1_1900_40_007355395_2	336,960	791,634	2.3493
28 Jul 2011 22:55:40	982003	13138910	hadcm3n_yhe1_1900_40_007355395_2	311,040	730,645	2.3490
28 Jul 2011 22:55:41	982003	13138910	hadcm3n_yhe1_1900_40_007355395_2	285,120	669,544	2.3483
28 Jul 2011 22:55:43	982003	13138910	hadcm3n_yhe1_1900_40_007355395_2	259,200	608,457	2.3474
28 Jul 2011 22:55:45	982003	13138910	hadcm3n_yhe1_1900_40_007355395_2	233,280	547,200	2.3457
28 Jul 2011 22:55:41	982003	13138910	hadcm3n_yhe1_1900_40_007355395_2	207,360	486,085	2.3442
28 Jul 2011 22:55:39	982003	13138910	hadcm3n_yhe1_1900_40_007355395_2	181,440	425,248	2.3437
25 Jul 2011 18:21:11	982003	13138910	hadcm3n_yhe1_1900_40_007355395_2	155,520	364,511	2.3438
25 Jul 2011 17:55:23	982003	13138910	hadcm3n_yhe1_1900_40_007355395_2	129,600	303,394	2.3410
25 Jul 2011 17:26:57	982003	13138910	hadcm3n_yhe1_1900_40_007355395_2	103,680	242,435	2.3383
25 Jul 2011 16:29:44	982003	13138910	hadcm3n_yhe1_1900_40_007355395_2	77,760	181,111	2.3291
25 Jul 2011 15:54:56	982003	13138910	hadcm3n_yhe1_1900_40_007355395_2	51,840	120,511	2.3247
25 Jul 2011 15:29:03	982003	13138910	hadcm3n_yhe1_1900_40_007355395_2	25,920	60,444	2.3319
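
The Average (sec/TS) column appears to be the cumulative CPU Time (sec) divided by the Timestep count. This is an assumption read off the numbers, not something the page states. A minimal sketch that checks it against three rows copied from the table above; the helper name `average_sec_per_timestep` is hypothetical:

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_average).
ROWS = [
    (414720, 972035, 2.3438),
    (388800, 914051, 2.3510),
    (25920, 60444, 2.3319),
]


def average_sec_per_timestep(cpu_time_sec, timestep):
    """Hypothetical helper: cumulative CPU seconds per model timestep.

    Assumes the reported average is a plain ratio of the two columns.
    """
    return cpu_time_sec / timestep


for timestep, cpu_time_sec, reported in ROWS:
    computed = average_sec_per_timestep(cpu_time_sec, timestep)
    # Show the computed ratio next to the value reported in the table.
    print(f"{timestep:>7,}  computed={computed:.4f}  reported={reported:.4f}")
```

If the assumption holds, each computed value matches the reported one to four decimal places.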