Task 13127126

Name	hadcm3n_ym75_1900_40_007361627_1
Workunit	7559057
Created	6 Jul 2011, 15:20:27 UTC
Sent	7 Jul 2011, 13:01:28 UTC
Report deadline	6 Oct 2011, 20:28:39 UTC
Received	28 Jul 2011, 22:52:03 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1294024
Run time	20 days 3 hours 35 min 58 sec
CPU time	10 days 5 hours 42 min 16 sec
Validate state	Invalid
Credit	8,709.12
Device peak FLOPS	2.41 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 i686-pc-linux-gnu
Stderr	<core_client_version>6.10.17</core_client_version>
<![CDATA[
<message>
process exited with code 22 (0x16, -234)
</message>
<stderr_txt>
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2704, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2704, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2704, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2704, iMonCtr=1
Model crash detected, will try to restart...
18:29:41 (2704): No heartbeat from core client for 30 sec - exiting
18:29:44 (2704): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:32:04 (18790): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=19461, iMonCtr=1
Model crash detected, will try to restart...
18:34:26 (19461): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=19659, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=19659, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=19659, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=19659, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=19659, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=19659, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=19659, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
28 Jul 2011 22:55:45	982003	13127126	hadcm3n_ym75_1900_40_007361627_1	725,760	1,648,283	2.2711
28 Jul 2011 22:55:39	982003	13127126	hadcm3n_ym75_1900_40_007361627_1	699,840	1,592,756	2.2759
28 Jul 2011 22:55:39	982003	13127126	hadcm3n_ym75_1900_40_007361627_1	673,920	1,533,224	2.2751
28 Jul 2011 22:55:41	982003	13127126	hadcm3n_ym75_1900_40_007361627_1	648,000	1,473,163	2.2734
28 Jul 2011 22:55:46	982003	13127126	hadcm3n_ym75_1900_40_007361627_1	622,080	1,413,643	2.2724
28 Jul 2011 22:55:43	982003	13127126	hadcm3n_ym75_1900_40_007361627_1	596,160	1,353,513	2.2704
28 Jul 2011 22:55:42	982003	13127126	hadcm3n_ym75_1900_40_007361627_1	570,240	1,293,603	2.2685
28 Jul 2011 22:55:41	982003	13127126	hadcm3n_ym75_1900_40_007361627_1	544,320	1,233,702	2.2665
28 Jul 2011 22:55:41	982003	13127126	hadcm3n_ym75_1900_40_007361627_1	518,400	1,173,800	2.2643
28 Jul 2011 22:55:44	982003	13127126	hadcm3n_ym75_1900_40_007361627_1	492,480	1,113,888	2.2618
25 Jul 2011 18:49:59	982003	13127126	hadcm3n_ym75_1900_40_007361627_1	466,560	1,053,776	2.2586
25 Jul 2011 18:04:01	982003	13127126	hadcm3n_ym75_1900_40_007361627_1	440,640	993,696	2.2551
25 Jul 2011 17:36:51	982003	13127126	hadcm3n_ym75_1900_40_007361627_1	414,720	933,547	2.2510
25 Jul 2011 16:41:44	982003	13127126	hadcm3n_ym75_1900_40_007361627_1	388,800	873,481	2.2466
25 Jul 2011 16:05:03	982003	13127126	hadcm3n_ym75_1900_40_007361627_1	362,880	814,264	2.2439
25 Jul 2011 15:39:08	982003	13127126	hadcm3n_ym75_1900_40_007361627_1	336,960	754,456	2.2390
25 Jul 2011 14:47:35	982003	13127126	hadcm3n_ym75_1900_40_007361627_1	311,040	695,611	2.2364
25 Jul 2011 13:34:40	982003	13127126	hadcm3n_ym75_1900_40_007361627_1	285,120	637,720	2.2367
25 Jul 2011 13:34:40	982003	13127126	hadcm3n_ym75_1900_40_007361627_1	259,200	581,934	2.2451
25 Jul 2011 13:34:41	982003	13127126	hadcm3n_ym75_1900_40_007361627_1	233,280	522,902	2.2415