Task 13014409

Name	hadcm3n_t1yh_1940_40_007311463_2
Workunit	7508893
Created	28 Jun 2011, 0:18:39 UTC
Sent	28 Jun 2011, 0:18:43 UTC
Report deadline	27 Sep 2011, 7:45:54 UTC
Received	18 Jul 2011, 13:29:33 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1134066
Run time	19 days 1 hours 41 min 42 sec
CPU time	18 days 23 hours 2 min
Validate state	Invalid
Credit	11,508.48
Device peak FLOPS	2.27 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4036, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4360, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
03:08:09 (4852): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
03:08:24 (4852): No heartbeat from core client for 30 sec - exiting
03:08:25 (4852): No heartbeat from core client for 30 sec - exiting
03:08:26 (4852): No heartbeat from core client for 30 sec - exiting
03:08:27 (4852): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3684, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3684, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3684, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1712, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1712, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1712, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1712, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1712, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1712, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
25 Jul 2011 16:23:33	1134066	13014409	hadcm3n_t1yh_1940_40_007311463_2	959,040	1,597,173	1.6654
25 Jul 2011 15:58:55	1134066	13014409	hadcm3n_t1yh_1940_40_007311463_2	933,120	1,553,806	1.6652
25 Jul 2011 15:43:02	1134066	13014409	hadcm3n_t1yh_1940_40_007311463_2	907,200	1,510,473	1.6650
25 Jul 2011 15:01:16	1134066	13014409	hadcm3n_t1yh_1940_40_007311463_2	881,280	1,467,008	1.6646
25 Jul 2011 14:26:10	1134066	13014409	hadcm3n_t1yh_1940_40_007311463_2	855,360	1,423,532	1.6642
25 Jul 2011 13:14:20	1134066	13014409	hadcm3n_t1yh_1940_40_007311463_2	829,440	1,380,359	1.6642
25 Jul 2011 13:14:20	1134066	13014409	hadcm3n_t1yh_1940_40_007311463_2	803,520	1,337,636	1.6647
25 Jul 2011 13:14:20	1134066	13014409	hadcm3n_t1yh_1940_40_007311463_2	777,600	1,295,154	1.6656
25 Jul 2011 13:14:20	1134066	13014409	hadcm3n_t1yh_1940_40_007311463_2	751,680	1,252,576	1.6664
25 Jul 2011 13:14:19	1134066	13014409	hadcm3n_t1yh_1940_40_007311463_2	725,760	1,210,101	1.6674
25 Jul 2011 13:14:19	1134066	13014409	hadcm3n_t1yh_1940_40_007311463_2	699,840	1,166,702	1.6671
25 Jul 2011 13:14:18	1134066	13014409	hadcm3n_t1yh_1940_40_007311463_2	673,920	1,124,079	1.6680
25 Jul 2011 13:14:18	1134066	13014409	hadcm3n_t1yh_1940_40_007311463_2	648,000	1,081,856	1.6695
25 Jul 2011 13:14:18	1134066	13014409	hadcm3n_t1yh_1940_40_007311463_2	622,080	1,039,600	1.6712
10 Jul 2011 18:21:59	1134066	13014409	hadcm3n_t1yh_1940_40_007311463_2	596,160	997,487	1.6732
10 Jul 2011 05:40:40	1134066	13014409	hadcm3n_t1yh_1940_40_007311463_2	570,240	955,206	1.6751
09 Jul 2011 17:54:31	1134066	13014409	hadcm3n_t1yh_1940_40_007311463_2	544,320	912,926	1.6772
09 Jul 2011 06:04:28	1134066	13014409	hadcm3n_t1yh_1940_40_007311463_2	518,400	870,721	1.6796
08 Jul 2011 17:36:22	1134066	13014409	hadcm3n_t1yh_1940_40_007311463_2	492,480	826,652	1.6785
08 Jul 2011 05:17:25	1134066	13014409	hadcm3n_t1yh_1940_40_007311463_2	466,560	782,566	1.6773
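
The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count. That relationship is inferred from the figures in the table, not stated on the page, so treat the following as a minimal sketch under that assumption. The function name `sec_per_timestep` is hypothetical, and the three sample rows are copied from the table above.

```python
# Each tuple: (timestep, cpu_time_sec, reported_average), taken from the trickle table.
rows = [
    (959_040, 1_597_173, 1.6654),
    (933_120, 1_553_806, 1.6652),
    (466_560, 782_566, 1.6773),
]


def sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Assumed definition: cumulative CPU seconds per model timestep."""
    return cpu_time_sec / timestep


# Round to four decimals, the precision the table shows.
computed = [round(sec_per_timestep(cpu, ts), 4) for ts, cpu, _ in rows]

for (ts, cpu, reported), value in zip(rows, computed):
    print(f"timestep={ts:>9,}  computed={value:.4f}  reported={reported:.4f}")
```

For these three rows the computed values agree with the reported ones to four decimal places, which is consistent with the assumed definition.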