Task 15524709

Name	hadcm3n_3dqh_1940_40_008267352_1
Workunit	8422476
Created	6 Jan 2013, 23:35:55 UTC
Sent	6 Jan 2013, 23:36:02 UTC
Report deadline	8 Apr 2013, 7:03:13 UTC
Received	27 Jan 2013, 13:39:24 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1229213
Run time	5 days 15 hours 26 min 14 sec
CPU time	5 days 6 hours 54 min 18 sec
Validate state	Invalid
Credit	9,020.16
Device peak FLOPS	4.48 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5116, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4872, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4728, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4728, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4728, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4424, iMonCtr=1
Model crash detected, will try to restart...
CSuspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1220, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4272, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4512, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4512, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4860, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4600, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4600, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4600, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4600, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4600, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4600, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4556, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Signal 22 received, exiting...
Called boinc_finish
CPDN Monitor - Quit request from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1996, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1996, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1996, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1996, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1996, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1996, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
24 Jan 2013 21:17:24	1229213	15524709	hadcm3n_3dqh_1940_40_008267352_1	751,680	429,679	0.5716
23 Jan 2013 22:24:05	1229213	15524709	hadcm3n_3dqh_1940_40_008267352_1	725,760	414,585	0.5712
22 Jan 2013 23:41:29	1229213	15524709	hadcm3n_3dqh_1940_40_008267352_1	699,840	399,476	0.5708
22 Jan 2013 19:38:30	1229213	15524709	hadcm3n_3dqh_1940_40_008267352_1	673,920	384,318	0.5703
21 Jan 2013 20:37:09	1229213	15524709	hadcm3n_3dqh_1940_40_008267352_1	648,000	368,841	0.5692
20 Jan 2013 22:59:00	1229213	15524709	hadcm3n_3dqh_1940_40_008267352_1	622,080	353,474	0.5682
20 Jan 2013 17:53:09	1229213	15524709	hadcm3n_3dqh_1940_40_008267352_1	596,160	338,636	0.5680
20 Jan 2013 01:13:11	1229213	15524709	hadcm3n_3dqh_1940_40_008267352_1	570,240	323,369	0.5671
19 Jan 2013 20:46:17	1229213	15524709	hadcm3n_3dqh_1940_40_008267352_1	544,320	307,948	0.5657
19 Jan 2013 17:16:09	1229213	15524709	hadcm3n_3dqh_1940_40_008267352_1	518,400	293,181	0.5655
18 Jan 2013 22:47:44	1229213	15524709	hadcm3n_3dqh_1940_40_008267352_1	492,480	277,949	0.5644
18 Jan 2013 18:54:25	1229213	15524709	hadcm3n_3dqh_1940_40_008267352_1	466,560	263,120	0.5640
17 Jan 2013 20:37:00	1229213	15524709	hadcm3n_3dqh_1940_40_008267352_1	440,640	247,775	0.5623
16 Jan 2013 21:16:19	1229213	15524709	hadcm3n_3dqh_1940_40_008267352_1	414,720	232,720	0.5611
15 Jan 2013 21:44:33	1229213	15524709	hadcm3n_3dqh_1940_40_008267352_1	388,800	217,903	0.5605
15 Jan 2013 17:41:31	1229213	15524709	hadcm3n_3dqh_1940_40_008267352_1	362,880	203,041	0.5595
14 Jan 2013 19:43:11	1229213	15524709	hadcm3n_3dqh_1940_40_008267352_1	336,960	188,458	0.5593
13 Jan 2013 22:04:44	1229213	15524709	hadcm3n_3dqh_1940_40_008267352_1	311,040	174,317	0.5604
13 Jan 2013 17:15:49	1229213	15524709	hadcm3n_3dqh_1940_40_008267352_1	285,120	160,566	0.5632
13 Jan 2013 13:55:13	1229213	15524709	hadcm3n_3dqh_1940_40_008267352_1	259,200	146,435	0.5649
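
The "Average (sec/TS)" column in the trickle table appears to be the cumulative CPU time divided by the timestep count, rounded to four decimals. The following is a minimal sketch that checks this assumption against three rows copied from the table; the function name `average_sec_per_timestep` is hypothetical and is not part of BOINC or the CPDN software.

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_average)
ROWS = [
    (751_680, 429_679, 0.5716),  # 24 Jan 2013 21:17:24
    (725_760, 414_585, 0.5712),  # 23 Jan 2013 22:24:05
    (259_200, 146_435, 0.5649),  # 13 Jan 2013 13:55:13
]


def average_sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Assumed derivation of the 'Average (sec/TS)' column:
    cumulative CPU seconds divided by timesteps completed."""
    return round(cpu_time_sec / timestep, 4)


if __name__ == "__main__":
    for timestep, cpu_time_sec, reported in ROWS:
        computed = average_sec_per_timestep(cpu_time_sec, timestep)
        print(f"timestep={timestep} computed={computed} reported={reported}")
```

In the rows shown, consecutive trickles are 25,920 timesteps apart (for example 751,680 - 725,760), so the same check can be run on any row of the table.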