Task 13394141

Name	hadcm3n_t37a_1940_40_007446061_3
Workunit	7643564
Created	17 Sep 2011, 14:51:13 UTC
Sent	17 Sep 2011, 15:32:48 UTC
Report deadline	17 Dec 2011, 22:59:59 UTC
Received	9 Oct 2011, 5:19:45 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	977091
Run time	20 days 18 hours 45 min 16 sec
CPU time	20 days 12 hours 41 min 32 sec
Validate state	Invalid
Credit	10,264.32
Device peak FLOPS	2.52 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.12.26</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
15:12:17 (2736): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:46:07 (6244): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:47:52 (3740): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6176, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6176, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
13:52:08 (5348): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
17:14:33 (7056): No heartbeat from core client for 30 sec - exiting
17:14:34 (7056): No heartbeat from core client for 30 sec - exiting
17:14:35 (7056): No heartbeat from core client for 30 sec - exiting
17:14:36 (7056): No heartbeat from core client for 30 sec - exiting
17:14:37 (7056): No heartbeat from core client for 30 sec - exiting
17:14:38 (7056): No heartbeat from core client for 30 sec - exiting
17:14:39 (7056): No heartbeat from core client for 30 sec - exiting
17:14:40 (7056): No heartbeat from core client for 30 sec - exiting
17:14:41 (7056): No heartbeat from core client for 30 sec - exiting
17:14:43 (7056): No heartbeat from core client for 30 sec - exiting
17:14:44 (7056): No heartbeat from core client for 30 sec - exiting
17:14:45 (7056): No heartbeat from core client for 30 sec - exiting
17:14:46 (7056): No heartbeat from core client for 30 sec - exiting
17:14:47 (7056): No heartbeat from core client for 30 sec - exiting
17:14:48 (7056): No heartbeat from core client for 30 sec - exiting
17:14:49 (7056): No heartbeat from core client for 30 sec - exiting
17:14:50 (7056): No heartbeat from core client for 30 sec - exiting
17:14:51 (7056): No heartbeat from core client for 30 sec - exiting
17:14:52 (7056): No heartbeat from core client for 30 sec - exiting
17:14:53 (7056): No heartbeat from core client for 30 sec - exiting
17:14:55 (7056): No heartbeat from core client for 30 sec - exiting
17:14:56 (7056): No heartbeat from core client for 30 sec - exiting
17:14:57 (7056): No heartbeat from core client for 30 sec - exiting
17:14:58 (7056): No heartbeat from core client for 30 sec - exiting
17:14:59 (7056): No heartbeat from core client for 30 sec - exiting
17:15:00 (7056): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
14:10:50 (5264): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
16:16:39 (4772): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5724, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5724, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5724, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5724, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5724, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5724, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>
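
The stderr output above repeats a small set of message types (quit requests, heartbeat timeouts, model crashes). A minimal sketch of how such a log could be tallied is shown here. The function name `tally_stderr` and the dictionary keys are hypothetical and not part of BOINC or CPDN, and the sample string is a short excerpt rather than the full log.

```python
import re
from collections import Counter

# Matches e.g. "15:12:17 (2736): No heartbeat from core client" and
# captures the process ID in parentheses.
HEARTBEAT = re.compile(r"\((\d+)\): No heartbeat from core client")


def tally_stderr(stderr_text: str) -> dict:
    """Count the recurring message types in a CPDN stderr dump.

    Hypothetical helper for illustration only; it relies on the literal
    message strings seen in this task's stderr.
    """
    return {
        "heartbeat_timeouts_by_pid": Counter(HEARTBEAT.findall(stderr_text)),
        "quit_requests": stderr_text.count("Quit request from BOINC"),
        "model_crashes": stderr_text.count("Model crash detected"),
    }


# Short excerpt taken from the log above.
sample = (
    "CPDN Monitor - Quit request from BOINC... "
    "15:12:17 (2736): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "Model crash detected, will try to restart..."
)

if __name__ == "__main__":
    print(tally_stderr(sample))
```

Because it uses plain substring counts and one regular expression, the sketch works on the log whether it is stored as a single line or one entry per line.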

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
08 Oct 2011 10:41:14	977091	13394141	hadcm3n_t37a_1940_40_007446061_3	855,360	1,733,077	2.0261
07 Oct 2011 21:11:37	977091	13394141	hadcm3n_t37a_1940_40_007446061_3	829,440	1,683,090	2.0292
07 Oct 2011 05:57:35	977091	13394141	hadcm3n_t37a_1940_40_007446061_3	803,520	1,631,375	2.0303
06 Oct 2011 15:46:53	977091	13394141	hadcm3n_t37a_1940_40_007446061_3	777,600	1,578,763	2.0303
06 Oct 2011 00:25:11	977091	13394141	hadcm3n_t37a_1940_40_007446061_3	751,680	1,525,999	2.0301
05 Oct 2011 09:25:40	977091	13394141	hadcm3n_t37a_1940_40_007446061_3	725,760	1,473,149	2.0298
04 Oct 2011 17:58:32	977091	13394141	hadcm3n_t37a_1940_40_007446061_3	699,840	1,420,291	2.0295
04 Oct 2011 02:20:32	977091	13394141	hadcm3n_t37a_1940_40_007446061_3	673,920	1,367,351	2.0290
03 Oct 2011 11:30:09	977091	13394141	hadcm3n_t37a_1940_40_007446061_3	648,000	1,314,347	2.0283
02 Oct 2011 20:35:53	977091	13394141	hadcm3n_t37a_1940_40_007446061_3	622,080	1,261,432	2.0278
02 Oct 2011 05:48:22	977091	13394141	hadcm3n_t37a_1940_40_007446061_3	596,160	1,208,378	2.0269
01 Oct 2011 14:54:02	977091	13394141	hadcm3n_t37a_1940_40_007446061_3	570,240	1,155,203	2.0258
01 Oct 2011 00:05:01	977091	13394141	hadcm3n_t37a_1940_40_007446061_3	544,320	1,102,294	2.0251
30 Sep 2011 09:11:38	977091	13394141	hadcm3n_t37a_1940_40_007446061_3	518,400	1,049,362	2.0242
29 Sep 2011 18:27:16	977091	13394141	hadcm3n_t37a_1940_40_007446061_3	492,480	996,681	2.0238
29 Sep 2011 03:43:56	977091	13394141	hadcm3n_t37a_1940_40_007446061_3	466,560	943,863	2.0230
28 Sep 2011 12:53:09	977091	13394141	hadcm3n_t37a_1940_40_007446061_3	440,640	891,063	2.0222
27 Sep 2011 22:11:24	977091	13394141	hadcm3n_t37a_1940_40_007446061_3	414,720	838,659	2.0222
27 Sep 2011 07:27:14	977091	13394141	hadcm3n_t37a_1940_40_007446061_3	388,800	786,105	2.0219
26 Sep 2011 16:35:03	977091	13394141	hadcm3n_t37a_1940_40_007446061_3	362,880	733,276	2.0207
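
The "Average (sec/TS)" column in the table above appears to be the cumulative CPU time divided by the cumulative timestep count. The following sketch checks that reading against three rows copied from the table. The helper names are hypothetical, and the formula is an inference from the numbers, not something the page states.

```python
def parse_int(text: str) -> int:
    """Convert a comma-grouped number such as '1,733,077' to an int."""
    return int(text.replace(",", ""))


def seconds_per_timestep(cpu_time: str, timestep: str) -> float:
    """Cumulative CPU seconds divided by cumulative timesteps.

    Assumption: this is how the 'Average (sec/TS)' column is derived.
    """
    return parse_int(cpu_time) / parse_int(timestep)


# (Timestep, CPU Time (sec), reported Average) from the trickle table.
rows = [
    ("855,360", "1,733,077", 2.0261),
    ("829,440", "1,683,090", 2.0292),
    ("362,880", "733,276", 2.0207),
]

if __name__ == "__main__":
    for timestep, cpu_time, reported in rows:
        computed = seconds_per_timestep(cpu_time, timestep)
        print(f"{timestep:>8} TS: computed {computed:.4f}, reported {reported}")
```

Consecutive rows in the table differ by 25,920 timesteps, so the same division applied to the differences between two rows would give the average for that interval only.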