Task 16200301

Name	hadcm3n_oba8_1900_40_008469747_1
Workunit	8620586
Created	2 Jan 2014, 13:54:22 UTC
Sent	2 Jan 2014, 13:54:27 UTC
Report deadline	3 Apr 2014, 21:21:38 UTC
Received	17 Feb 2014, 15:10:08 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1376550
Run time	38 days 22 hours 28 min 23 sec
CPU time	36 days 12 hours 20 min 13 sec
Validate state	Invalid
Credit	12,130.56
Device peak FLOPS	1.67 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.2.33</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5612, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3172, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 03:15:31 (6068): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 03:15:52 (6068): No heartbeat from core client for 30 sec - exiting 03:15:53 (6068): No heartbeat from core client for 30 sec - exiting 03:15:54 (6068): No heartbeat from core client for 30 sec - exiting 03:15:55 (6068): No heartbeat from core client for 30 sec - exiting 03:15:56 (6068): No heartbeat from core client for 30 sec - exiting 03:15:57 (6068): No heartbeat from core client for 30 sec - exiting 03:15:58 (6068): No heartbeat from core client for 30 sec - exiting 03:15:59 (6068): No heartbeat from core client for 30 sec - exiting 03:16:00 (6068): No heartbeat from core client for 30 sec - exiting 03:16:01 (6068): No heartbeat from core client for 30 sec - exiting 19:59:14 (5908): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
19:59:43 (5908): No heartbeat from core client for 30 sec - exiting 19:59:44 (5908): No heartbeat from core client for 30 sec - exiting 19:59:45 (5908): No heartbeat from core client for 30 sec - exiting 19:59:46 (5908): No heartbeat from core client for 30 sec - exiting 19:59:47 (5908): No heartbeat from core client for 30 sec - exiting 19:59:48 (5908): No heartbeat from core client for 30 sec - exiting 19:59:49 (5908): No heartbeat from core client for 30 sec - exiting 19:59:50 (5908): No heartbeat from core client for 30 sec - exiting 19:59:52 (5908): No heartbeat from core client for 30 sec - exiting 19:59:53 (5908): No heartbeat from core client for 30 sec - exiting Atmos Hold Restart file rename failed on atmos_restart.hold CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 23:31:56 (1456): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 23:50:08 (4504): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:50:09 (4504): No heartbeat from core client for 30 sec - exiting 23:50:10 (4504): No heartbeat from core client for 30 sec - exiting 23:50:11 (4504): No heartbeat from core client for 30 sec - exiting 23:50:12 (4504): No heartbeat from core client for 30 sec - exiting 23:50:13 (4504): No heartbeat from core client for 30 sec - exiting 23:50:14 (4504): No heartbeat from core client for 30 sec - exiting 23:50:15 (4504): No heartbeat from core client for 30 sec - exiting 23:50:16 (4504): No heartbeat from core client for 30 sec - exiting 23:50:17 (4504): No heartbeat from core client for 30 sec - exiting 23:50:18 (4504): No heartbeat from core client for 30 sec - exiting 16:36:21 (1492): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
14:39:35 (4332): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4604, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... CSignal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5904, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5904, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5904, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5904, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5904, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5904, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
16 Feb 2014 14:59:47	1274848	16200301	hadcm3n_oba8_1900_40_008469747_1	1,010,880	3,113,891	3.0804
15 Feb 2014 08:16:42	1274848	16200301	hadcm3n_oba8_1900_40_008469747_1	984,960	3,024,681	3.0709
14 Feb 2014 03:04:57	1274848	16200301	hadcm3n_oba8_1900_40_008469747_1	959,040	2,933,130	3.0584
12 Feb 2014 23:10:56	1274848	16200301	hadcm3n_oba8_1900_40_008469747_1	933,120	2,847,457	3.0515
11 Feb 2014 21:54:37	1274848	16200301	hadcm3n_oba8_1900_40_008469747_1	907,200	2,760,414	3.0428
10 Feb 2014 03:31:15	1274848	16200301	hadcm3n_oba8_1900_40_008469747_1	881,280	2,677,439	3.0381
09 Feb 2014 02:14:24	1274848	16200301	hadcm3n_oba8_1900_40_008469747_1	855,360	2,590,397	3.0284
03 Feb 2014 00:44:41	1274848	16200301	hadcm3n_oba8_1900_40_008469747_1	829,440	2,498,178	3.0119
01 Feb 2014 19:20:58	1274848	16200301	hadcm3n_oba8_1900_40_008469747_1	803,520	2,402,794	2.9903
31 Jan 2014 16:27:31	1274848	16200301	hadcm3n_oba8_1900_40_008469747_1	777,600	2,309,663	2.9702
30 Jan 2014 23:00:34	1274848	16200301	hadcm3n_oba8_1900_40_008469747_1	751,680	2,213,770	2.9451
30 Jan 2014 23:00:34	1274848	16200301	hadcm3n_oba8_1900_40_008469747_1	725,760	2,141,122	2.9502
30 Jan 2014 23:00:34	1274848	16200301	hadcm3n_oba8_1900_40_008469747_1	699,840	2,074,737	2.9646
27 Jan 2014 18:38:08	1274848	16200301	hadcm3n_oba8_1900_40_008469747_1	673,920	1,999,522	2.9670
26 Jan 2014 22:05:38	1274848	16200301	hadcm3n_oba8_1900_40_008469747_1	648,000	1,924,365	2.9697
26 Jan 2014 01:10:39	1274848	16200301	hadcm3n_oba8_1900_40_008469747_1	622,080	1,849,617	2.9733
25 Jan 2014 02:21:00	1274848	16200301	hadcm3n_oba8_1900_40_008469747_1	596,160	1,774,401	2.9764
24 Jan 2014 11:21:29	1274848	16200301	hadcm3n_oba8_1900_40_008469747_1	570,240	1,696,334	2.9748
23 Jan 2014 03:54:21	1274848	16200301	hadcm3n_oba8_1900_40_008469747_1	544,320	1,618,219	2.9729
22 Jan 2014 03:12:21	1274848	16200301	hadcm3n_oba8_1900_40_008469747_1	518,400	1,540,506	2.9717
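
The "Average (sec/TS)" column above appears to be the cumulative CPU time divided by the timestep reached. The sketch below checks that reading against three rows copied from the table. The function names and the `TRICKLES` list are illustrative only and are not part of BOINC or CPDN; the incremental rate between two trickles is a derived figure, not something the page reports.

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_avg).
# Newest first, matching the order on the page.
TRICKLES = [
    (1_010_880, 3_113_891, 3.0804),
    (984_960, 3_024_681, 3.0709),
    (959_040, 2_933_130, 3.0584),
]


def average_sec_per_timestep(cpu_time_sec, timestep):
    """Cumulative CPU seconds per model timestep, rounded to 4 places."""
    return round(cpu_time_sec / timestep, 4)


def incremental_sec_per_timestep(newer, older):
    """CPU seconds per timestep over the interval between two trickles.

    Assumption: both rows belong to the same result, so the differences
    in CPU time and timestep are meaningful.
    """
    d_cpu = newer[1] - older[1]
    d_ts = newer[0] - older[0]
    return d_cpu / d_ts


if __name__ == "__main__":
    for timestep, cpu, reported in TRICKLES:
        computed = average_sec_per_timestep(cpu, timestep)
        print(f"timestep {timestep}: computed {computed}, reported {reported}")

    rate = incremental_sec_per_timestep(TRICKLES[0], TRICKLES[1])
    print(f"latest interval: {rate:.4f} sec/TS")
```

In these rows the timestep advances by 25,920 between consecutive trickles, so the incremental rate shows how the most recent interval compares with the cumulative average.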