Task 15896579

Name	hadcm3n_3i2y_1980_40_008400578_1
Workunit	8551434
Created	18 Jul 2013, 20:06:23 UTC
Sent	18 Jul 2013, 20:15:59 UTC
Report deadline	18 Oct 2013, 3:43:10 UTC
Received	23 Aug 2013, 13:46:43 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1251442
Run time	16 days 16 hours 42 min 43 sec
CPU time	12 days 1 hour 1 min 38 sec
Validate state	Invalid
Credit	4,665.60
Device peak FLOPS	2.28 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.64</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7032, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4852, iMonCtr=1 Model crash detected, will try to restart... 10:59:21 (5180): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6952, iMonCtr=1 Model crash detected, will try to restart... 12:44:31 (5200): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7556, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4596, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6836, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6308, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6092, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6092, iMonCtr=1 Model crash detected, will try to restart... 12:02:03 (2948): No heartbeat from core client for 30 sec - exiting 12:02:05 (2948): No heartbeat from core client for 30 sec - exiting 12:02:06 (2948): No heartbeat from core client for 30 sec - exiting 12:02:07 (2948): No heartbeat from core client for 30 sec - exiting 12:02:08 (2948): No heartbeat from core client for 30 sec - exiting 12:02:09 (2948): No heartbeat from core client for 30 sec - exiting 12:02:10 (2948): No heartbeat from core client for 30 sec - exiting 12:02:11 (2948): No heartbeat from core client for 30 sec - exiting 12:02:12 (2948): No heartbeat from core client for 30 sec - exiting 12:02:13 (2948): No heartbeat from core client for 30 sec - exiting 12:02:14 (2948): No heartbeat from core client for 30 sec - exiting 12:02:16 (2948): No heartbeat from core client for 30 sec - exiting 12:02:17 (2948): No heartbeat from core client for 30 sec - exiting 12:02:18 (2948): No heartbeat from core client for 30 sec - exiting 12:02:19 (2948): No heartbeat from core client for 30 sec - exiting 12:02:20 (2948): No heartbeat from core client for 30 sec - exiting 12:02:21 (2948): No heartbeat from core client for 30 sec - exiting 12:02:22 (2948): No heartbeat from core client for 30 sec - exiting 12:02:23 (2948): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5768, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... 11:12:25 (1668): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1452, iMonCtr=1 Model crash detected, will try to restart... 10:24:50 (4620): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 10:24:51 (4620): No heartbeat from core client for 30 sec - exiting 10:24:52 (4620): No heartbeat from core client for 30 sec - exiting 10:24:53 (4620): No heartbeat from core client for 30 sec - exiting 10:24:54 (4620): No heartbeat from core client for 30 sec - exiting Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6004, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6004, iMonCtr=1 Model crash detected, will try to restart... 13:32:20 (6064): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2192, iMonCtr=1 Model crash detected, will try to restart... 15:38:11 (5476): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7784, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4604, iMonCtr=1 Model crash detected, will try to restart... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5564, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5564, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8068, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4692, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6536, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7124, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5000, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5000, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5000, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2936, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6156, iMonCtr=1 Model crash detected, will try to restart... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6156, iMonCtr=1 Model crash detected, will try to restart... 11:54:36 (6176): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5984, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5984, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5984, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5984, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5984, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5984, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]>
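The stderr text above repeats a small set of message types, so a tally can make the failure pattern easier to read. The following is a minimal sketch, assuming the log has been saved as a plain string. The labels in `PATTERNS`, the `tally` function, and the short `SAMPLE` excerpt are hypothetical names chosen for illustration; they are not part of BOINC or CPDN tooling.

```python
import re
from collections import Counter

# Hypothetical labels mapped to message fragments that appear in the stderr text.
PATTERNS = {
    "quit_request": r"CPDN Monitor - Quit request from BOINC",
    "suspend_request": r"CPDN Monitor - Suspend request from BOINC",
    "no_heartbeat": r"No heartbeat from core client for 30 sec",
    "model_crash": r"Model crash detected, will try to restart",
}


def tally(stderr_text):
    """Count non-overlapping occurrences of each known message fragment."""
    return Counter(
        {label: len(re.findall(pattern, stderr_text))
         for label, pattern in PATTERNS.items()}
    )


# Short excerpt assembled from messages in the log above (not the full log).
SAMPLE = (
    "CPDN Monitor - Quit request from BOINC... "
    "CPDN Monitor - Quit request from BOINC... "
    "Suspended CPDN Monitor - Suspend request from BOINC... "
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=7032, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "10:59:21 (5180): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
)

counts = tally(SAMPLE)
for label, n in counts.items():
    print(label, n)
```

Running `tally` on the full stderr text instead of `SAMPLE` would give the totals for this task. The counts are plain substring matches, so they say how often each message was logged, not why the model crashed.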

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
21 Aug 2013 16:38:30	1251442	15896579	hadcm3n_3i2y_1980_40_008400578_1	388,800	1,007,367	2.5910
19 Aug 2013 17:11:33	1251442	15896579	hadcm3n_3i2y_1980_40_008400578_1	362,880	939,609	2.5893
16 Aug 2013 20:42:05	1251442	15896579	hadcm3n_3i2y_1980_40_008400578_1	336,960	874,784	2.5961
14 Aug 2013 17:49:28	1251442	15896579	hadcm3n_3i2y_1980_40_008400578_1	311,040	805,430	2.5895
14 Aug 2013 17:49:28	1251442	15896579	hadcm3n_3i2y_1980_40_008400578_1	285,120	737,148	2.5854
14 Aug 2013 17:49:28	1251442	15896579	hadcm3n_3i2y_1980_40_008400578_1	259,200	665,222	2.5664
14 Aug 2013 17:49:28	1251442	15896579	hadcm3n_3i2y_1980_40_008400578_1	233,280	594,523	2.5485
14 Aug 2013 17:49:28	1251442	15896579	hadcm3n_3i2y_1980_40_008400578_1	207,360	526,614	2.5396
14 Aug 2013 17:49:28	1251442	15896579	hadcm3n_3i2y_1980_40_008400578_1	181,440	459,405	2.5320
14 Aug 2013 17:49:28	1251442	15896579	hadcm3n_3i2y_1980_40_008400578_1	155,520	393,303	2.5290
29 Jul 2013 14:34:52	1251442	15896579	hadcm3n_3i2y_1980_40_008400578_1	129,600	330,867	2.5530
29 Jul 2013 14:34:52	1251442	15896579	hadcm3n_3i2y_1980_40_008400578_1	103,680	266,808	2.5734
25 Jul 2013 02:06:57	1251442	15896579	hadcm3n_3i2y_1980_40_008400578_1	77,760	201,289	2.5886
23 Jul 2013 22:13:56	1251442	15896579	hadcm3n_3i2y_1980_40_008400578_1	51,840	135,558	2.6149
23 Jul 2013 20:38:38	1251442	15896579	hadcm3n_3i2y_1980_40_008400578_1	25,920	67,677	2.6110
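
The Average (sec/TS) column in the table above appears to be the cumulative CPU time divided by the cumulative timestep count. A minimal sketch of that arithmetic, using three rows copied from the table; the function name `sec_per_timestep` and the four-decimal rounding are assumptions made for illustration:

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_average)
TRICKLES = [
    (388_800, 1_007_367, 2.5910),
    (362_880, 939_609, 2.5893),
    (25_920, 67_677, 2.6110),
]


def sec_per_timestep(cpu_time_sec, timestep):
    """Cumulative CPU seconds per model timestep, rounded to four decimals."""
    return round(cpu_time_sec / timestep, 4)


for timestep, cpu_time_sec, reported in TRICKLES:
    computed = sec_per_timestep(cpu_time_sec, timestep)
    # Compare with a small tolerance instead of exact float equality.
    matches = abs(computed - reported) < 5e-5
    print(timestep, computed, reported, matches)
```

Because both inputs are cumulative, the column is a running average over the whole task so far, not the cost of the most recent 25,920-timestep interval.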