Task 15880811

Name	hadcm3n_n4uy_1960_40_008394221_1
Workunit	8545080
Created	4 Jul 2013, 14:37:29 UTC
Sent	4 Jul 2013, 15:23:56 UTC
Report deadline	3 Oct 2013, 22:51:07 UTC
Received	9 Oct 2013, 13:41:32 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1273466
Run time	14 days 2 hours 13 min 52 sec
CPU time	11 days 19 hours 42 min 54 sec
Validate state	Invalid
Credit	6,220.80
Device peak FLOPS	2.35 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6664, iMonCtr=1
Model crash detected, will try to restart...
20:03:50 (5504): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:13:17 (6972): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:13:18 (6972): No heartbeat from core client for 30 sec - exiting
21:13:19 (6972): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=10244, iMonCtr=1
Model crash detected, will try to restart...
09:22:36 (6980): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6500, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7660, iMonCtr=1
Model crash detected, will try to restart...
07:56:28 (5144): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8040, iMonCtr=1
Model crash detected, will try to restart...
11:54:32 (5940): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
12:04:58 (7068): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6692, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6692, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6692, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6560, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3952, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
17:12:02 (2668): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
17:18:18 (12616): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
05:57:43 (14128): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
05:57:44 (14128): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=13724, iMonCtr=1
Model crash detected, will try to restart...
10:27:59 (6436): No heartbeat from core client for 30 sec - exiting
10:28:00 (6436): No heartbeat from core client for 30 sec - exiting
10:28:01 (6436): No heartbeat from core client for 30 sec - exiting
10:28:02 (6436): No heartbeat from core client for 30 sec - exiting
10:28:03 (6436): No heartbeat from core client for 30 sec - exiting
10:28:04 (6436): No heartbeat from core client for 30 sec - exiting
10:28:05 (6436): No heartbeat from core client for 30 sec - exiting
10:28:06 (6436): No heartbeat from core client for 30 sec - exiting
10:28:07 (6436): No heartbeat from core client for 30 sec - exiting
10:28:08 (6436): No heartbeat from core client for 30 sec - exiting
10:28:09 (6436): No heartbeat from core client for 30 sec - exiting
10:28:10 (6436): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3980, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
08:54:13 (4804): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5040, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5040, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5040, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5040, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5040, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5040, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
08 Oct 2013 21:37:20	1273466	15880811	hadcm3n_n4uy_1960_40_008394221_1	518,400	999,471	1.9280
07 Oct 2013 14:11:02	1273466	15880811	hadcm3n_n4uy_1960_40_008394221_1	492,480	947,638	1.9242
06 Oct 2013 17:59:38	1273466	15880811	hadcm3n_n4uy_1960_40_008394221_1	466,560	898,521	1.9258
06 Oct 2013 05:43:01	1273466	15880811	hadcm3n_n4uy_1960_40_008394221_1	440,640	855,204	1.9408
05 Oct 2013 17:11:19	1273466	15880811	hadcm3n_n4uy_1960_40_008394221_1	414,720	811,322	1.9563
05 Oct 2013 03:01:36	1273466	15880811	hadcm3n_n4uy_1960_40_008394221_1	388,800	761,948	1.9597
04 Oct 2013 13:56:26	1273466	15880811	hadcm3n_n4uy_1960_40_008394221_1	362,880	716,650	1.9749
03 Oct 2013 22:31:03	1273466	15880811	hadcm3n_n4uy_1960_40_008394221_1	336,960	667,754	1.9817
02 Oct 2013 13:54:47	1273466	15880811	hadcm3n_n4uy_1960_40_008394221_1	311,040	615,045	1.9774
01 Oct 2013 04:38:51	1273466	15880811	hadcm3n_n4uy_1960_40_008394221_1	285,120	563,486	1.9763
30 Sep 2013 10:25:58	1273466	15880811	hadcm3n_n4uy_1960_40_008394221_1	259,200	512,439	1.9770
29 Sep 2013 16:31:42	1273466	15880811	hadcm3n_n4uy_1960_40_008394221_1	233,280	459,203	1.9685
28 Sep 2013 23:25:49	1273466	15880811	hadcm3n_n4uy_1960_40_008394221_1	207,360	406,855	1.9621
28 Sep 2013 06:28:42	1273466	15880811	hadcm3n_n4uy_1960_40_008394221_1	181,440	354,822	1.9556
10 Jul 2013 15:07:12	1273466	15880811	hadcm3n_n4uy_1960_40_008394221_1	155,520	304,487	1.9579
09 Jul 2013 00:13:17	1273466	15880811	hadcm3n_n4uy_1960_40_008394221_1	129,600	252,194	1.9459
07 Jul 2013 21:24:23	1273466	15880811	hadcm3n_n4uy_1960_40_008394221_1	103,680	202,576	1.9539
06 Jul 2013 17:20:02	1273466	15880811	hadcm3n_n4uy_1960_40_008394221_1	77,760	153,415	1.9729
06 Jul 2013 05:35:25	1273466	15880811	hadcm3n_n4uy_1960_40_008394221_1	51,840	101,512	1.9582
06 Jul 2013 04:44:44	1273466	15880811	hadcm3n_n4uy_1960_40_008394221_1	25,920	50,672	1.9549
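
The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep count, and trickles appear to arrive every 25,920 timesteps. Neither relationship is stated on the page; both are inferred from the figures. Here is a minimal sketch that checks both assumptions against the rows above. The function names are hypothetical, chosen for illustration only.

```python
"""Sanity check of the trickle table (assumed relationships, inferred from the figures).

Assumption: each trickle reports cumulative CPU seconds and timesteps completed so far,
and "Average (sec/TS)" is CPU seconds / timestep, rounded to four decimal places.
"""

# (timestep, cumulative CPU seconds, reported average), newest first, copied from the table.
TRICKLES = [
    (518_400, 999_471, 1.9280),
    (492_480, 947_638, 1.9242),
    (466_560, 898_521, 1.9258),
    (440_640, 855_204, 1.9408),
    (414_720, 811_322, 1.9563),
    (388_800, 761_948, 1.9597),
    (362_880, 716_650, 1.9749),
    (336_960, 667_754, 1.9817),
    (311_040, 615_045, 1.9774),
    (285_120, 563_486, 1.9763),
    (259_200, 512_439, 1.9770),
    (233_280, 459_203, 1.9685),
    (207_360, 406_855, 1.9621),
    (181_440, 354_822, 1.9556),
    (155_520, 304_487, 1.9579),
    (129_600, 252_194, 1.9459),
    (103_680, 202_576, 1.9539),
    (77_760, 153_415, 1.9729),
    (51_840, 101_512, 1.9582),
    (25_920, 50_672, 1.9549),
]


def average_sec_per_timestep(cpu_seconds: int, timestep: int) -> float:
    """Cumulative CPU seconds per model timestep, rounded to 4 dp like the table (assumed)."""
    return round(cpu_seconds / timestep, 4)


def mismatches(rows, tol=5e-5):
    """Rows whose recomputed average differs from the reported one by more than tol."""
    return [
        (ts, cpu, reported)
        for ts, cpu, reported in rows
        if abs(cpu / ts - reported) > tol
    ]


def trickle_intervals(rows):
    """Set of timestep gaps between consecutive trickles (rows are newest first)."""
    return {newer[0] - older[0] for newer, older in zip(rows, rows[1:])}


if __name__ == "__main__":
    print("rows that do not match:", mismatches(TRICKLES))
    print("timesteps between trickles:", trickle_intervals(TRICKLES))
    print("latest average:", average_sec_per_timestep(999_471, 518_400))
```

The tolerance of 5e-5 reflects the table's four-decimal rounding. A row would only be flagged if the reported value could not come from rounding the quotient.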