Task 13538800

Name	hadcm3n_ygvd_1900_40_007516562_0
Workunit	7714037
Created	28 Oct 2011, 12:47:52 UTC
Sent	22 Nov 2011, 20:34:34 UTC
Report deadline	22 Feb 2012, 4:01:45 UTC
Received	13 Jan 2012, 14:45:38 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1080897
Run time	14 days 1 hours 39 min 45 sec
CPU time	14 days 1 hours 39 min 45 sec
Validate state	Invalid
Credit	6,531.84
Device peak FLOPS	2.36 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.56</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
08:20:10 (2952): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
13:15:40 (3204): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=400, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3244, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3624, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3020, iMonCtr=1
Model crash detected, will try to restart...
08:16:36 (2884): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
08:17:38 (2884): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2800, iMonCtr=1
Model crash detected, will try to restart...
19:19:13 (2004): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2736, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2484, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2608, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1164, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1164, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2728, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2728, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2728, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2728, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
11 Jan 2012 22:15:00	1080897	13538800	hadcm3n_ygvd_1900_40_007516562_0	544,320	1,177,790	2.1638
08 Jan 2012 03:38:52	1080897	13538800	hadcm3n_ygvd_1900_40_007516562_0	518,400	1,124,146	2.1685
06 Jan 2012 23:45:16	1080897	13538800	hadcm3n_ygvd_1900_40_007516562_0	492,480	1,067,489	2.1676
05 Jan 2012 18:37:36	1080897	13538800	hadcm3n_ygvd_1900_40_007516562_0	466,560	1,009,906	2.1646
04 Jan 2012 02:34:40	1080897	13538800	hadcm3n_ygvd_1900_40_007516562_0	440,640	953,998	2.1650
02 Jan 2012 00:17:49	1080897	13538800	hadcm3n_ygvd_1900_40_007516562_0	414,720	895,384	2.1590
31 Dec 2011 21:19:34	1080897	13538800	hadcm3n_ygvd_1900_40_007516562_0	388,800	838,143	2.1557
30 Dec 2011 16:38:21	1080897	13538800	hadcm3n_ygvd_1900_40_007516562_0	362,880	782,006	2.1550
25 Dec 2011 16:20:28	1080897	13538800	hadcm3n_ygvd_1900_40_007516562_0	336,960	725,145	2.1520
24 Dec 2011 02:58:53	1080897	13538800	hadcm3n_ygvd_1900_40_007516562_0	311,040	669,495	2.1524
23 Dec 2011 04:05:04	1080897	13538800	hadcm3n_ygvd_1900_40_007516562_0	285,120	614,534	2.1554
21 Dec 2011 00:25:43	1080897	13538800	hadcm3n_ygvd_1900_40_007516562_0	259,200	558,306	2.1540
18 Dec 2011 20:19:08	1080897	13538800	hadcm3n_ygvd_1900_40_007516562_0	233,280	501,582	2.1501
17 Dec 2011 17:05:46	1080897	13538800	hadcm3n_ygvd_1900_40_007516562_0	207,360	444,813	2.1451
13 Dec 2011 23:35:16	1080897	13538800	hadcm3n_ygvd_1900_40_007516562_0	181,440	388,854	2.1432
09 Dec 2011 00:05:32	1080897	13538800	hadcm3n_ygvd_1900_40_007516562_0	155,520	333,265	2.1429
06 Dec 2011 01:02:13	1080897	13538800	hadcm3n_ygvd_1900_40_007516562_0	129,600	277,313	2.1398
04 Dec 2011 15:31:36	1080897	13538800	hadcm3n_ygvd_1900_40_007516562_0	103,680	222,138	2.1425
01 Dec 2011 01:38:14	1080897	13538800	hadcm3n_ygvd_1900_40_007516562_0	77,760	165,992	2.1347
27 Nov 2011 01:18:29	1080897	13538800	hadcm3n_ygvd_1900_40_007516562_0	51,840	109,803	2.1181
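
The Average (sec/TS) column above appears to be the cumulative CPU time divided by the timestep count, rounded to four decimal places. The sketch below checks that reading against three rows copied from the table. The function name `average_sec_per_timestep` is hypothetical and is not part of BOINC or the CPDN software; the rounding rule is an assumption inferred from the listed values.

```python
def average_sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Return cumulative CPU seconds per model timestep, rounded to 4 places.

    Assumption: the trickle table derives its Average column this way;
    the page itself does not document the formula.
    """
    if timestep <= 0:
        raise ValueError("timestep must be a positive integer")
    return round(cpu_time_sec / timestep, 4)


# (timestep, cumulative CPU time in seconds) copied from three table rows
rows = [
    (544_320, 1_177_790),  # 11 Jan 2012 trickle
    (518_400, 1_124_146),  # 08 Jan 2012 trickle
    (51_840, 109_803),     # 27 Nov 2011 trickle
]

for timestep, cpu_time_sec in rows:
    print(timestep, average_sec_per_timestep(cpu_time_sec, timestep))
```

For the three rows checked, the computed values agree with the listed averages, which suggests the column is a running average over the whole task rather than a per-trickle rate.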