Task 14651134

Name	hadcm3n_o2q8_2060_40_007958254_1
Workunit	8113366
Created	9 May 2012, 18:24:25 UTC
Sent	9 May 2012, 18:27:50 UTC
Report deadline	9 Aug 2012, 1:55:01 UTC
Received	17 Jul 2012, 21:02:24 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1047718
Run time	7 days 6 hours 37 min 5 sec
CPU time	5 days 21 hours 44 min 8 sec
Validate state	Invalid
Credit	4,354.56
Device peak FLOPS	3.05 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6532, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5184, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6172, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6120, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1796, iMonCtr=1
Model crash detected, will try to restart...
18:59:00 (6108): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:59:01 (6108): No heartbeat from core client for 30 sec - exiting
18:59:02 (6108): No heartbeat from core client for 30 sec - exiting
18:59:03 (6108): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5348, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
18:13:16 (6052): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:13:17 (6052): No heartbeat from core client for 30 sec - exiting
18:13:18 (6052): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
20:39:40 (5984): No heartbeat from core client for 30 sec - exiting
20:39:41 (5984): No heartbeat from core client for 30 sec - exiting
20:39:42 (5984): No heartbeat from core client for 30 sec - exiting
20:39:43 (5984): No heartbeat from core client for 30 sec - exiting
20:39:44 (5984): No heartbeat from core client for 30 sec - exiting
20:39:45 (5984): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:39:46 (5984): No heartbeat from core client for 30 sec - exiting
20:39:47 (5984): No heartbeat from core client for 30 sec - exiting
20:39:48 (5984): No heartbeat from core client for 30 sec - exiting
20:47:22 (792): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:47:23 (792): No heartbeat from core client for 30 sec - exiting
20:47:24 (792): No heartbeat from core client for 30 sec - exiting
20:47:25 (792): No heartbeat from core client for 30 sec - exiting
20:47:26 (792): No heartbeat from core client for 30 sec - exiting
20:47:27 (792): No heartbeat from core client for 30 sec - exiting
20:47:28 (792): No heartbeat from core client for 30 sec - exiting
20:47:29 (792): No heartbeat from core client for 30 sec - exiting
20:47:30 (792): No heartbeat from core client for 30 sec - exiting
20:47:31 (792): No heartbeat from core client for 30 sec - exiting
20:47:32 (792): No heartbeat from core client for 30 sec - exiting
22:16:07 (5228): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
11:59:40 (4388): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6336, iMonCtr=1
Model crash detected, will try to restart...
17:13:04 (2660): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5568, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6260, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6260, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6260, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6260, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6260, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>
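
The stderr output above repeats a handful of message types (model crashes, missed heartbeats, suspend and quit requests, signal 22 exits). A minimal Python sketch for tallying them is shown here. The fragments are copied from the log; the labels, the `tally_events` helper, and the short `sample` excerpt are illustrative assumptions and are not part of BOINC or CPDN.

```python
from collections import Counter

# Message fragments copied verbatim from the stderr text.
# The dictionary keys are hypothetical labels chosen for this sketch.
PATTERNS = {
    "model_crash": "Model crash detected, will try to restart...",
    "no_heartbeat": "No heartbeat from core client for 30 sec - exiting",
    "suspend_request": "Suspend request from BOINC...",
    "quit_request": "Quit request from BOINC...",
    "signal_22": "Signal 22 received, exiting...",
}


def tally_events(stderr_text: str) -> Counter:
    """Count non-overlapping occurrences of each fragment in the text."""
    return Counter(
        {label: stderr_text.count(fragment) for label, fragment in PATTERNS.items()}
    )


# A short excerpt taken from the end of the log, for demonstration only.
sample = (
    "17:13:04 (2660): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "Signal 22 received, exiting... Called boinc_finish "
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=5568, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "Sorry, too many model crashes! :-( Called boinc_finish"
)

counts = tally_events(sample)
for label, n in counts.items():
    print(f"{label}: {n}")
```

Plain substring counting is used so that the tally works whether the log is stored with line breaks or flattened onto one line. Pasting the full stderr text in place of `sample` gives the totals for the whole task.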

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
15 Jul 2012 16:13:49	1047718	14651134	hadcm3n_o2q8_2060_40_007958254_1	362,880	501,766	1.3827
03 Jul 2012 18:16:24	1047718	14651134	hadcm3n_o2q8_2060_40_007958254_1	336,960	464,169	1.3775
21 Jun 2012 20:51:41	1047718	14651134	hadcm3n_o2q8_2060_40_007958254_1	311,040	426,352	1.3707
15 Jun 2012 18:12:04	1047718	14651134	hadcm3n_o2q8_2060_40_007958254_1	285,120	390,939	1.3711
12 Jun 2012 19:56:51	1047718	14651134	hadcm3n_o2q8_2060_40_007958254_1	259,200	357,327	1.3786
09 Jun 2012 19:27:14	1047718	14651134	hadcm3n_o2q8_2060_40_007958254_1	233,280	320,833	1.3753
07 Jun 2012 00:30:04	1047718	14651134	hadcm3n_o2q8_2060_40_007958254_1	207,360	286,898	1.3836
06 Jun 2012 11:39:57	1047718	14651134	hadcm3n_o2q8_2060_40_007958254_1	181,440	250,394	1.3800
29 May 2012 21:33:31	1047718	14651134	hadcm3n_o2q8_2060_40_007958254_1	155,520	213,156	1.3706
26 May 2012 18:23:51	1047718	14651134	hadcm3n_o2q8_2060_40_007958254_1	129,600	175,574	1.3547
23 May 2012 20:22:16	1047718	14651134	hadcm3n_o2q8_2060_40_007958254_1	103,680	138,631	1.3371
21 May 2012 17:14:18	1047718	14651134	hadcm3n_o2q8_2060_40_007958254_1	77,760	104,550	1.3445
13 May 2012 17:04:22	1047718	14651134	hadcm3n_o2q8_2060_40_007958254_1	51,840	70,346	1.3570
11 May 2012 20:50:54	1047718	14651134	hadcm3n_o2q8_2060_40_007958254_1	25,920	35,429	1.3669
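
The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep reached. The page does not state this, so treat it as an assumption; it does reproduce the reported values for the rows checked in this Python sketch. The function name `seconds_per_timestep` is illustrative.

```python
def seconds_per_timestep(cpu_seconds: int, timestep: int) -> float:
    """Cumulative CPU seconds divided by timesteps completed, to 4 decimals."""
    return round(cpu_seconds / timestep, 4)


# (timestep, CPU time in seconds, reported average) copied from the table above.
rows = [
    (25_920, 35_429, 1.3669),    # first trickle, 11 May 2012
    (362_880, 501_766, 1.3827),  # latest trickle, 15 Jul 2012
]

for timestep, cpu_seconds, reported in rows:
    computed = seconds_per_timestep(cpu_seconds, timestep)
    print(f"timestep {timestep}: computed {computed}, reported {reported}")
```

The timesteps in the table advance in steps of 25,920, so the latest trickle at timestep 362,880 is the fourteenth.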