Task 13185881

Name	hadam3p_eu_2qdt_1971_1_007384569_0
Workunit	7581999
Created	1 Aug 2011, 9:52:05 UTC
Sent	1 Aug 2011, 10:07:51 UTC
Report deadline	13 Jul 2012, 15:27:51 UTC
Received	9 Sep 2011, 12:04:39 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	0 (0x00000000)
Computer ID	1065244
Run time	4 days 5 hours 14 min 14 sec
CPU time	3 days 20 hours 47 min 1 sec
Validate state	Invalid
Credit	1,591.50
Device peak FLOPS	1.88 GFLOPS
Application version	UK Met Office HadAM3P-HadRM3P Europe v6.09 windows_intelx86
Stderr	<core_client_version>6.10.18</core_client_version>
<![CDATA[
<stderr_txt>
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4680, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running,CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3896, selfPID=3896, iMonCtr=2
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5996, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4200, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
CPDN Monitor - Quit request from BOINC...
CGlobal Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3328, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2640, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3260, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4648, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2508, selfPID=4136, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3128, selfPID=4420, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3676, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5112, iMonCtr=2
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
GlobalCPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3952, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4876, selfPID=4148, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=420, selfPID=5092, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6012, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3820, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3712, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3604, iMonCtr=2
Leaving CPDN_Main::Monitor...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4544, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4572, selfPID=1404, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4440, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4532, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4236, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3396, selfPID=5084, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4848, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4860, selfPID=2160, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3992, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3496, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
17:08:22 (3236): No heartbeat from core client for 30 sec - exiting
17:08:23 (3236): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6120, selfPID=4904, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5000, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1184, iMonCtr=2
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3240, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4208, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3360, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5236, selfPID=5236, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4560, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3920, selfPID=3920, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3920, selfPID=3580, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Called boinc_finish
</stderr_txt>
<message>
<file_xfer_error>
  <file_name>hadam3p_eu_2qdt_1971_1_007384569_0_9.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_2qdt_1971_1_007384569_0_10.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_2qdt_1971_1_007384569_0_11.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_2qdt_1971_1_007384569_0_12.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
</message>
]]>
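When the stderr above is copied from the web page it arrives as one run-together string. To get a rough count of how often each component exited and how often the monitor announced a restart, that string can be tallied by message type. The following is a minimal, hypothetical Python sketch. The names `EXCERPT`, `EXIT_RE` and `summarize` are illustrative and are not part of CPDN or BOINC. It assumes only the message wording shown above, and it treats each "Model crash detected, will try to restart..." line as one restart attempt. That reading is an assumption based on the message text, not on documented behaviour.

```python
import re
from collections import Counter

# A contiguous excerpt of the stderr above, as it appears when flattened onto one line.
EXCERPT = (
    "CPDN Monitor - Quit request from BOINC... "
    "CGlobal Worker:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=3328, iMonCtr=2 "
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=2640, iMonCtr=2 "
    "Model crash detected, will try to restart... "
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=3260, iMonCtr=2 "
    "Model crash detected, will try to restart... "
)

# Matches a complete "<component>:: CPDN process is not running, exiting, ..." message
# and captures its component name and the four numeric fields it reports.
EXIT_RE = re.compile(
    r"(Global Worker|Regional Worker|Controller):: CPDN process is not running, "
    r"exiting, bRetVal = (\d+), checkPID=(\d+), selfPID=(\d+), iMonCtr=(\d+)"
)


def summarize(stderr_text):
    """Tally exit messages per component, plus restart and quit notices (by substring)."""
    matches = list(EXIT_RE.finditer(stderr_text))
    return {
        "exits": dict(Counter(m.group(1) for m in matches)),
        "restarts": stderr_text.count("Model crash detected"),
        "quit_requests": stderr_text.count("Quit request from BOINC"),
        "self_pids": [int(m.group(4)) for m in matches],
    }


print(summarize(EXCERPT))
# → {'exits': {'Global Worker': 1, 'Controller': 2}, 'restarts': 2, 'quit_requests': 1, 'self_pids': [3328, 2640, 3260]}
```

The regular expression is not anchored to a word boundary, so interleaved fragments such as `CGlobal Worker::` and `GController::` are still counted under their component. The truncated `Regional Worker:: CPDN process is not running,` fragment has no "exiting" fields and is skipped. The counts describe only what the log printed. They do not explain why the processes stopped.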

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
08 Sep 2011 10:20:55	1065244	13185881	hadam3p_eu_2qdt_1971_1_007384569_0	92,257	310,258	3.3630
08 Sep 2011 09:18:12	1065244	13185881	hadam3p_eu_2qdt_1971_1_007384569_0	92,256	309,628	3.3562
03 Sep 2011 16:23:31	1065244	13185881	hadam3p_eu_2qdt_1971_1_007384569_0	80,737	269,779	3.3415
03 Sep 2011 15:21:42	1065244	13185881	hadam3p_eu_2qdt_1971_1_007384569_0	80,736	269,228	3.3347
31 Aug 2011 08:50:51	1065244	13185881	hadam3p_eu_2qdt_1971_1_007384569_0	69,216	229,517	3.3160
30 Aug 2011 09:01:53	1065244	13185881	hadam3p_eu_2qdt_1971_1_007384569_0	57,696	193,482	3.3535
28 Aug 2011 15:12:13	1065244	13185881	hadam3p_eu_2qdt_1971_1_007384569_0	46,176	157,509	3.4111
27 Aug 2011 11:51:54	1065244	13185881	hadam3p_eu_2qdt_1971_1_007384569_0	34,656	120,001	3.4626
15 Aug 2011 20:25:08	1065244	13185881	hadam3p_eu_2qdt_1971_1_007384569_0	23,137	80,464	3.4777
15 Aug 2011 19:11:36	1065244	13185881	hadam3p_eu_2qdt_1971_1_007384569_0	23,136	79,890	3.4531
07 Aug 2011 12:28:10	1065244	13185881	hadam3p_eu_2qdt_1971_1_007384569_0	11,616	40,815	3.5137
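The Average (sec/TS) column appears to be cumulative CPU time divided by the timestep reached. For the most recent trickle, 310,258 s / 92,257 timesteps ≈ 3.3630 s per timestep. The minimal Python check below tests that reading against every row. The helper names are hypothetical. The page shows CPU time in whole seconds and averages to four decimals, so the check compares within a tolerance of 1e-4 instead of requiring exact equality.

```python
# (timestep, CPU time in seconds, listed average in sec/TS), copied from the table above.
TRICKLES = [
    (92_257, 310_258, 3.3630),
    (92_256, 309_628, 3.3562),
    (80_737, 269_779, 3.3415),
    (80_736, 269_228, 3.3347),
    (69_216, 229_517, 3.3160),
    (57_696, 193_482, 3.3535),
    (46_176, 157_509, 3.4111),
    (34_656, 120_001, 3.4626),
    (23_137, 80_464, 3.4777),
    (23_136, 79_890, 3.4531),
    (11_616, 40_815, 3.5137),
]


def average_sec_per_timestep(timestep, cpu_sec):
    """Cumulative CPU seconds divided by the number of timesteps completed."""
    return cpu_sec / timestep


def check_table(rows, tol=1e-4):
    """Return (timestep, computed, listed) for rows whose average differs by more than tol."""
    mismatches = []
    for timestep, cpu_sec, listed in rows:
        computed = average_sec_per_timestep(timestep, cpu_sec)
        if abs(computed - listed) > tol:
            mismatches.append((timestep, computed, listed))
    return mismatches


print(check_table(TRICKLES))  # → []
```

An empty list means every listed average agrees with the CPU-time-over-timestep reading to within display rounding. It does not show how the project server actually computes the column.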