Task 17590759

Name	hadam3p_anz_m1pd_2012_1_009304519_0
Workunit	9388707
Created	17 Dec 2014, 19:32:57 UTC
Sent	23 Dec 2014, 13:10:39 UTC
Report deadline	5 Dec 2015, 18:30:39 UTC
Received	3 Feb 2015, 13:55:39 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	0 (0x00000000)
Computer ID	1336896
Run time	7 days 19 hours 7 min 22 sec
CPU time	7 days 3 hours 32 min 35 sec
Validate state	Invalid
Credit	4,981.10
Device peak FLOPS	3.53 GFLOPS
Application version	UK Met Office HadAM3P-HadRM3P Australia New Zealand v6.10 windows_intelx86
Stderr	<core_client_version>7.4.36</core_client_version> <![CDATA[ <stderr_txt> Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8132, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6288, selfPID=7676, iMonCtr=1 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6020, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=916, iMonCtr=2 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1092, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2652, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Global Worker:: CPDN process is noCPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3336, selfPID=4328, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1956, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4424, selfPID=628, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=676, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3308, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5080, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3040, iMonCtr=2 Model crash detected, will try to restart... 
Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4152, selfPID=1276, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2212, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3604, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 09:25:01 (6296): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 09:25:02 (6296): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... RGlobal Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6352, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3284, selfPID=4548, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3552, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3460, selfPID=2688, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... CGoltrololer WorkerN process iocess running, exiting, bRetVal = 1, chec PID=0, selfPID= selfPID=244tr=2 Model crash detecte0, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4484, iMonCtr=2 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4376, iMonCtr=2 Leaving CPDN_Main::Monitor... 
18:40:39 (2164): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 18:40:40 (2164): No heartbeat from core client for 30 sec - exiting 18:40:41 (2164): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Global Worker:: CPDN process is not running, exiting, bRetVal = 0, checkPID=0, selfPID=6288, iMonCtr=1 Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=0, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6384, selfPID=2892, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Called boinc_finish </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>hadam3p_anz_m1pd_2012_1_009304519_0_11.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_anz_m1pd_2012_1_009304519_0_12.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> </message> ]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
30 Jan 2015 16:41:31	1336896	17590759	hadam3p_anz_m1pd_2012_1_009304519_0	115,499	568,593	4.9229
29 Jan 2015 09:22:57	1336896	17590759	hadam3p_anz_m1pd_2012_1_009304519_0	103,979	512,665	4.9305
27 Jan 2015 13:17:47	1336896	17590759	hadam3p_anz_m1pd_2012_1_009304519_0	92,459	456,342	4.9356
26 Jan 2015 07:57:14	1336896	17590759	hadam3p_anz_m1pd_2012_1_009304519_0	80,939	399,919	4.9410
22 Jan 2015 20:43:22	1336896	17590759	hadam3p_anz_m1pd_2012_1_009304519_0	69,419	343,700	4.9511
21 Jan 2015 13:16:24	1336896	17590759	hadam3p_anz_m1pd_2012_1_009304519_0	57,899	286,671	4.9512
14 Jan 2015 15:33:56	1336896	17590759	hadam3p_anz_m1pd_2012_1_009304519_0	46,379	232,641	5.0161
12 Jan 2015 17:40:14	1336896	17590759	hadam3p_anz_m1pd_2012_1_009304519_0	34,859	174,603	5.0088
09 Jan 2015 10:38:45	1336896	17590759	hadam3p_anz_m1pd_2012_1_009304519_0	23,339	116,280	4.9822
30 Dec 2014 14:50:01	1336896	17590759	hadam3p_anz_m1pd_2012_1_009304519_0	11,819	58,912	4.9845
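
The Average (sec/TS) column above appears to be CPU Time (sec) divided by Timestep. The short Python sketch below checks that reading against the ten rows shown. The names TRICKLES, sec_per_timestep and check_rows are made up for this illustration and are not part of BOINC or the CPDN application. The tolerance is an assumption chosen to allow for rounding to four decimal places.

```python
# Rows copied from the trickle table above:
# (timestep, cpu_time_sec, reported_average_sec_per_ts)
TRICKLES = [
    (115_499, 568_593, 4.9229),
    (103_979, 512_665, 4.9305),
    (92_459, 456_342, 4.9356),
    (80_939, 399_919, 4.9410),
    (69_419, 343_700, 4.9511),
    (57_899, 286_671, 4.9512),
    (46_379, 232_641, 5.0161),
    (34_859, 174_603, 5.0088),
    (23_339, 116_280, 4.9822),
    (11_819, 58_912, 4.9845),
]


def sec_per_timestep(timestep, cpu_time_sec):
    """Average CPU seconds spent per model timestep (assumed definition)."""
    return cpu_time_sec / timestep


def check_rows(rows, tolerance=1e-4):
    """Return the rows whose reported average differs from cpu/timestep.

    The tolerance is an assumption that covers 4-decimal rounding.
    """
    mismatches = []
    for timestep, cpu_time_sec, reported in rows:
        computed = sec_per_timestep(timestep, cpu_time_sec)
        if abs(computed - reported) > tolerance:
            mismatches.append((timestep, cpu_time_sec, reported, computed))
    return mismatches


if __name__ == "__main__":
    for timestep, cpu_time_sec, reported in TRICKLES:
        computed = sec_per_timestep(timestep, cpu_time_sec)
        print(f"{timestep:>7}  reported={reported:.4f}  computed={computed:.4f}")
    print("mismatches:", check_rows(TRICKLES))
```

Under this reading, every row in the table agrees with the reported average to four decimal places.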