Task 16451183

Name	hadam3p_anz_aal0_2012_1_008620052_0
Workunit	8766564
Created	2 Apr 2014, 16:21:57 UTC
Sent	22 Apr 2014, 10:21:50 UTC
Report deadline	4 Apr 2015, 15:41:50 UTC
Received	14 Dec 2014, 20:22:30 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	0 (0x00000000)
Computer ID	1316929
Run time	7 days 5 hours 48 min 39 sec
CPU time	5 days 18 hours 15 min 18 sec
Validate state	Invalid
Credit	2,993.82
Device peak FLOPS	2.48 GFLOPS
Application version	UK Met Office HadAM3P-HadRM3P Australia New Zealand v6.10 windows_intelx86
Stderr	<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8512, selfPID=6908, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1920, selfPID=4996, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6036, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4876, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4928, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6112, iMonCtr=2
Leaving CPDN_Main::Monitor...
Suspended CPDN Monitor - Suspend request from BOINC...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4592, selfPID=4376, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2592, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4544, selfPID=1516, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3684, iMonCtr=2
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5072, selfPID=2100, iMonCtr=1
Model crash detected, will try to restart...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2648, selfPID=4704, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4776, selfPID=4332, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4736, iMonCtr=2
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2124, iMonCtr=2
CGlobal Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5984, iMonCtr=2
ontroller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4764, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
19:12:56 (2964): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:12:57 (2964): No heartbeat from core client for 30 sec - exiting
19:12:58 (2964): No heartbeat from core client for 30 sec - exiting
19:12:59 (2964): No heartbeat from core client for 30 sec - exiting
19:13:00 (2964): No heartbeat from core client for 30 sec - exiting
19:13:01 (2964): No heartbeat from core client for 30 sec - exiting
19:13:03 (2964): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - Quit request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4500, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4580, selfPID=328, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4876, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5596, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5812, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5808, iMonCtr=2
GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5064, iMonCtr=2
Model crash detected, will try to restart...
CGCPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CGGCPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4932, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not rController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5528, selfPID=4760, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 0, checkPID=0, selfPID=0, iMonCtr=1
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=0, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4604, selfPID=3416, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Called boinc_finish
</stderr_txt>
<message>
upload failure:
<file_xfer_error> <file_name>hadam3p_anz_aal0_2012_1_008620052_0_7.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadam3p_anz_aal0_2012_1_008620052_0_8.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadam3p_anz_aal0_2012_1_008620052_0_9.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadam3p_anz_aal0_2012_1_008620052_0_10.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadam3p_anz_aal0_2012_1_008620052_0_11.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadam3p_anz_aal0_2012_1_008620052_0_12.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
</message>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
15 Aug 2014 17:28:10	1316929	16451183	hadam3p_anz_aal0_2012_1_008620052_0	69,419	455,783	6.5657
28 Jun 2014 20:53:45	1316929	16451183	hadam3p_anz_aal0_2012_1_008620052_0	57,899	381,680	6.5922
21 May 2014 21:30:02	1316929	16451183	hadam3p_anz_aal0_2012_1_008620052_0	46,379	307,316	6.6262
08 May 2014 21:23:10	1316929	16451183	hadam3p_anz_aal0_2012_1_008620052_0	34,859	232,523	6.6704
29 Apr 2014 20:54:08	1316929	16451183	hadam3p_anz_aal0_2012_1_008620052_0	23,339	155,849	6.6776
25 Apr 2014 13:53:20	1316929	16451183	hadam3p_anz_aal0_2012_1_008620052_0	11,819	79,230	6.7036
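
The Average (sec/TS) column in the trickle table appears to be the cumulative CPU time divided by the timestep count. That relationship is inferred from the numbers, not stated on the page. The sketch below recomputes it from the six rows above and also lists the timestep gap between consecutive trickles. The names TRICKLES, sec_per_timestep, rows_match and timestep_deltas are made up for this illustration, and the tolerance is an assumption that allows for the CPU time being shown as whole seconds.

```python
# Rows copied from the trickle table, newest first:
# (timestep, cpu_time_sec, reported_average_sec_per_ts)
TRICKLES = [
    (69419, 455783, 6.5657),
    (57899, 381680, 6.5922),
    (46379, 307316, 6.6262),
    (34859, 232523, 6.6704),
    (23339, 155849, 6.6776),
    (11819, 79230, 6.7036),
]


def sec_per_timestep(cpu_time_sec, timestep):
    """Cumulative CPU seconds spent per model timestep."""
    return cpu_time_sec / timestep


def rows_match(rows, tol=1e-4):
    """True for each row whose recomputed average is within tol of the reported one."""
    return [abs(sec_per_timestep(cpu, ts) - avg) <= tol for ts, cpu, avg in rows]


def timestep_deltas(rows):
    """Timestep difference between each trickle and the next older one."""
    steps = [ts for ts, _, _ in rows]
    return [newer - older for newer, older in zip(steps, steps[1:])]


if __name__ == "__main__":
    for ts, cpu, avg in TRICKLES:
        print(f"{ts:>6}  computed {sec_per_timestep(cpu, ts):.4f}  reported {avg:.4f}")
    print("timesteps between trickles:", timestep_deltas(TRICKLES))
```

On these rows the recomputed averages agree with the reported column to four decimal places, and every pair of consecutive trickles is 11,520 timesteps apart.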