Name | hadam3p_anz_l3ut_2012_1_009303780_0 |
Workunit | 9387968 |
Created | 17 Dec 2014, 19:27:12 UTC |
Sent | 23 Dec 2014, 23:47:24 UTC |
Report deadline | 6 Dec 2015, 5:07:24 UTC |
Received | 1 Jan 2015, 0:26:09 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 0 (0x00000000) |
Computer ID | 1233028 |
Run time | 6 days 10 hours 11 min 23 sec |
CPU time | 3 days 19 hours 0 min 27 sec |
Validate state | Invalid |
Credit | 1,503.36 |
Device peak FLOPS | 2.50 GFLOPS |
Application version | UK Met Office HadAM3P-HadRM3P Australia New Zealand v6.10 windows_intelx86 |
Stderr | <core_client_version>7.2.42</core_client_version> <![CDATA[ <stderr_txt> 16:12:48 (3608): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 16:12:49 (3608): No heartbeat from core client for 30 sec - exiting 18:49:22 (2596): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 18:49:31 (2324): Can't acquire lockfile (32) - waiting 35s 20:49:58 (2324): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 22:08:14 (3636): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:30:41 (6812): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:30:42 (6812): No heartbeat from core client for 30 sec - exiting Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7616, iMonCtr=2 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5752, iMonCtr=2 20:06:02 (4960): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3352, selfPID=3352, iMonCtr=2 20:06:03 (4960): No heartbeat from core client for 30 sec - exiting 11:02:55 (6880): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 01:47:46 (7900): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 12:46:35 (6496): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
12:46:36 (6496): No heartbeat from core client for 30 sec - exiting 12:46:37 (6496): No heartbeat from core client for 30 sec - exiting 12:46:38 (6496): No heartbeat from core client for 30 sec - exiting 12:46:39 (6496): No heartbeat from core client for 30 sec - exiting 12:46:40 (6496): No heartbeat from core client for 30 sec - exiting 12:55:38 (2424): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:29:53 (6228): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:29:54 (6228): No heartbeat from core client for 30 sec - exiting 20:29:55 (6228): No heartbeat from core client for 30 sec - exiting 00:33:37 (5344): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 00:33:38 (5344): No heartbeat from core client for 30 sec - exiting 00:33:39 (5344): No heartbeat from core client for 30 sec - exiting 00:33:40 (5344): No heartbeat from core client for 30 sec - exiting 00:33:41 (5344): No heartbeat from core client for 30 sec - exiting 00:33:42 (5344): No heartbeat from core client for 30 sec - exiting 00:33:43 (5344): No heartbeat from core client for 30 sec - exiting 00:33:44 (5344): No heartbeat from core client for 30 sec - exiting 00:33:45 (5344): No heartbeat from core client for 30 sec - exiting 06:41:55 (4200): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 04:44:21 (6864): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
04:44:22 (6864): No heartbeat from core client for 30 sec - exiting 04:44:23 (6864): No heartbeat from core client for 30 sec - exiting 04:44:24 (6864): No heartbeat from core client for 30 sec - exiting 04:44:25 (6864): No heartbeat from core client for 30 sec - exiting 04:44:26 (6864): No heartbeat from core client for 30 sec - exiting 04:44:27 (6864): No heartbeat from core client for 30 sec - exiting 04:44:28 (6864): No heartbeat from core client for 30 sec - exiting 04:44:29 (6864): No heartbeat from core client for 30 sec - exiting 04:44:30 (6864): No heartbeat from core client for 30 sec - exiting 04:47:30 (4800): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8956, selfPID=8956, iMonCtr=2 05:59:44 (7404): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 05:59:45 (7404): No heartbeat from core client for 30 sec - exiting 05:59:46 (7404): No heartbeat from core client for 30 sec - exiting 05:59:47 (7404): No heartbeat from core client for 30 sec - exiting 05:59:48 (7404): No heartbeat from core client for 30 sec - exiting 05:59:49 (7404): No heartbeat from core client for 30 sec - exiting 05:59:59 (3804): Can't acquire lockfile (32) - waiting 35s 18:21:31 (3804): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
18:21:32 (3804): No heartbeat from core client for 30 sec - exiting 18:21:33 (3804): No heartbeat from core client for 30 sec - exiting 18:21:34 (3804): No heartbeat from core client for 30 sec - exiting 18:21:35 (3804): No heartbeat from core client for 30 sec - exiting 18:21:36 (3804): No heartbeat from core client for 30 sec - exiting 18:21:37 (3804): No heartbeat from core client for 30 sec - exiting Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8128, selfPID=8128, iMonCtr=2 Global Worker:: CPDN process is not running, exiting, bRetVal = 0, checkPID=0, selfPID=10044, iMonCtr=1 Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=10124, selfPID=10124, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=10124, selfPID=9232, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>hadam3p_anz_l3ut_2012_1_009303780_0_4.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_anz_l3ut_2012_1_009303780_0_5.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_anz_l3ut_2012_1_009303780_0_6.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_anz_l3ut_2012_1_009303780_0_7.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_anz_l3ut_2012_1_009303780_0_8.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_anz_l3ut_2012_1_009303780_0_9.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_anz_l3ut_2012_1_009303780_0_10.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> <file_xfer_error> 
<file_name>hadam3p_anz_l3ut_2012_1_009303780_0_11.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_anz_l3ut_2012_1_009303780_0_12.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> </message> ]]> |
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
30 Dec 2014 13:39:51 | 1233028 | 17590006 | hadam3p_anz_l3ut_2012_1_009303780_0 | 34,859 | 269,138 | 7.7208 |
28 Dec 2014 14:06:01 | 1233028 | 17590006 | hadam3p_anz_l3ut_2012_1_009303780_0 | 23,339 | 177,000 | 7.5839 |
26 Dec 2014 17:49:43 | 1233028 | 17590006 | hadam3p_anz_l3ut_2012_1_009303780_0 | 11,819 | 89,235 | 7.5501 |
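
The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep count, rounded to four decimals. That relationship is inferred from the three rows above and is not stated on the page. A minimal sketch that checks it follows; the function name `seconds_per_timestep` is hypothetical and the values are copied from the table.

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_average)
trickles = [
    (34859, 269138, 7.7208),
    (23339, 177000, 7.5839),
    (11819, 89235, 7.5501),
]


def seconds_per_timestep(cpu_time_sec, timestep):
    """Assumed formula: cumulative CPU seconds per model timestep, 4 decimals."""
    return round(cpu_time_sec / timestep, 4)


for timestep, cpu_time_sec, reported in trickles:
    computed = seconds_per_timestep(cpu_time_sec, timestep)
    # Flag whether the recomputed value agrees with the reported column.
    matches = abs(computed - reported) < 1e-9
    print(timestep, computed, reported, "match" if matches else "differs")
```

Under this assumption all three rows agree with the reported averages, which suggests the per-timestep cost rose slightly as the run progressed.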