Field | Value |
---|---|
Name | hadam3p_eu_a6w1_2013_1_008569967_0 |
Workunit | 8716479 |
Created | 19 Mar 2014, 11:52:00 UTC |
Sent | 19 Mar 2014, 15:59:25 UTC |
Report deadline | 1 Mar 2015, 21:19:25 UTC |
Received | 14 Apr 2014, 9:09:21 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 0 (0x00000000) |
Computer ID | 1257976 |
Run time | 2 days 18 hours 44 min 45 sec |
CPU time | 2 days 4 hours 51 min 28 sec |
Validate state | Invalid |
Credit | 1,194.02 |
Device peak FLOPS | 2.40 GFLOPS |
Application version | UK Met Office HadAM3P-HadRM3P Europe v6.09 windows_intelx86 |
Stderr | (full output below) |

```text
<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5944, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4276, iMonCtr=2
13:12:04 (1308): No heartbeat from core client for 30 sec - exiting
13:12:05 (1308): No heartbeat from core client for 30 sec - exiting
13:12:06 (1308): No heartbeat from core client for 30 sec - exiting
13:12:07 (1308): No heartbeat from core client for 30 sec - exiting
13:12:08 (1308): No heartbeat from core client for 30 sec - exiting
13:12:09 (1308): No heartbeat from core client for 30 sec - exiting
13:12:10 (1308): No heartbeat from core client for 30 sec - exiting
13:12:11 (1308): No heartbeat from core client for 30 sec - exiting
13:12:12 (1308): No heartbeat from core client for 30 sec - exiting
13:12:14 (1308): No heartbeat from core client for 30 sec - exiting
13:12:15 (1308): No heartbeat from core client for 30 sec - exiting
13:12:16 (1308): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3864, selfPID=3628, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3960, selfPID=3528, iMonCtr=1
Model crash detected, will try to restart...
CGlobal Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4552, iMonCtr=2
15:53:30 (3564): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3700, selfPID=3428, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3600, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3760, selfPID=3404, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1768, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3052, selfPID=3432, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3028, selfPID=3392, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3452, selfPID=3340, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2212, selfPID=3292, iMonCtr=1
Model crash detected, will try to restart...
13:45:07 (3616): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3264, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3092, selfPID=3700, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3688, selfPID=3484, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4092, selfPID=3620, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4220, selfPID=3472, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=508, selfPID=3516, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
17:42:23 (3476): No heartbeat from core client for 30 sec - exiting
17:42:24 (3476): No heartbeat from core client for 30 sec - exiting
17:42:25 (3476): No heartbeat from core client for 30 sec - exiting
17:42:26 (3476): No heartbeat from core client for 30 sec - exiting
17:42:27 (3476): No heartbeat from core client for 30 sec - exiting
17:42:28 (3476): No heartbeat from core client for 30 sec - exiting
17:42:29 (3476): No heartbeat from core client for 30 sec - exiting
17:42:30 (3476): No heartbeat from core client for 30 sec - exiting
17:42:32 (3476): No heartbeat from core client for 30 sec - exiting
17:42:33 (3476): No heartbeat from core client for 30 sec - exiting
17:42:34 (3476): No heartbeat from core client for 30 sec - exiting
17:42:35 (3476): No heartbeat from core client for 30 sec - exiting
17:42:36 (3476): No heartbeat from core client for 30 sec - exiting
17:42:37 (3476): No heartbeat from core client for 30 sec - exiting
17:42:38 (3476): No heartbeat from core client for 30 sec - exiting
17:42:39 (3476): No heartbeat from core client for 30 sec - exiting
17:42:40 (3476): No heartbeat from core client for 30 sec - exiting
17:42:41 (3476): No heartbeat from core client for 30 sec - exiting
17:42:42 (3476): No heartbeat from core client for 30 sec - exiting
17:42:44 (3476): No heartbeat from core client for 30 sec - exiting
17:42:45 (3476): No heartbeat from core client for 30 sec - exiting
17:42:46 (3476): No heartbeat from core client for 30 sec - exiting
17:42:47 (3476): No heartbeat from core client for 30 sec - exiting
17:42:48 (3476): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:28:56 (3804): No heartbeat from core client for 30 sec - exiting
18:28:57 (3804): No heartbeat from core client for 30 sec - exiting
18:28:58 (3804): No heartbeat from core client for 30 sec - exiting
18:28:59 (3804): No heartbeat from core client for 30 sec - exiting
18:29:00 (3804): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3408, selfPID=3292, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
21:15:57 (3432): No heartbeat from core client for 30 sec - exiting
21:15:58 (3432): No heartbeat from core client for 30 sec - exiting
21:15:59 (3432): No heartbeat from core client for 30 sec - exiting
21:16:00 (3432): No heartbeat from core client for 30 sec - exiting
21:16:01 (3432): No heartbeat from core client for 30 sec - exiting
21:16:02 (3432): No heartbeat from core client for 30 sec - exiting
21:16:03 (3432): No heartbeat from core client for 30 sec - exiting
21:16:04 (3432): No heartbeat from core client for 30 sec - exiting
21:16:05 (3432): No heartbeat from core client for 30 sec - exiting
21:16:06 (3432): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4676, selfPID=3860, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4120, selfPID=3444, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4012, selfPID=3516, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1332, selfPID=3384, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4324, selfPID=3720, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Model crashed: READHIST: End of file in READ from history file for namelist NLIHISTO
tmp/xaakm.pipe_dummy 2048
Leaving CPDN_Main::Monitor...
Called boinc_finish
</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>hadam3p_eu_a6w1_2013_1_008569967_0_7.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_a6w1_2013_1_008569967_0_8.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_a6w1_2013_1_008569967_0_9.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_a6w1_2013_1_008569967_0_10.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_a6w1_2013_1_008569967_0_11.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_a6w1_2013_1_008569967_0_12.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
</message>
]]>
```
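Because the stderr output is long and highly repetitive, a small script can condense it into counts. The following is a minimal sketch, assuming the stderr text has one message per line; the function name `summarize_stderr` and the regular expressions are hypothetical, keyed to the message wording seen in this particular log rather than to any documented CPDN or BOINC log format. It counts restart announcements, counts timestamped "No heartbeat" lines and the distinct parenthesized numbers that printed them (presumably process IDs), extracts the final `Model crashed:` message, and lists each failed upload with its error code.

```python
import re

# Abridged excerpt of the Stderr field above, one message per line.
EXCERPT = """\
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5944, iMonCtr=2
Model crash detected, will try to restart...
13:12:04 (1308): No heartbeat from core client for 30 sec - exiting
15:53:30 (3564): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Model crashed: READHIST: End of file in READ from history file for namelist NLIHISTO
<file_xfer_error>
<file_name>hadam3p_eu_a6w1_2013_1_008569967_0_7.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
"""

# Hypothetical patterns matched to the wording of this log.
HEARTBEAT = re.compile(r"\d{2}:\d{2}:\d{2} \((\d+)\): No heartbeat from core client")
FATAL = re.compile(r"Model crashed: (.+)")  # '.' stops at end of line
UPLOAD_FAIL = re.compile(
    r"<file_name>([^<]+)</file_name>\s*<error_code>(-?\d+)</error_code>"
)


def summarize_stderr(text: str) -> dict:
    """Tally recurring events in a stderr dump (one message per line)."""
    heartbeat_pids = HEARTBEAT.findall(text)
    fatal = FATAL.search(text)
    return {
        # Each "Model crash detected, will try to restart..." line announces a restart attempt.
        "restarts": text.count("Model crash detected, will try to restart"),
        # All timestamped "No heartbeat" lines, plus the distinct numbers in parentheses.
        "heartbeat_lines": len(heartbeat_pids),
        "heartbeat_pids": sorted(set(heartbeat_pids), key=int),
        # Rest of the line after "Model crashed:", or None if absent.
        "fatal": fatal.group(1).strip() if fatal else None,
        # (file name, error code) for every <file_xfer_error> entry.
        "upload_failures": [(name, int(code)) for name, code in UPLOAD_FAIL.findall(text)],
    }


if __name__ == "__main__":
    for key, value in summarize_stderr(EXCERPT).items():
        print(f"{key}: {value}")
```

Pasting the complete stderr text in place of `EXCERPT` gives the totals for the whole run; string counting and regular expressions are used instead of an XML parser because the `<stderr_txt>` content is free text, not well-formed XML.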
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average CPU Time (sec/timestep) |
---|---|---|---|---|---|---|
09 Apr 2014 18:17:39 | 1257976 | 16380213 | hadam3p_eu_a6w1_2013_1_008569967_0 | 69,216 | 173,344 | 2.5044 |
06 Apr 2014 12:09:55 | 1257976 | 16380213 | hadam3p_eu_a6w1_2013_1_008569967_0 | 57,696 | 143,930 | 2.4946 |
04 Apr 2014 10:05:13 | 1257976 | 16380213 | hadam3p_eu_a6w1_2013_1_008569967_0 | 46,176 | 114,288 | 2.4751 |
26 Mar 2014 12:15:39 | 1257976 | 16380213 | hadam3p_eu_a6w1_2013_1_008569967_0 | 34,656 | 85,386 | 2.4638 |
23 Mar 2014 16:10:00 | 1257976 | 16380213 | hadam3p_eu_a6w1_2013_1_008569967_0 | 23,141 | 57,251 | 2.4740 |
23 Mar 2014 14:44:47 | 1257976 | 16380213 | hadam3p_eu_a6w1_2013_1_008569967_0 | 23,136 | 57,062 | 2.4664 |
20 Mar 2014 10:46:15 | 1257976 | 16380213 | hadam3p_eu_a6w1_2013_1_008569967_0 | 11,616 | 28,406 | 2.4454 |
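Every row in the trickle table is consistent with the Average column being cumulative CPU time divided by timestep (for example, 173,344 / 69,216 ≈ 2.5044). The sketch below, which assumes that relationship, recomputes the column and also derives the CPU cost per timestep between consecutive trickles, which the table does not show directly. The `Trickle` class and both function names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trickle:
    sent_utc: str    # "Time Sent (UTC)" column
    timestep: int    # "Timestep" column
    cpu_time_s: int  # "CPU Time (sec)" column, cumulative


# Rows copied from the table above, reordered oldest first.
TRICKLES = [
    Trickle("20 Mar 2014 10:46:15", 11_616, 28_406),
    Trickle("23 Mar 2014 14:44:47", 23_136, 57_062),
    Trickle("23 Mar 2014 16:10:00", 23_141, 57_251),
    Trickle("26 Mar 2014 12:15:39", 34_656, 85_386),
    Trickle("04 Apr 2014 10:05:13", 46_176, 114_288),
    Trickle("06 Apr 2014 12:09:55", 57_696, 143_930),
    Trickle("09 Apr 2014 18:17:39", 69_216, 173_344),
]


def cumulative_average(t: Trickle) -> float:
    """Cumulative CPU seconds per timestep (reproduces the Average column to 4 d.p.)."""
    return t.cpu_time_s / t.timestep


def interval_rates(rows: list[Trickle]) -> list[float]:
    """CPU seconds per timestep spent between each pair of consecutive trickles."""
    return [
        (b.cpu_time_s - a.cpu_time_s) / (b.timestep - a.timestep)
        for a, b in zip(rows, rows[1:])
    ]


if __name__ == "__main__":
    for t in TRICKLES:
        print(f"{t.sent_utc}  {cumulative_average(t):.4f}")
    for a, b, rate in zip(TRICKLES, TRICKLES[1:], interval_rates(TRICKLES)):
        print(f"{a.sent_utc} -> {b.sent_utc}: {rate:.4f} s/timestep")
```

Most intervals fall between about 2.44 and 2.57 s/timestep; the two trickles sent on 23 Mar are only 5 timesteps and 189 CPU seconds apart, which works out to 37.8 s/timestep.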
©2024 cpdn.org