Name | hadam3p_pnw_q9b8_2042_1_008372155_0 |
Workunit | 8523014 |
Created | 28 May 2013, 20:11:50 UTC |
Sent | 28 May 2013, 20:12:48 UTC |
Report deadline | 11 May 2014, 1:32:48 UTC |
Received | 18 Jul 2013, 16:20:34 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 0 (0x00000000) |
Computer ID | 1272458 |
Run time | 10 days 5 hours 58 min 39 sec |
CPU time | 15 hours 18 min 15 sec |
Validate state | Invalid |
Credit | 2,505.24 |
Device peak FLOPS | 3.00 GFLOPS |
Application version | UK Met Office HadAM3P-HadRM3P Pacific North West v6.09 windows_intelx86 |
Stderr |
<core_client_version>7.0.64</core_client_version>
<![CDATA[
<stderr_txt>
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3364, iMonCtr=2
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2580, selfPID=3344, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3636, selfPID=3436, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 0
Suspended CPDN Monitor - Suspend request from BOINC...
06:53:49 (3368): No heartbeat from core client for 30 sec - exiting
06:53:50 (3368): No heartbeat from core client for 30 sec - exiting
06:53:51 (3368): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
06:53:52 (3368): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3936, selfPID=940, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1584, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 1
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4076, selfPID=3292, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3736, selfPID=1436, iMonCtr=1
Model crash detected, will try to restart...
GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3944, selfPID=1252, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3692, selfPID=1520, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 2
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3544, selfPID=2984, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 4
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3104, selfPID=2680, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 4
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3948, selfPID=2772, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4764, selfPID=2928, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3680, selfPID=2300, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3904, selfPID=3112, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3764, selfPID=2380, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2096, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 4
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3744, selfPID=2172, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3252, selfPID=2120, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 6
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4092, selfPID=3004, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2112, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3928, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3952, selfPID=2948, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 6
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3392, selfPID=2816, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3828, selfPID=3828, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3884, selfPID=2308, iMonCtr=1
Model crash detected, will try to restart...
GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3984, selfPID=2536, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3624, selfPID=2292, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=960, selfPID=2936, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 10
Called boinc_finish
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3560, selfPID=1720, iMonCtr=1
Model crash detected, will try to restart...
C20:23:29 (3196): No heartbeat from core client for 30 sec - exiting
20:23:30 (3196): No heartbeat from core client for 30 sec - exiting
20:23:32 (3196): No heartbeat from core client for 30 sec - exiting
20:23:33 (3196): No heartbeat from core client for 30 sec - exiting
20:23:34 (3196): No heartbeat from core client for 30 sec - exiting
20:23:35 (3196): No heartbeat from core client for 30 sec - exiting
20:23:36 (3196): No heartbeat from core client for 30 sec - exiting
20:23:37 (3196): No heartbeat from core client for 30 sec - exiting
20:23:38 (3196): No heartbeat from core client for 30 sec - exiting
20:23:39 (3196): No heartbeat from core client for 30 sec - exiting
20:23:41 (3196): No heartbeat from core client for 30 sec - exiting
20:23:42 (3196): No heartbeat from core client for 30 sec - exiting
20:23:43 (3196): No heartbeat from core client for 30 sec - exiting
20:23:44 (3196): No heartbeat from core client for 30 sec - exiting
20:23:45 (3196): No heartbeat from core client for 30 sec - exiting
20:23:46 (3196): No heartbeat from core client for 30 sec - exiting
20:23:47 (3196): No heartbeat from core client for 30 sec - exiting
20:23:48 (3196): No heartbeat from core client for 30 sec - exiting
20:23:49 (3196): No heartbeat from core client for 30 sec - exiting
20:23:50 (3196): No heartbeat from core client for 30 sec - exiting
20:23:51 (3196): No heartbeat from core client for 30 sec - exiting
20:23:52 (3196): No heartbeat from core client for 30 sec - exiting
20:23:53 (3196): No heartbeat from core client for 30 sec - exiting
20:23:54 (3196): No heartbeat from core client for 30 sec - exiting
20:23:55 (3196): No heartbeat from core client for 30 sec - exiting
20:23:57 (3196): No heartbeat from core client for 30 sec - exiting
20:23:58 (3196): No heartbeat from core client for 30 sec - exiting
20:23:59 (3196): No heartbeat from core client for 30 sec - exiting
20:24:00 (3196): No heartbeat from core client for 30 sec - exiting
20:24:01 (3196): No heartbeat from core client for 30 sec - exiting
20:24:02 (3196): No heartbeat from core client for 30 sec - exiting
20:24:03 (3196): No heartbeat from core client for 30 sec - exiting
20:24:04 (3196): No heartbeat from core client for 30 sec - exiting
20:24:05 (3196): No heartbeat from core client for 30 sec - exiting
20:24:06 (3196): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
C08:52:15 (3164): No heartbeat from core client for 30 sec - exiting
08:52:17 (3164): No heartbeat from core client for 30 sec - exiting
08:52:18 (3164): No heartbeat from core client for 30 sec - exiting
08:52:19 (3164): No heartbeat from core client for 30 sec - exiting
08:52:20 (3164): No heartbeat from core client for 30 sec - exiting
08:52:21 (3164): No heartbeat from core client for 30 sec - exiting
08:52:22 (3164): No heartbeat from core client for 30 sec - exiting
08:52:23 (3164): No heartbeat from core client for 30 sec - exiting
08:52:24 (3164): No heartbeat from core client for 30 sec - exiting
08:52:25 (3164): No heartbeat from core client for 30 sec - exiting
08:52:26 (3164): No heartbeat from core client for 30 sec - exiting
08:52:28 (3164): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Regional yearly means requires 12 input files got 1
Model crashed: READHIST: End of file in READ from history file for namelist NLIHISTO tmp/xaakg.pipe_dummy 2048
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 1
Called boinc_finish
</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>hadam3p_pnw_q9b8_2042_1_008372155_0_11.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_pnw_q9b8_2042_1_008372155_0_12.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
</message>
]]> |
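The stderr above repeats a handful of message types many times: controller restarts after "Model crash detected", heartbeat timeouts, suspend/quit requests, and the "Regional yearly means requires 12 input files got N" counter. A minimal sketch, assuming the log is available as one string, that tallies those events. The helper name `summarize_stderr`, the `SAMPLE` string, and the regular expressions are illustrative assumptions built from the message text in this log; they are not part of BOINC or CPDN tooling, and the real log format may vary between application versions.

```python
import re

# Message fragments copied from this result's stderr; treating them as a
# stable format is an assumption, not a documented CPDN interface.
PATTERNS = {
    "restart_attempts": re.compile(r"Model crash detected, will try to restart"),
    "heartbeat_timeouts": re.compile(r"No heartbeat from core client for 30 sec"),
    "suspend_requests": re.compile(r"Suspend request from BOINC"),
    "quit_requests": re.compile(r"Quit request from BOINC"),
}
# "Regional yearly means requires 12 input files got N" -> capture N
YEARLY_MEANS = re.compile(r"Regional yearly means requires (\d+) input files got (\d+)")
# Fatal model message, e.g. "Model crashed: READHIST: End of file ..."
FATAL = re.compile(r"Model crashed: (\w+):")


def summarize_stderr(text: str) -> dict:
    """Count recurring events in a CPDN/BOINC stderr dump (hypothetical helper)."""
    summary = {name: len(p.findall(text)) for name, p in PATTERNS.items()}
    # Keep the "got N" values in log order to see how far the run progressed.
    summary["yearly_means_files"] = [int(got) for _, got in YEARLY_MEANS.findall(text)]
    # Name of the routine reported in any "Model crashed:" line.
    summary["fatal"] = FATAL.findall(text)
    return summary


# A short excerpt of the log above, used only as sample input.
SAMPLE = (
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=3692, selfPID=1520, iMonCtr=1 "
    "Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... "
    "Regional yearly means requires 12 input files got 2 "
    "08:52:15 (3164): No heartbeat from core client for 30 sec - exiting "
    "Model crashed: READHIST: End of file in READ from history file for namelist NLIHISTO"
)

if __name__ == "__main__":
    print(summarize_stderr(SAMPLE))
```

Run against the full stderr text, the `yearly_means_files` list would show the counter rising from 0 toward 12 across restarts before the final READHIST failure, which is easier to follow than scanning the raw dump.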
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
23 Jul 2013 17:48:56 | 1272458 | 15799918 | hadam3p_pnw_q9b8_2042_1_008372155_0 | 115,296 | 634,946 | 5.5071 |
08 Jul 2013 16:54:38 | 1272458 | 15799918 | hadam3p_pnw_q9b8_2042_1_008372155_0 | 103,776 | 572,814 | 5.5197 |
07 Jul 2013 13:21:55 | 1272458 | 15799918 | hadam3p_pnw_q9b8_2042_1_008372155_0 | 92,256 | 511,419 | 5.5435 |
06 Jul 2013 05:03:14 | 1272458 | 15799918 | hadam3p_pnw_q9b8_2042_1_008372155_0 | 80,736 | 445,161 | 5.5138 |
02 Jul 2013 10:56:27 | 1272458 | 15799918 | hadam3p_pnw_q9b8_2042_1_008372155_0 | 69,216 | 382,533 | 5.5267 |
02 Jul 2013 10:08:54 | 1272458 | 15799918 | hadam3p_pnw_q9b8_2042_1_008372155_0 | 57,696 | 317,112 | 5.4963 |
22 Jun 2013 12:58:07 | 1272458 | 15799918 | hadam3p_pnw_q9b8_2042_1_008372155_0 | 46,176 | 259,193 | 5.6132 |
21 Jun 2013 12:20:01 | 1272458 | 15799918 | hadam3p_pnw_q9b8_2042_1_008372155_0 | 34,656 | 195,005 | 5.6269 |
15 Jun 2013 17:30:43 | 1272458 | 15799918 | hadam3p_pnw_q9b8_2042_1_008372155_0 | 23,136 | 129,178 | 5.5834 |
09 Jun 2013 19:18:55 | 1272458 | 15799918 | hadam3p_pnw_q9b8_2042_1_008372155_0 | 11,616 | 64,127 | 5.5206 |
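In every row above, the Average (sec/TS) column matches the cumulative CPU time divided by the timestep to four decimal places (for example, 634,946 / 115,296 ≈ 5.5071), and consecutive trickles are 11,520 timesteps apart. A minimal sketch that recomputes the column and the per-interval rate between trickles, using the rows copied from the table. The function names are illustrative only, not part of any CPDN tool.

```python
# Rows from the "Latest Trickles Received" table, oldest first:
# (timestep, cumulative CPU time in seconds, reported average sec/TS)
TRICKLES = [
    (11_616, 64_127, 5.5206),
    (23_136, 129_178, 5.5834),
    (34_656, 195_005, 5.6269),
    (46_176, 259_193, 5.6132),
    (57_696, 317_112, 5.4963),
    (69_216, 382_533, 5.5267),
    (80_736, 445_161, 5.5138),
    (92_256, 511_419, 5.5435),
    (103_776, 572_814, 5.5197),
    (115_296, 634_946, 5.5071),
]


def cumulative_average(timestep: int, cpu_sec: int) -> float:
    """Average CPU seconds per timestep since the start of the run."""
    return cpu_sec / timestep


def interval_rates(rows):
    """Return (timesteps advanced, CPU sec per timestep) between consecutive trickles."""
    out = []
    for (ts0, cpu0, _), (ts1, cpu1, _) in zip(rows, rows[1:]):
        steps = ts1 - ts0
        out.append((steps, (cpu1 - cpu0) / steps))
    return out


if __name__ == "__main__":
    for ts, cpu, reported in TRICKLES:
        print(f"{ts:>7} {cumulative_average(ts, cpu):.4f} (reported {reported})")
    for steps, rate in interval_rates(TRICKLES):
        print(f"+{steps} steps at {rate:.4f} sec/TS")
```

The cumulative average smooths out variation, while the interval rate isolates the cost of each 11,520-step stretch between trickles.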
©2024 cpdn.org