Name | hadam3p_pnw_pmhq_2013_1_009976841_0 |
Workunit | 9983199 |
Created | 29 Jun 2015, 18:01:08 UTC |
Sent | 30 Jun 2015, 11:19:03 UTC |
Report deadline | 11 Jun 2016, 16:39:03 UTC |
Received | 18 Aug 2015, 15:32:33 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 0 (0x00000000) |
Computer ID | 1129146 |
Run time | 5 days 7 hours 19 min 27 sec |
CPU time | 4 days 19 hours 32 min 53 sec |
Validate state | Invalid |
Credit | 3,260.60 |
Device peak FLOPS | 3.25 GFLOPS |
Application version | UK Met Office HadAM3P-HadRM3P Pacific North West v7.27 windows_intelx86 |
Stderr |
```
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
17:02:06 (5572): No heartbeat from client for 30 sec - exiting
17:02:06 (5572): timer handler: client dead, exiting
17:02:07 (5572): No heartbeat from client for 30 sec - exiting
17:02:07 (5572): timer handler: client dead, exiting
17:02:08 (5572): No heartbeat from client for 30 sec - exiting
17:02:08 (5572): timer handler: client dead, exiting
17:02:09 (5572): No heartbeat from client for 30 sec - exiting
17:02:09 (5572): timer handler: client dead, exiting
17:02:10 (5572): No heartbeat from client for 30 sec - exiting
17:02:10 (5572): timer handler: client dead, exiting
17:02:11 (5572): No heartbeat from client for 30 sec - exiting
17:02:11 (5572): timer handler: client dead, exiting
17:02:12 (5572): No heartbeat from client for 30 sec - exiting
17:02:12 (5572): timer handler: client dead, exiting
17:02:13 (5572): No heartbeat from client for 30 sec - exiting
17:02:13 (5572): timer handler: client dead, exiting
17:02:14 (5572): No heartbeat from client for 30 sec - exiting
17:02:14 (5572): timer handler: client dead, exiting
17:02:15 (5572): No heartbeat from client for 30 sec - exiting
17:02:15 (5572): timer handler: client dead, exiting
17:02:17 (5572): No heartbeat from client for 30 sec - exiting
17:02:17 (5572): timer handler: client dead, exiting
17:02:18 (5572): No heartbeat from client for 30 sec - exiting
17:02:18 (5572): timer handler: client dead, exiting
17:02:19 (5572): No heartbeat from client for 30 sec - exiting
17:02:19 (5572): timer handler: client dead, exiting
17:02:20 (5572): No heartbeat from client for 30 sec - exiting
17:02:20 (5572): timer handler: client dead, exiting
17:02:21 (5572): No heartbeat from client for 30 sec - exiting
17:02:21 (5572): timer handler: client dead, exiting
17:02:22 (5572): No heartbeat from client for 30 sec - exiting
17:02:22 (5572): timer handler: client dead, exiting
17:02:23 (5572): No heartbeat from client for 30 sec - exiting
17:02:23 (5572): timer handler: client dead, exiting
17:02:24 (5572): No heartbeat from client for 30 sec - exiting
17:02:24 (5572): timer handler: client dead, exiting
17:02:25 (5572): No heartbeat from client for 30 sec - exiting
17:02:25 (5572): timer handler: client dead, exiting
17:02:26 (5572): No heartbeat from client for 30 sec - exiting
17:02:26 (5572): timer handler: client dead, exiting
17:02:27 (5572): No heartbeat from client for 30 sec - exiting
17:02:27 (5572): timer handler: client dead, exiting
17:02:29 (5572): No heartbeat from client for 30 sec - exiting
17:02:29 (5572): timer handler: client dead, exiting
17:02:30 (5572): No heartbeat from client for 30 sec - exiting
17:02:30 (5572): timer handler: client dead, exiting
17:02:31 (5572): No heartbeat from client for 30 sec - exiting
17:02:31 (5572): timer handler: client dead, exiting
17:02:32 (5572): No heartbeat from client for 30 sec - exiting
17:02:32 (5572): timer handler: client dead, exiting
17:02:33 (5572): No heartbeat from client for 30 sec - exiting
17:02:33 (5572): timer handler: client dead, exiting
17:02:34 (5572): No heartbeat from client for 30 sec - exiting
17:02:34 (5572): timer handler: client dead, exiting
17:02:35 (5572): No heartbeat from client for 30 sec - exiting
17:02:35 (5572): timer handler: client dead, exiting
17:02:36 (5572): No heartbeat from client for 30 sec - exiting
17:02:36 (5572): timer handler: client dead, exiting
17:02:37 (5572): No heartbeat from client for 30 sec - exiting
17:02:37 (5572): timer handler: client dead, exiting
17:02:38 (5572): No heartbeat from client for 30 sec - exiting
17:02:38 (5572): timer handler: client dead, exiting
17:02:39 (5572): No heartbeat from client for 30 sec - exiting
17:02:39 (5572): timer handler: client dead, exiting
17:02:41 (5572): No heartbeat from client for 30 sec - exiting
17:02:41 (5572): timer handler: client dead, exiting
17:02:42 (5572): No heartbeat from client for 30 sec - exiting
17:02:42 (5572): timer handler: client dead, exiting
17:02:43 (5572): No heartbeat from client for 30 sec - exiting
17:02:43 (5572): timer handler: client dead, exiting
17:02:44 (5572): No heartbeat from client for 30 sec - exiting
17:02:44 (5572): timer handler: client dead, exiting
17:02:45 (5572): No heartbeat from client for 30 sec - exiting
17:02:45 (5572): timer handler: client dead, exiting
17:02:46 (5572): No heartbeat from client for 30 sec - exiting
17:02:46 (5572): timer handler: client dead, exiting
17:02:47 (5572): No heartbeat from client for 30 sec - exiting
17:02:47 (5572): timer handler: client dead, exiting
17:02:48 (5572): No heartbeat from client for 30 sec - exiting
17:02:48 (5572): timer handler: client dead, exiting
17:02:49 (5572): No heartbeat from client for 30 sec - exiting
17:02:49 (5572): timer handler: client dead, exiting
17:02:50 (5572): No heartbeat from client for 30 sec - exiting
17:02:50 (5572): timer handler: client dead, exiting
17:02:51 (5572): No heartbeat from client for 30 sec - exiting
17:02:51 (5572): timer handler: client dead, exiting
17:02:53 (5572): No heartbeat from client for 30 sec - exiting
17:02:53 (5572): timer handler: client dead, exiting
17:02:54 (5572): No heartbeat from client for 30 sec - exiting
17:02:54 (5572): timer handler: client dead, exiting
17:02:55 (5572): No heartbeat from client for 30 sec - exiting
17:02:55 (5572): timer handler: client dead, exiting
17:02:56 (5572): No heartbeat from client for 30 sec - exiting
17:02:56 (5572): timer handler: client dead, exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7676, selfPID=6980, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=240, selfPID=1812, iMonCtr=1
Model crash detected, will try to restart...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1256, selfPID=7040, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7220, selfPID=7108, iMonCtr=1
Model crash detected, will try to restart...
CGlobal Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7468, iMonCtr=2
ontroller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6540, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7976, selfPID=4964, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7668, selfPID=6392, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7980, selfPID=6268, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7700, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7852, selfPID=7096, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4152, selfPID=4336, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7180, selfPID=6428, iMonCtr=1
Model crash detected, will try to restart...
GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7056, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7044, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6596, selfPID=5664, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7432, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7596, selfPID=4764, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7384, selfPID=6564, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8004, selfPID=5012, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4512, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8124, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8068, selfPID=7144, iMonCtr=1
Model crash detected, will try to restart...
22:09:31 (11032): start_timer_thread(): CreateThread() failed, errno 0
22:09:33 (1168): start_timer_thread(): CreateThread() failed, errno 0
16:32:15 (9068): start_timer_thread(): CreateThread() failed, errno 0
16:32:17 (5164): start_timer_thread(): CreateThread() failed, errno 0
Signal 11 received, exiting...
17:30:54 (9068): called boinc_finish(193)
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5164, selfPID=5164, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5164, selfPID=7552, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
17:31:11 (7552): called boinc_finish(0)
</stderr_txt>
<message>
<file_xfer_error>
  <file_name>hadam3p_pnw_pmhq_2013_1_009976841_0_14.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_pnw_pmhq_2013_1_009976841_0_15.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_pnw_pmhq_2013_1_009976841_0_16.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_pnw_pmhq_2013_1_009976841_0_17.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_pnw_pmhq_2013_1_009976841_0_18.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
</message>
]]>
```
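The stderr log is long and highly repetitive, which makes the handful of distinct events hard to see. A minimal Python sketch for tallying them, assuming the log text has been copied out of the page by hand; the event names, the `summarize` helper and the short excerpt are illustrative only (the excerpt is a few entries from the Stderr field, not the full log). Because the regular expressions search the whole string, the flattened one-line form of the log works as well as a line-per-entry copy.

```python
import re

# Hypothetical excerpt: a handful of entries copied from the Stderr field,
# not the full log. The regexes search the whole string, so the flattened
# one-line form works just as well as a line-per-entry copy.
STDERR = """\
17:02:06 (5572): No heartbeat from client for 30 sec - exiting
17:02:06 (5572): timer handler: client dead, exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7676, selfPID=6980, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Signal 11 received, exiting...
17:30:54 (9068): called boinc_finish(193)
<file_xfer_error> <file_name>hadam3p_pnw_pmhq_2013_1_009976841_0_14.zip</file_name> <error_code>-161</error_code> </file_xfer_error>
"""

# Hypothetical event names mapped to message text that appears in this log.
PATTERNS = {
    "heartbeat_timeout": re.compile(r"No heartbeat from client"),
    "crash_restart": re.compile(r"Model crash detected"),
    "boinc_suspend": re.compile(r"Suspend request from BOINC"),
    "signal": re.compile(r"Signal \d+ received"),
}

# Exit codes passed to boinc_finish() and per-file upload error codes.
FINISH_RE = re.compile(r"called boinc_finish\((-?\d+)\)")
XFER_RE = re.compile(
    r"<file_name>([^<]+)</file_name>\s*<error_code>(-?\d+)</error_code>"
)


def summarize(text):
    """Count recurring events and collect exit codes and upload errors."""
    counts = {name: len(p.findall(text)) for name, p in PATTERNS.items()}
    finish_codes = [int(code) for code in FINISH_RE.findall(text)]
    xfer_errors = [(name, int(code)) for name, code in XFER_RE.findall(text)]
    return counts, finish_codes, xfer_errors


if __name__ == "__main__":
    counts, finish_codes, xfer_errors = summarize(STDERR)
    for name, n in counts.items():
        print(f"{name}: {n}")
    print("boinc_finish codes:", finish_codes)
    for name, code in xfer_errors:
        print(f"upload error {code}: {name}")
```

Pasting the complete Stderr text in place of the excerpt gives totals for this task. The sketch only counts messages; it does not diagnose why the model crashed or why the uploads failed.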
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
16 Aug 2015 20:09:34 | 1129146 | 18641773 | hadam3p_pnw_pmhq_2013_1_009976841_0 | 150,059 | 405,190 | 2.7002 |
08 Aug 2015 18:49:22 | 1129146 | 18641773 | hadam3p_pnw_pmhq_2013_1_009976841_0 | 138,539 | 373,098 | 2.6931 |
02 Aug 2015 10:20:11 | 1129146 | 18641773 | hadam3p_pnw_pmhq_2013_1_009976841_0 | 127,019 | 342,874 | 2.6994 |
29 Jul 2015 16:12:30 | 1129146 | 18641773 | hadam3p_pnw_pmhq_2013_1_009976841_0 | 115,499 | 312,798 | 2.7082 |
26 Jul 2015 10:16:12 | 1129146 | 18641773 | hadam3p_pnw_pmhq_2013_1_009976841_0 | 103,979 | 282,048 | 2.7125 |
24 Jul 2015 15:14:59 | 1129146 | 18641773 | hadam3p_pnw_pmhq_2013_1_009976841_0 | 92,459 | 251,205 | 2.7169 |
22 Jul 2015 12:39:13 | 1129146 | 18641773 | hadam3p_pnw_pmhq_2013_1_009976841_0 | 80,939 | 219,328 | 2.7098 |
18 Jul 2015 17:14:53 | 1129146 | 18641773 | hadam3p_pnw_pmhq_2013_1_009976841_0 | 69,419 | 186,775 | 2.6905 |
16 Jul 2015 19:37:29 | 1129146 | 18641773 | hadam3p_pnw_pmhq_2013_1_009976841_0 | 57,899 | 155,912 | 2.6928 |
12 Jul 2015 19:15:40 | 1129146 | 18641773 | hadam3p_pnw_pmhq_2013_1_009976841_0 | 46,379 | 124,908 | 2.6932 |
11 Jul 2015 16:45:43 | 1129146 | 18641773 | hadam3p_pnw_pmhq_2013_1_009976841_0 | 34,859 | 93,713 | 2.6883 |
08 Jul 2015 20:01:03 | 1129146 | 18641773 | hadam3p_pnw_pmhq_2013_1_009976841_0 | 23,339 | 62,860 | 2.6933 |
07 Jul 2015 14:17:24 | 1129146 | 18641773 | hadam3p_pnw_pmhq_2013_1_009976841_0 | 11,819 | 31,287 | 2.6472 |
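The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep reached. A minimal Python sketch that checks this reading against the rows of the trickle table and computes how many timesteps separate consecutive trickles; the interpretation of the column is an assumption, and `averages_match` and `trickle_intervals` are hypothetical helpers, not part of any CPDN or BOINC tool.

```python
# Rows from the "Latest Trickles Received" table, oldest first:
# (timestep, cumulative CPU time in seconds, reported average in sec/TS).
TRICKLES = [
    (11_819, 31_287, 2.6472),
    (23_339, 62_860, 2.6933),
    (34_859, 93_713, 2.6883),
    (46_379, 124_908, 2.6932),
    (57_899, 155_912, 2.6928),
    (69_419, 186_775, 2.6905),
    (80_939, 219_328, 2.7098),
    (92_459, 251_205, 2.7169),
    (103_979, 282_048, 2.7125),
    (115_499, 312_798, 2.7082),
    (127_019, 342_874, 2.6994),
    (138_539, 373_098, 2.6931),
    (150_059, 405_190, 2.7002),
]


def averages_match(rows):
    """True if every reported average equals CPU time / timestep at 4 decimals."""
    return all(round(cpu / ts, 4) == avg for ts, cpu, avg in rows)


def trickle_intervals(rows):
    """Timesteps elapsed between consecutive trickles (rows in ascending order)."""
    return [later[0] - earlier[0] for earlier, later in zip(rows, rows[1:])]


if __name__ == "__main__":
    print("averages consistent:", averages_match(TRICKLES))
    print("distinct intervals:", sorted(set(trickle_intervals(TRICKLES))))
```

On these thirteen rows every reported average matches CPU time divided by timestep to four decimal places, and consecutive trickles are exactly 11,520 timesteps apart.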
©2024 cpdn.org