Task 18641773

Name hadam3p_pnw_pmhq_2013_1_009976841_0
Workunit 9983199
Created 29 Jun 2015, 18:01:08 UTC
Sent 30 Jun 2015, 11:19:03 UTC
Report deadline 11 Jun 2016, 16:39:03 UTC
Received 18 Aug 2015, 15:32:33 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 0 (0x00000000)
Computer ID 1129146
Run time 5 days 7 hours 19 min 27 sec
CPU time 4 days 19 hours 32 min 53 sec
Validate state Invalid
Credit 3,260.60
Device peak FLOPS 3.25 GFLOPS
Application version UK Met Office HadAM3P-HadRM3P Pacific North West v7.27 windows_intelx86
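The run time and CPU time above are easier to compare once both are in seconds. The sketch below is only an illustration: the helper name `duration_to_seconds` and the variable names are hypothetical, and the two duration strings are copied from the fields above.

```python
import re

# Seconds per unit, keyed by the unit words used on the task page.
UNIT_SECONDS = {"day": 86400, "hour": 3600, "min": 60, "sec": 1}

def duration_to_seconds(text):
    """Convert a string such as '5 days 7 hours 19 min 27 sec' to seconds."""
    total = 0
    # Each match is (number, unit); a trailing plural 's' is simply ignored.
    for value, unit in re.findall(r"(\d+)\s*(day|hour|min|sec)", text):
        total += int(value) * UNIT_SECONDS[unit]
    return total

run_time = duration_to_seconds("5 days 7 hours 19 min 27 sec")
cpu_time = duration_to_seconds("4 days 19 hours 32 min 53 sec")

# Fraction of wall-clock run time that was spent on the CPU.
cpu_fraction = cpu_time / run_time
print(run_time, cpu_time, round(cpu_fraction, 4))
```

With these inputs the run time is 458,367 s and the CPU time is 415,973 s, so roughly 91% of the elapsed run time was CPU time.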
Stderr
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
17:02:06 (5572): No heartbeat from client for 30 sec - exiting
17:02:06 (5572): timer handler: client dead, exiting
17:02:07 (5572): No heartbeat from client for 30 sec - exiting
17:02:07 (5572): timer handler: client dead, exiting
17:02:08 (5572): No heartbeat from client for 30 sec - exiting
17:02:08 (5572): timer handler: client dead, exiting
17:02:09 (5572): No heartbeat from client for 30 sec - exiting
17:02:09 (5572): timer handler: client dead, exiting
17:02:10 (5572): No heartbeat from client for 30 sec - exiting
17:02:10 (5572): timer handler: client dead, exiting
17:02:11 (5572): No heartbeat from client for 30 sec - exiting
17:02:11 (5572): timer handler: client dead, exiting
17:02:12 (5572): No heartbeat from client for 30 sec - exiting
17:02:12 (5572): timer handler: client dead, exiting
17:02:13 (5572): No heartbeat from client for 30 sec - exiting
17:02:13 (5572): timer handler: client dead, exiting
17:02:14 (5572): No heartbeat from client for 30 sec - exiting
17:02:14 (5572): timer handler: client dead, exiting
17:02:15 (5572): No heartbeat from client for 30 sec - exiting
17:02:15 (5572): timer handler: client dead, exiting
17:02:17 (5572): No heartbeat from client for 30 sec - exiting
17:02:17 (5572): timer handler: client dead, exiting
17:02:18 (5572): No heartbeat from client for 30 sec - exiting
17:02:18 (5572): timer handler: client dead, exiting
17:02:19 (5572): No heartbeat from client for 30 sec - exiting
17:02:19 (5572): timer handler: client dead, exiting
17:02:20 (5572): No heartbeat from client for 30 sec - exiting
17:02:20 (5572): timer handler: client dead, exiting
17:02:21 (5572): No heartbeat from client for 30 sec - exiting
17:02:21 (5572): timer handler: client dead, exiting
17:02:22 (5572): No heartbeat from client for 30 sec - exiting
17:02:22 (5572): timer handler: client dead, exiting
17:02:23 (5572): No heartbeat from client for 30 sec - exiting
17:02:23 (5572): timer handler: client dead, exiting
17:02:24 (5572): No heartbeat from client for 30 sec - exiting
17:02:24 (5572): timer handler: client dead, exiting
17:02:25 (5572): No heartbeat from client for 30 sec - exiting
17:02:25 (5572): timer handler: client dead, exiting
17:02:26 (5572): No heartbeat from client for 30 sec - exiting
17:02:26 (5572): timer handler: client dead, exiting
17:02:27 (5572): No heartbeat from client for 30 sec - exiting
17:02:27 (5572): timer handler: client dead, exiting
17:02:29 (5572): No heartbeat from client for 30 sec - exiting
17:02:29 (5572): timer handler: client dead, exiting
17:02:30 (5572): No heartbeat from client for 30 sec - exiting
17:02:30 (5572): timer handler: client dead, exiting
17:02:31 (5572): No heartbeat from client for 30 sec - exiting
17:02:31 (5572): timer handler: client dead, exiting
17:02:32 (5572): No heartbeat from client for 30 sec - exiting
17:02:32 (5572): timer handler: client dead, exiting
17:02:33 (5572): No heartbeat from client for 30 sec - exiting
17:02:33 (5572): timer handler: client dead, exiting
17:02:34 (5572): No heartbeat from client for 30 sec - exiting
17:02:34 (5572): timer handler: client dead, exiting
17:02:35 (5572): No heartbeat from client for 30 sec - exiting
17:02:35 (5572): timer handler: client dead, exiting
17:02:36 (5572): No heartbeat from client for 30 sec - exiting
17:02:36 (5572): timer handler: client dead, exiting
17:02:37 (5572): No heartbeat from client for 30 sec - exiting
17:02:37 (5572): timer handler: client dead, exiting
17:02:38 (5572): No heartbeat from client for 30 sec - exiting
17:02:38 (5572): timer handler: client dead, exiting
17:02:39 (5572): No heartbeat from client for 30 sec - exiting
17:02:39 (5572): timer handler: client dead, exiting
17:02:41 (5572): No heartbeat from client for 30 sec - exiting
17:02:41 (5572): timer handler: client dead, exiting
17:02:42 (5572): No heartbeat from client for 30 sec - exiting
17:02:42 (5572): timer handler: client dead, exiting
17:02:43 (5572): No heartbeat from client for 30 sec - exiting
17:02:43 (5572): timer handler: client dead, exiting
17:02:44 (5572): No heartbeat from client for 30 sec - exiting
17:02:44 (5572): timer handler: client dead, exiting
17:02:45 (5572): No heartbeat from client for 30 sec - exiting
17:02:45 (5572): timer handler: client dead, exiting
17:02:46 (5572): No heartbeat from client for 30 sec - exiting
17:02:46 (5572): timer handler: client dead, exiting
17:02:47 (5572): No heartbeat from client for 30 sec - exiting
17:02:47 (5572): timer handler: client dead, exiting
17:02:48 (5572): No heartbeat from client for 30 sec - exiting
17:02:48 (5572): timer handler: client dead, exiting
17:02:49 (5572): No heartbeat from client for 30 sec - exiting
17:02:49 (5572): timer handler: client dead, exiting
17:02:50 (5572): No heartbeat from client for 30 sec - exiting
17:02:50 (5572): timer handler: client dead, exiting
17:02:51 (5572): No heartbeat from client for 30 sec - exiting
17:02:51 (5572): timer handler: client dead, exiting
17:02:53 (5572): No heartbeat from client for 30 sec - exiting
17:02:53 (5572): timer handler: client dead, exiting
17:02:54 (5572): No heartbeat from client for 30 sec - exiting
17:02:54 (5572): timer handler: client dead, exiting
17:02:55 (5572): No heartbeat from client for 30 sec - exiting
17:02:55 (5572): timer handler: client dead, exiting
17:02:56 (5572): No heartbeat from client for 30 sec - exiting
17:02:56 (5572): timer handler: client dead, exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7676, selfPID=6980, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=240, selfPID=1812, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1256, selfPID=7040, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7220, selfPID=7108, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7468, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6540, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7976, selfPID=4964, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7668, selfPID=6392, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7980, selfPID=6268, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7700, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7852, selfPID=7096, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4152, selfPID=4336, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7180, selfPID=6428, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7056, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7044, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6596, selfPID=5664, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7432, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7596, selfPID=4764, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7384, selfPID=6564, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8004, selfPID=5012, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4512, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8124, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8068, selfPID=7144, iMonCtr=1
Model crash detected, will try to restart...
22:09:31 (11032): start_timer_thread(): CreateThread() failed, errno 0
22:09:33 (1168): start_timer_thread(): CreateThread() failed, errno 0
16:32:15 (9068): start_timer_thread(): CreateThread() failed, errno 0
16:32:17 (5164): start_timer_thread(): CreateThread() failed, errno 0
Signal 11 received, exiting...
17:30:54 (9068): called boinc_finish(193)
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5164, selfPID=5164, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5164, selfPID=7552, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
17:31:11 (7552): called boinc_finish(0)

</stderr_txt>
<message>
<file_xfer_error>
  <file_name>hadam3p_pnw_pmhq_2013_1_009976841_0_14.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_pnw_pmhq_2013_1_009976841_0_15.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_pnw_pmhq_2013_1_009976841_0_16.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_pnw_pmhq_2013_1_009976841_0_17.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_pnw_pmhq_2013_1_009976841_0_18.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>

</message>
]]>
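The `<message>` block above lists one `<file_xfer_error>` per output file that failed to transfer. A minimal sketch for pulling those entries out with Python's standard `xml.etree.ElementTree` follows. The function name `failed_uploads` is hypothetical, and the sample string reproduces only two of the five entries. The meaning of error code -161 is not stated on this page; it is believed to correspond to a "not found" error in BOINC's error numbering, but treat that as an assumption to check against the BOINC source.

```python
import xml.etree.ElementTree as ET

# Two of the five entries from the task's <message> block, copied verbatim.
message = """<message>
<file_xfer_error>
  <file_name>hadam3p_pnw_pmhq_2013_1_009976841_0_14.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_pnw_pmhq_2013_1_009976841_0_18.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
</message>"""

def failed_uploads(xml_text):
    """Return (file_name, error_code) pairs for every file_xfer_error element."""
    root = ET.fromstring(xml_text)
    return [
        (err.findtext("file_name"), int(err.findtext("error_code")))
        for err in root.iter("file_xfer_error")
    ]

errors = failed_uploads(message)
for name, code in errors:
    print(name, code)
```

Parsing the fragment on its own sidesteps the surrounding `<![CDATA[ ... ]]>` wrapper, which would otherwise hide these elements from an XML parser.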
Latest Trickles Received
Time Sent (UTC)       Host ID  Result ID  Result Name                          Timestep  CPU Time (sec)  Average (sec/TS)
16 Aug 2015 20:09:34  1129146  18641773   hadam3p_pnw_pmhq_2013_1_009976841_0  150,059   405,190         2.7002
08 Aug 2015 18:49:22  1129146  18641773   hadam3p_pnw_pmhq_2013_1_009976841_0  138,539   373,098         2.6931
02 Aug 2015 10:20:11  1129146  18641773   hadam3p_pnw_pmhq_2013_1_009976841_0  127,019   342,874         2.6994
29 Jul 2015 16:12:30  1129146  18641773   hadam3p_pnw_pmhq_2013_1_009976841_0  115,499   312,798         2.7082
26 Jul 2015 10:16:12  1129146  18641773   hadam3p_pnw_pmhq_2013_1_009976841_0  103,979   282,048         2.7125
24 Jul 2015 15:14:59  1129146  18641773   hadam3p_pnw_pmhq_2013_1_009976841_0  92,459    251,205         2.7169
22 Jul 2015 12:39:13  1129146  18641773   hadam3p_pnw_pmhq_2013_1_009976841_0  80,939    219,328         2.7098
18 Jul 2015 17:14:53  1129146  18641773   hadam3p_pnw_pmhq_2013_1_009976841_0  69,419    186,775         2.6905
16 Jul 2015 19:37:29  1129146  18641773   hadam3p_pnw_pmhq_2013_1_009976841_0  57,899    155,912         2.6928
12 Jul 2015 19:15:40  1129146  18641773   hadam3p_pnw_pmhq_2013_1_009976841_0  46,379    124,908         2.6932
11 Jul 2015 16:45:43  1129146  18641773   hadam3p_pnw_pmhq_2013_1_009976841_0  34,859    93,713          2.6883
08 Jul 2015 20:01:03  1129146  18641773   hadam3p_pnw_pmhq_2013_1_009976841_0  23,339    62,860          2.6933
07 Jul 2015 14:17:24  1129146  18641773   hadam3p_pnw_pmhq_2013_1_009976841_0  11,819    31,287          2.6472
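The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep count. The sketch below checks that reading against three rows copied from the trickle list (newest, second newest, oldest). The names `trickles` and `seconds_per_timestep` are hypothetical, and the relationship itself is inferred from the numbers rather than documented on the page.

```python
# (timestep, cpu_time_sec, reported_average) copied from three trickle rows.
trickles = [
    (150059, 405190, 2.7002),
    (138539, 373098, 2.6931),
    (11819, 31287, 2.6472),
]

def seconds_per_timestep(cpu_time_sec, timestep):
    """Cumulative CPU seconds divided by the number of completed timesteps."""
    return cpu_time_sec / timestep

# Largest gap between the recomputed and reported averages across the rows.
max_deviation = max(
    abs(seconds_per_timestep(cpu, ts) - reported)
    for ts, cpu, reported in trickles
)

for ts, cpu, reported in trickles:
    print(ts, round(seconds_per_timestep(cpu, ts), 4), reported)
```

For these rows the recomputed values agree with the reported averages to four decimal places, which is consistent with the column being a simple running ratio.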