Task 16834897

Name hadam3p_anz_n9r4_2012_1_008600256_1
Workunit 8746768
Created 2 Aug 2014, 11:12:05 UTC
Sent 2 Aug 2014, 11:17:17 UTC
Report deadline 15 Jul 2015, 16:37:17 UTC
Received 27 Sep 2014, 7:26:04 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 0 (0x00000000)
Computer ID 1335527
Run time 8 days 17 hours 25 min 37 sec
CPU time 8 days 1 hours 15 min 39 sec
Validate state Invalid
Credit 4,484.28
Device peak FLOPS 2.33 GFLOPS
Application version UK Met Office HadAM3P-HadRM3P Australia New Zealand v6.10 windows_intelx86
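The run time and CPU time above can be compared by converting both to seconds. The following is a minimal sketch, assuming the "N days N hours N min N sec" layout shown in this report; the helper name `duration_to_seconds` is hypothetical and not part of BOINC or CPDN.

```python
import re

# Seconds per unit for the "N days N hours N min N sec" layout used in this report.
UNIT_SECONDS = {"days": 86400, "hours": 3600, "min": 60, "sec": 1}

def duration_to_seconds(text):
    """Convert a string such as '8 days 17 hours 25 min 37 sec' to seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+)\s+(days|hours|min|sec)", text):
        total += int(value) * UNIT_SECONDS[unit]
    return total

# Values copied from the "Run time" and "CPU time" fields of this task.
run_time = duration_to_seconds("8 days 17 hours 25 min 37 sec")
cpu_time = duration_to_seconds("8 days 1 hours 15 min 39 sec")

# Fraction of wall-clock run time that was spent on the CPU.
cpu_fraction = cpu_time / run_time
print(run_time, cpu_time, round(cpu_fraction, 3))
```

For this task the CPU time comes to roughly 92% of the run time.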
Stderr
<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5900, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3844, selfPID=6140, iMonCtr=1
Model crash detected, will try to restart...
GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3780, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3064, selfPID=4760, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2548, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5280, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5508, selfPID=5636, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4940, selfPID=2696, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=704, selfPID=5668, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5840, selfPID=4000, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5140, selfPID=3216, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4784, selfPID=4136, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3892, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3748, selfPID=4328, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
10:11:56 (3912): No heartbeat from core client for 30 sec - exiting
10:11:57 (3912): No heartbeat from core client for 30 sec - exiting
10:11:58 (3912): No heartbeat from core client for 30 sec - exiting
10:11:59 (3912): No heartbeat from core client for 30 sec - exiting
10:12:00 (3912): No heartbeat from core client for 30 sec - exiting
10:12:01 (3912): No heartbeat from core client for 30 sec - exiting
10:12:02 (3912): No heartbeat from core client for 30 sec - exiting
10:12:03 (3912): No heartbeat from core client for 30 sec - exiting
10:12:05 (3912): No heartbeat from core client for 30 sec - exiting
10:12:06 (3912): No heartbeat from core client for 30 sec - exiting
10:12:07 (3912): No heartbeat from core client for 30 sec - exiting
10:12:08 (3912): No heartbeat from core client for 30 sec - exiting
10:12:09 (3912): No heartbeat from core client for 30 sec - exiting
10:12:10 (3912): No heartbeat from core client for 30 sec - exiting
10:12:11 (3912): No heartbeat from core client for 30 sec - exiting
10:12:12 (3912): No heartbeat from core client for 30 sec - exiting
10:12:13 (3912): No heartbeat from core client for 30 sec - exiting
10:12:14 (3912): No heartbeat from core client for 30 sec - exiting
10:12:15 (3912): No heartbeat from core client for 30 sec - exiting
10:12:17 (3912): No heartbeat from core client for 30 sec - exiting
10:12:18 (3912): No heartbeat from core client for 30 sec - exiting
10:12:19 (3912): No heartbeat from core client for 30 sec - exiting
10:12:20 (3912): No heartbeat from core client for 30 sec - exiting
10:12:21 (3912): No heartbeat from core client for 30 sec - exiting
10:12:22 (3912): No heartbeat from core client for 30 sec - exiting
10:12:23 (3912): No heartbeat from core client for 30 sec - exiting
10:12:24 (3912): No heartbeat from core client for 30 sec - exiting
10:12:25 (3912): No heartbeat from core client for 30 sec - exiting
10:12:26 (3912): No heartbeat from core client for 30 sec - exiting
10:12:27 (3912): No heartbeat from core client for 30 sec - exiting
10:12:29 (3912): No heartbeat from core client for 30 sec - exiting
10:12:30 (3912): No heartbeat from core client for 30 sec - exiting
10:12:31 (3912): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2076, selfPID=4636, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4624, selfPID=4040, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4896, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4656, selfPID=4448, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4904, selfPID=5640, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4728, selfPID=4256, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5476, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5064, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5060, selfPID=4812, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3940, selfPID=4916, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6008, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6028, selfPID=2344, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2188, selfPID=3464, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=292, selfPID=5052, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5948, selfPID=2868, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5816, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3948, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5360, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5304, selfPID=4944, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5856, selfPID=3696, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CSuspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6084, selfPID=3672, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5708, iMonCtr=2
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
GSuspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3880, selfPID=3948, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2480, iMonCtr=2
Model crash detected, will try to restart...
CGlobal Worker:: CPDN process is not running, exiting, bRetVal = 0, checkPID=0, selfPID=4476, iMonCtr=1
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4876, selfPID=4080, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Called boinc_finish

</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>hadam3p_anz_n9r4_2012_1_008600256_1_10.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_anz_n9r4_2012_1_008600256_1_11.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_anz_n9r4_2012_1_008600256_1_12.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>

</message>
]]>
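The "CPDN process is not running" lines in the stderr output share a fixed layout, so they can be pulled apart with a regular expression. This is a rough sketch only: the field names are taken verbatim from the log text, their meaning is not documented in this report, and `parse_monitor_line` is a hypothetical helper rather than anything shipped with BOINC or CPDN.

```python
import re
from collections import Counter

# Layout of the "process is not running" lines as they appear in the stderr text.
LINE_RE = re.compile(
    r"^(?P<source>.+?):: CPDN process is not running, exiting, "
    r"bRetVal = (?P<ret_val>\d+), checkPID=(?P<check_pid>\d+), "
    r"selfPID=(?P<self_pid>\d+), iMonCtr=(?P<mon_ctr>\d+)$"
)

def parse_monitor_line(line):
    """Return the fields of one monitor line, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "source": fields["source"],
        "bRetVal": int(fields["ret_val"]),
        "checkPID": int(fields["check_pid"]),
        "selfPID": int(fields["self_pid"]),
        "iMonCtr": int(fields["mon_ctr"]),
    }

# A few lines copied from the stderr output above.
SAMPLE = [
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5900, iMonCtr=2",
    "Model crash detected, will try to restart...",
    "Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5280, iMonCtr=2",
    "Suspended CPDN Monitor - Suspend request from BOINC...",
]

records = [r for r in map(parse_monitor_line, SAMPLE) if r is not None]
by_source = Counter(r["source"] for r in records)
print(by_source)
```

Lines whose prefix carries a stray leading character (for example "GController") are reported under that prefix as-is; the sketch does not attempt to repair them.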
Latest Trickles Received
Time Sent (UTC)       Host ID  Result ID  Result Name                          Timestep  CPU Time (sec)  Average (sec/TS)
26 Sep 2014 20:43:11  1335527  16834897   hadam3p_anz_n9r4_2012_1_008600256_1  103,979   695,732         6.6911
19 Sep 2014 18:30:10  1335527  16834897   hadam3p_anz_n9r4_2012_1_008600256_1  92,459    624,556         6.7550
07 Sep 2014 15:14:38  1335527  16834897   hadam3p_anz_n9r4_2012_1_008600256_1  80,939    552,573         6.8270
31 Aug 2014 09:54:39  1335527  16834897   hadam3p_anz_n9r4_2012_1_008600256_1  69,419    476,133         6.8588
24 Aug 2014 16:02:00  1335527  16834897   hadam3p_anz_n9r4_2012_1_008600256_1  57,899    396,922         6.8554
21 Aug 2014 12:59:49  1335527  16834897   hadam3p_anz_n9r4_2012_1_008600256_1  46,379    317,405         6.8437
14 Aug 2014 17:28:43  1335527  16834897   hadam3p_anz_n9r4_2012_1_008600256_1  34,859    239,722         6.8769
14 Aug 2014 16:26:14  1335527  16834897   hadam3p_anz_n9r4_2012_1_008600256_1  23,339    161,272         6.9100
07 Aug 2014 11:11:42  1335527  16834897   hadam3p_anz_n9r4_2012_1_008600256_1  11,819    81,545          6.8995
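The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the timestep count. The short check below, a sketch using three rows copied from the trickle table, recomputes that ratio; the interpretation of the column is an assumption based on the numbers, not something the page states.

```python
# (timestep, cpu_seconds, reported_average) copied from three rows of the trickle table.
rows = [
    (103979, 695732, 6.6911),
    (57899, 396922, 6.8554),
    (11819, 81545, 6.8995),
]

# Assumed relationship: average seconds per timestep = CPU seconds / timesteps.
computed = [cpu_seconds / timestep for timestep, cpu_seconds, _ in rows]

for (timestep, _, reported), value in zip(rows, computed):
    # Compare at the four-decimal precision shown in the table.
    print(timestep, f"{value:.4f}", reported)
```

The recomputed values agree with the reported column to four decimal places for these rows.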


©2024 cpdn.org