Task 16103893

Name hadam3p_eu_iohz_1989_1_008476940_0
Workunit 8627753
Created 3 Dec 2013, 19:06:44 UTC
Sent 7 Dec 2013, 21:40:16 UTC
Report deadline 20 Nov 2014, 3:00:16 UTC
Received 27 Jan 2014, 7:48:36 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 0 (0x00000000)
Computer ID 1168992
Run time 4 days 18 hours 31 min 29 sec
CPU time 2 days 14 hours 38 min 45 sec
Validate state Invalid
Credit 1,194.02
Device peak FLOPS 2.36 GFLOPS
Application version UK Met Office HadAM3P-HadRM3P Europe v6.09 windows_intelx86
Stderr
<core_client_version>6.12.34</core_client_version>
<![CDATA[
<stderr_txt>
18:58:14 (4360): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
18:28:00 (3584): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:26:33 (5660): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3176, selfPID=3176, iMonCtr=2
20:24:59 (4872): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:23:06 (1420): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5476, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5804, selfPID=912, iMonCtr=1
Model crash detected, will try to restart...
10:11:31 (5136): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
17:48:54 (1044): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:46:59 (5056): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
18:32:29 (4728): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4536, iMonCtr=2
Model crash detected, will try to restart...
19:54:35 (3864): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:54:36 (3864): No heartbeat from core client for 30 sec - exiting
19:54:37 (3864): No heartbeat from core client for 30 sec - exiting
19:54:38 (3864): No heartbeat from core client for 30 sec - exiting
19:54:39 (3864): No heartbeat from core client for 30 sec - exiting
19:54:40 (3864): No heartbeat from core client for 30 sec - exiting
19:54:41 (3864): No heartbeat from core client for 30 sec - exiting
19:54:42 (3864): No heartbeat from core client for 30 sec - exiting
19:54:43 (3864): No heartbeat from core client for 30 sec - exiting
19:54:44 (3864): No heartbeat from core client for 30 sec - exiting
19:54:45 (3864): No heartbeat from core client for 30 sec - exiting
20:53:10 (4892): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5096, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
17:50:33 (2920): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:48:51 (7676): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
10:35:30 (4192): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
18:25:36 (5624): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
19:54:35 (648): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4360, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
16:09:27 (5468): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
16:09:28 (5468): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
11:20:55 (3276): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5884, selfPID=5884, iMonCtr=2
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5428, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3072, iMonCtr=2
Model crash detected, will try to restart...
18:28:11 (2752): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
17:53:01 (5608): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
17:53:07 (5608): No heartbeat from core client for 30 sec - exiting
18:48:23 (4348): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5368, selfPID=2588, iMonCtr=1
Model crash detected, will try to restart...
17:51:53 (4300): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:47:33 (4600): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:46:10 (1200): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
11:26:43 (4992): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
11:26:44 (4992): No heartbeat from core client for 30 sec - exiting
11:26:45 (4992): No heartbeat from core client for 30 sec - exiting
11:26:46 (4992): No heartbeat from core client for 30 sec - exiting
11:26:47 (4992): No heartbeat from core client for 30 sec - exiting
11:26:48 (4992): No heartbeat from core client for 30 sec - exiting
11:26:49 (4992): No heartbeat from core client for 30 sec - exiting
11:26:51 (4992): No heartbeat from core client for 30 sec - exiting
11:26:52 (4992): No heartbeat from core client for 30 sec - exiting
11:26:53 (4992): No heartbeat from core client for 30 sec - exiting
18:31:08 (4152): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
21:58:32 (5192): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
18:31:54 (5304): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
18:17:00 (4132): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5376, iMonCtr=2
Model crash detected, will try to restart...
17:56:44 (4808): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
17:56:45 (4808): No heartbeat from core client for 30 sec - exiting
17:56:46 (4808): No heartbeat from core client for 30 sec - exiting
17:56:47 (4808): No heartbeat from core client for 30 sec - exiting
18:10:50 (3556): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:08:34 (3308): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
20:43:20 (2784): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:36:57 (4192): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
22:25:38 (3364): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
23:14:39 (5500): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4756, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4396, selfPID=4780, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
13:29:10 (1800): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
14:28:01 (3412): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
17:45:00 (6380): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...

Model crashed: READHIST: End of file in READ from history file for namelist NLIHISTO tmp/xaakg.pipe_dummy 2048
Leaving CPDN_Main::Monitor...

zip error: Could not create output file (was replacing the original zip file)
Called boinc_finish

</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>hadam3p_eu_iohz_1989_1_008476940_0_7.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_iohz_1989_1_008476940_0_8.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_iohz_1989_1_008476940_0_9.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_iohz_1989_1_008476940_0_10.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_iohz_1989_1_008476940_0_11.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_iohz_1989_1_008476940_0_12.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>

</message>
]]>
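The stderr output above repeats a small number of message types many times, so a tally can make the pattern easier to see. The following is a minimal sketch, not part of the CPDN or BOINC tooling: the function name `summarize` and the category labels are hypothetical, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches lines such as "18:58:14 (4360): No heartbeat from core client ..."
# and captures the process ID printed in parentheses.
HEARTBEAT = re.compile(r"^\d{2}:\d{2}:\d{2} \((\d+)\): No heartbeat from core client")

def summarize(lines):
    """Count message types and collect the PIDs that reported a lost heartbeat.

    The category names are illustrative labels, not BOINC terminology.
    """
    counts = Counter()
    pids = set()
    for line in lines:
        match = HEARTBEAT.match(line)
        if match:
            counts["no_heartbeat"] += 1
            pids.add(int(match.group(1)))
        elif line.startswith("Suspended CPDN Monitor"):
            counts["suspend_request"] += 1
        elif line.startswith("Model crash detected"):
            counts["restart_attempt"] += 1
        elif line.startswith("Model crashed:"):
            counts["model_crashed"] += 1
    return counts, pids

# A few lines taken verbatim from the stderr section.
sample = [
    "18:58:14 (4360): No heartbeat from core client for 30 sec - exiting",
    "CPDN Monitor - No 'heartbeat' from BOINC...",
    "Suspended CPDN Monitor - Suspend request from BOINC...",
    "Model crash detected, will try to restart...",
    "19:54:35 (3864): No heartbeat from core client for 30 sec - exiting",
    "19:54:36 (3864): No heartbeat from core client for 30 sec - exiting",
]

counts, pids = summarize(sample)
print(dict(counts), sorted(pids))
```

Running the same function over the full stderr text would show how often the client lost its heartbeat compared with how often a model restart was attempted.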
Latest Trickles Received
Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS)
15 Jan 2014 21:55:52 | 1168992 | 16103893 | hadam3p_eu_iohz_1989_1_008476940_0 | 69,216 | 197,199 | 2.8490
04 Jan 2014 14:19:05 | 1168992 | 16103893 | hadam3p_eu_iohz_1989_1_008476940_0 | 57,696 | 162,900 | 2.8234
28 Dec 2013 15:28:50 | 1168992 | 16103893 | hadam3p_eu_iohz_1989_1_008476940_0 | 46,176 | 128,261 | 2.7777
18 Dec 2013 22:37:21 | 1168992 | 16103893 | hadam3p_eu_iohz_1989_1_008476940_0 | 34,656 | 93,678 | 2.7031
15 Dec 2013 19:50:16 | 1168992 | 16103893 | hadam3p_eu_iohz_1989_1_008476940_0 | 23,136 | 62,544 | 2.7033
11 Dec 2013 18:01:39 | 1168992 | 16103893 | hadam3p_eu_iohz_1989_1_008476940_0 | 11,616 | 31,644 | 2.7242
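
The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep count. The sketch below checks that assumption against the six rows above; the variable and function names are hypothetical, and the tolerance allows for the site rounding to four decimal places.

```python
# (timestep, cpu_time_sec, reported_average) copied from the trickle table.
trickles = [
    (69216, 197199, 2.8490),
    (57696, 162900, 2.8234),
    (46176, 128261, 2.7777),
    (34656, 93678, 2.7031),
    (23136, 62544, 2.7033),
    (11616, 31644, 2.7242),
]

def average_sec_per_timestep(timestep, cpu_sec):
    """Cumulative CPU seconds per model timestep (assumed definition)."""
    return cpu_sec / timestep

# Largest gap between the recomputed and the reported average.
max_diff = max(
    abs(average_sec_per_timestep(ts, cpu) - reported)
    for ts, cpu, reported in trickles
)

for ts, cpu, reported in trickles:
    print(f"{ts:>6} timesteps: computed {average_sec_per_timestep(ts, cpu):.4f}, reported {reported:.4f}")
```

Under this reading, the rising average from 2.70 to 2.85 sec/TS means the later timesteps cost more CPU time each than the earlier ones.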

