Output file absent & Too many errors (may have bug)

Message boards : Number crunching : Output file absent & Too many errors (may have bug)

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 44587 - Posted: 26 Jul 2012, 10:30:09 UTC

The reason for asking for the file names of faulty models is that the project people want to know which years have the error.
And it seems like they're spread over a lot of years.
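Since the project wants the failing years, one way to turn a pile of reported task names into a per-year count is sketched below in Python. It assumes the name layout seen in this thread (for example hadam3p_eu_634j_2009_1_008071304_2), reading the region from the second field and the model year from the fourth; that reading is an inference from the posted names, not a documented format, and tally_by_region_year is a hypothetical helper.

```python
import re
from collections import Counter

# Assumed layout, inferred from names posted in this thread, e.g.
# hadam3p_eu_634j_2009_1_008071304_2: app, region, 4-char tag, year,
# then three numeric fields (the last looks like the replica number).
TASK_NAME = re.compile(
    r"^(?P<app>hadam3p)_(?P<region>[a-z]+)_(?P<tag>[0-9a-z]{4})_"
    r"(?P<year>\d{4})_\d+_\d+_\d+$"
)


def tally_by_region_year(names):
    """Count failed tasks per (region, year); names that don't fit the layout are returned separately."""
    counts = Counter()
    unparsed = []
    for name in names:
        m = TASK_NAME.match(name.strip())
        if m:
            counts[(m["region"], int(m["year"]))] += 1
        else:
            unparsed.append(name)
    return counts, unparsed


if __name__ == "__main__":
    # Names taken from posts in this thread.
    reported = [
        "hadam3p_eu_8a9u_2003_1_008057882_1",
        "hadam3p_eu_634j_2009_1_008071304_2",
        "hadam3p_eu_2j5d_1987_1_008071308_1",
        "hadam3p_pnw_bdmc_1973_1_008097714_0",
    ]
    counts, unparsed = tally_by_region_year(reported)
    for (region, year), n in sorted(counts.items()):
        print(f"{region:4} {year} {n}")
```

Run on the names posted here, it prints one line per region and year with the number of failures, which is the kind of summary the project asked for.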


Backups: Here
ID: 44587
transient
Joined: 3 Oct 06
Posts: 43
Credit: 8,017,057
RAC: 0
Message 44588 - Posted: 26 Jul 2012, 15:39:32 UTC - in response to Message 44587.  

> The reason for asking for the file names of faulty models is that the project people want to know which years have the error.
> And it seems like they're spread over a lot of years.

In that case, I've got one here: hadam3p_eu_8a9u_2003_1_008057882_1. Note that this one was sent to me on 18 July.
ID: 44588
Thyme Lawn
Volunteer moderator
Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 44589 - Posted: 26 Jul 2012, 17:44:12 UTC - in response to Message 44582.  

> Files _2 to _12 were reported missing and there was indeed a file _13 apparently waiting to be uploaded when network activity resumed. I only remember there being one such _13 file, but I wasn't paying particular attention at the time. Although supposedly several MB in size, it disappeared instantly from the Transfers window when the BOINC client contacted the server.

That happens because an error automatically means the BOINC client can report the task to the server. When the scheduler request doing that is acknowledged the BOINC client deletes all references to the task (including any pending or in progress uploads).
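The sequence described above can be pictured with a toy model. This is a hypothetical Python sketch, not BOINC client code: Task, ToyClient, reportable and scheduler_ack are invented names, and the only behaviour modelled is what the post states, namely that an errored task is reportable straight away and that an acknowledged report wipes the task, queued uploads included.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Toy stand-in for a task as the client tracks it (hypothetical, not BOINC's real structure)."""
    name: str
    pending_uploads: list = field(default_factory=list)
    errored: bool = False

    def ready_to_report(self) -> bool:
        # An errored task is reportable at once, even with uploads still queued;
        # otherwise it waits until every output file has been uploaded.
        return self.errored or not self.pending_uploads


@dataclass
class ToyClient:
    tasks: dict = field(default_factory=dict)

    def add(self, task: Task) -> None:
        self.tasks[task.name] = task

    def reportable(self) -> list:
        return [name for name, t in self.tasks.items() if t.ready_to_report()]

    def scheduler_ack(self, reported: list) -> list:
        """Forget each acknowledged task, queued uploads included; return the dropped files."""
        dropped = []
        for name in reported:
            task = self.tasks.pop(name, None)
            if task is not None:
                dropped.extend(task.pending_uploads)
        return dropped


if __name__ == "__main__":
    client = ToyClient()
    name = "hadam3p_eu_634j_2009_1_008071304_2"
    client.add(Task(name, pending_uploads=[f"{name}_13.zip"], errored=True))
    # The _13 upload is discarded as soon as the report is acknowledged.
    print(client.scheduler_ack(client.reportable()))
```

In this toy version the _13 file vanishes for the same reason given in the post: once the errored task's report is acknowledged, nothing is left that refers to the queued upload.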
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 44589
Patrick
Joined: 8 Sep 10
Posts: 6
Credit: 1,475,984
RAC: 0
Message 44590 - Posted: 26 Jul 2012, 19:10:27 UTC

This may be related. hadam3p_eu models are certainly exiting early (some almost immediately after the task's first upload), and as a result of exiting early (I think this is a symptom), the task's result uploads in zip files are missing:

http://climateprediction.net/board/viewtopic.php?f=4&t=10619
ID: 44590
skgiven
Joined: 5 Jun 06
Posts: 28
Credit: 2,790,048
RAC: 0
Message 44591 - Posted: 26 Jul 2012, 22:52:22 UTC - in response to Message 44590.  

Some details from different systems:

Task 14973021
Name hadam3p_eu_634j_2009_1_008071304_2
Workunit 8226418
Created 22 Jul 2012 0:43:29 UTC
Sent 22 Jul 2012 0:47:15 UTC
Received 22 Jul 2012 10:30:11 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 0 (0x0)
Computer ID 1212547
Report deadline 4 Jul 2013 6:07:15 UTC
Run time 26,180.15
CPU time 25,922.02
Validate state Invalid
Claimed credit 200.38
Granted credit 200.38
application version UK Met Office HADAM3P European Region v6.09
Stderr

<core_client_version>7.0.28</core_client_version>
<![CDATA[
<stderr_txt>

Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/xaakm.pipe_dummy 2048
Leaving CPDN_Main::Monitor...
Called boinc_finish

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>hadam3p_eu_634j_2009_1_008071304_2_2.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_634j_2009_1_008071304_2_3.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_634j_2009_1_008071304_2_4.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_634j_2009_1_008071304_2_5.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_634j_2009_1_008071304_2_6.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_634j_2009_1_008071304_2_7.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_634j_2009_1_008071304_2_8.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_634j_2009_1_008071304_2_9.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_634j_2009_1_008071304_2_10.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_634j_2009_1_008071304_2_11.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_634j_2009_1_008071304_2_12.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>


Name hadam3p_eu_2j5d_1987_1_008071308_1
Workunit 8226422
Created 20 Jul 2012 7:01:50 UTC
Sent 20 Jul 2012 7:52:01 UTC
Received 21 Jul 2012 8:19:10 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 0 (0x0)
Computer ID 1126062
Report deadline 2 Jul 2013 13:12:01 UTC
Run time 13,805.54
CPU time 13,678.24
Validate state Invalid
Claimed credit 0.00
Granted credit 0.00
application version UK Met Office HADAM3P European Region v6.09
Stderr

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
Signal 15 received, exiting...
Called boinc_finish
Signal 15 received, exiting...
Called boinc_finish
Signal 15 received, exiting...
Called boinc_finish
SIGSEGV: segmentation violation
Stack trace (14 frames):
/home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu(boinc_catch_signal+0x6f)[0x836e1cf]
[0xf0f87400]
/home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x8136129]
/home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x813c074]
/home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x8131c87]
/home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x813d6aa]
/home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x8133fca]
/home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x8078e6f]
/home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x82d73ae]
/home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x82f8867]
/home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x82f14bb]
/home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x82f97f6]
/lib32/libc.so.6(__libc_start_main+0xe5)[0xf0df342d]
/home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x804caf1]

Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3708, selfPID=3695, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
<file_name>hadam3p_eu_2j5d_1987_1_008071308_1_1.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_2j5d_1987_1_008071308_1_2.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_2j5d_1987_1_008071308_1_3.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_2j5d_1987_1_008071308_1_4.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_2j5d_1987_1_008071308_1_5.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_2j5d_1987_1_008071308_1_6.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_2j5d_1987_1_008071308_1_7.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_2j5d_1987_1_008071308_1_8.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_2j5d_1987_1_008071308_1_9.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_2j5d_1987_1_008071308_1_10.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_2j5d_1987_1_008071308_1_11.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_2j5d_1987_1_008071308_1_12.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_2j5d_1987_1_008071308_1_13.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>

Name hadam3p_eu_60t3_2009_1_008071305_0
Workunit 8226419
Created 20 Jul 2012 5:56:54 UTC
Sent 20 Jul 2012 6:02:06 UTC
Received 22 Jul 2012 3:45:28 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 0 (0x0)
Computer ID 1192477
Report deadline 2 Jul 2013 11:22:06 UTC
Run time 74,050.46
CPU time 72,651.55
Validate state Invalid
Claimed credit 200.38
Granted credit 200.38
application version UK Met Office HADAM3P European Region v6.09
Stderr

<core_client_version>6.12.34</core_client_version>
<![CDATA[
<stderr_txt>
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...

Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/xaakm.pipe_dummy 2048
Leaving CPDN_Main::Monitor...
Called boinc_finish

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>hadam3p_eu_60t3_2009_1_008071305_0_2.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_60t3_2009_1_008071305_0_3.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_60t3_2009_1_008071305_0_4.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_60t3_2009_1_008071305_0_5.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_60t3_2009_1_008071305_0_6.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_60t3_2009_1_008071305_0_7.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_60t3_2009_1_008071305_0_8.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_60t3_2009_1_008071305_0_9.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_60t3_2009_1_008071305_0_10.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_60t3_2009_1_008071305_0_11.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_60t3_2009_1_008071305_0_12.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>

Name hadam3p_eu_634j_2009_1_008071304_1
Workunit 8226418
Created 21 Jul 2012 5:03:17 UTC
Sent 21 Jul 2012 5:11:11 UTC
Received 22 Jul 2012 0:43:28 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 0 (0x0)
Computer ID 1221572
Report deadline 3 Jul 2013 10:31:11 UTC
Run time 54,671.36
CPU time 54,503.55
Validate state Invalid
Claimed credit 200.38
Granted credit 200.38
application version UK Met Office HADAM3P European Region v6.09
Stderr

<core_client_version>7.0.25</core_client_version>
<![CDATA[
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...

Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/xaakm.pipe_dummy 2048
Leaving CPDN_Main::Monitor...
Called boinc_finish

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>hadam3p_eu_634j_2009_1_008071304_1_2.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_634j_2009_1_008071304_1_3.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_634j_2009_1_008071304_1_4.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_634j_2009_1_008071304_1_5.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_634j_2009_1_008071304_1_6.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_634j_2009_1_008071304_1_7.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_634j_2009_1_008071304_1_8.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_634j_2009_1_008071304_1_9.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_634j_2009_1_008071304_1_10.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_634j_2009_1_008071304_1_11.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_634j_2009_1_008071304_1_12.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>


Name hadam3p_eu_6c44_2009_1_008071303_0
Workunit 8226417
Created 20 Jul 2012 5:56:29 UTC
Sent 20 Jul 2012 6:01:45 UTC
Received 21 Jul 2012 1:04:09 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 0 (0x0)
Computer ID 915051
Report deadline 2 Jul 2013 11:21:45 UTC
Run time 47,264.36
CPU time 46,751.77
Validate state Invalid
Claimed credit 200.38
Granted credit 200.38
application version UK Met Office HADAM3P European Region v6.09
Stderr

<core_client_version>7.0.28</core_client_version>
<![CDATA[
<stderr_txt>

Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/xaakm.pipe_dummy 2048
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4524, selfPID=4524, iMonCtr=2
Leaving CPDN_Main::Monitor...
Called boinc_finish

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>hadam3p_eu_6c44_2009_1_008071303_0_2.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_6c44_2009_1_008071303_0_3.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_6c44_2009_1_008071303_0_4.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_6c44_2009_1_008071303_0_5.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_6c44_2009_1_008071303_0_6.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_6c44_2009_1_008071303_0_7.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_6c44_2009_1_008071303_0_8.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_6c44_2009_1_008071303_0_9.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_6c44_2009_1_008071303_0_10.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_6c44_2009_1_008071303_0_11.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_6c44_2009_1_008071303_0_12.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>



ID: 44591
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 44592 - Posted: 27 Jul 2012, 0:14:12 UTC

Thanks for the details, skgiven. I was mistaken in thinking that the REPLANCA batches started on 22 July. There were batches created on 21 and 20 July too.
Cpdn news
ID: 44592
Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 44593 - Posted: 27 Jul 2012, 0:21:39 UTC - in response to Message 44592.  
Last modified: 27 Jul 2012, 0:23:36 UTC

Possibly a few more were created more recently:

hadam3p_eu_cryy_2004_1_008083704_1 Sent 25 Jul 2012 3:03:18 UTC
Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH

hadam3p_eu_cu52_2000_1_008084996_0 Sent 24 Jul 2012 14:17:28 UTC
Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH

hadam3p_eu_cssi_2001_1_008084199_0 Sent 24 Jul 2012 20:21:54 UTC
Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH

hadam3p_eu_cqol_2007_1_008082936_0 Sent 25 Jul 2012 7:12:34 UTC
Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH

hadam3p_eu_colq_2007_1_008081725_0 Sent 25 Jul 2012 17:37:05 UTC
Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH


But this is a small percentage of the WUs from the last few days.

Most of what my machines downloaded in the last 3 days has no problems at all.
ID: 44593
Dave Roberts
Joined: 15 Jan 11
Posts: 175
Credit: 6,242,691
RAC: 699
Message 44594 - Posted: 27 Jul 2012, 16:20:52 UTC

Should we report all instances of REPLANCA failures? I've just had my 1st.
Messages :-
Fri Jul 27 06:02:13 2012 Started upload of hadam3p_eu_cq3s_2006_1_008082615_2_1.zip
Fri Jul 27 06:07:08 2012 Finished upload of hadam3p_eu_cq3s_2006_1_008082615_2_1.zip
Fri Jul 27 07:58:26 2012 Started upload of hadam3p_eu_cq3s_2006_1_008082615_2_13.zip
Fri Jul 27 07:58:29 2012 Computation for task hadam3p_eu_cq3s_2006_1_008082615_2 finished
Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_2.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent
Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_3.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent
Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_4.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent
Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_5.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent
Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_6.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent
Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_7.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent
Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_8.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent
Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_9.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent
Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_10.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent
Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_11.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent
Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_12.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent

Stderror :-
Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/xaakm.pipe_dummy 2048
Leaving CPDN_Main::Monitor...
Called boinc_finish
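Eleven near-identical "absent" lines carry one fact: which output zips are missing. The Python sketch below collapses them into a range per task. It assumes the event-log wording is exactly as quoted above ("Output file <task>_<n>.zip for task <task> absent"); summarize_absent and to_ranges are hypothetical helper names.

```python
import re

# Matches event-log lines of the form quoted above:
#   Output file <task>_<n>.zip for task <task> absent
# (?P=task) requires the task name after "for task" to match the file's prefix.
ABSENT = re.compile(
    r"Output file (?P<task>\S+)_(?P<idx>\d+)\.zip for task (?P=task) absent"
)


def to_ranges(nums):
    """Collapse integers into runs, e.g. [2, 3, 4, 7] -> '2-4, 7'."""
    runs = []
    for n in sorted(set(nums)):
        if runs and n == runs[-1][1] + 1:
            runs[-1][1] = n  # extend the current run
        else:
            runs.append([n, n])  # start a new run
    return ", ".join(f"{a}-{b}" if a != b else str(a) for a, b in runs)


def summarize_absent(log_text):
    """Map each task name to a compact range string of its absent output-file numbers."""
    found = {}
    for m in ABSENT.finditer(log_text):
        found.setdefault(m["task"], []).append(int(m["idx"]))
    return {task: to_ranges(idxs) for task, idxs in found.items()}
```

Fed the log above, it reduces the list to a single entry mapping hadam3p_eu_cq3s_2006_1_008082615_2 to "2-12".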
ID: 44594
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 44596 - Posted: 27 Jul 2012, 20:04:46 UTC

I think we've worked out that it's the EU models that have the fault.
Set your prefs to PNW only, and you should be OK.


Backups: Here
ID: 44596
[boinc.at] Nowi
Joined: 16 Jul 05
Posts: 32
Credit: 10,513,155
RAC: 0
Message 44597 - Posted: 27 Jul 2012, 21:55:29 UTC

I have failing PNW models, too:

hadam3p_pnw_bdmc_1973_1_008097714_0
hadam3p_pnw_b9zc_1977_1_008097176_0

They failed after 10 s of runtime!

stderr shows:
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<stderr_txt>

GCM: BUFFIN : Read Failed: No such file or directory
GCM : BUFFIN: C I/O Error feof - Unit 30 - Return code = 16
GCM : BUFFIN: C I/O Error feof - Unit 30 - Return code = 16


Model crashed: REPLANCA :I/O ERROR tmp/xaakm.pipe_dummy 2048
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=15304, selfPID=15304, iMonCtr=2
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 0
Called boinc_finish

</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_1.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_2.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_3.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_4.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_5.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_6.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_7.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_8.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_9.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_10.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_11.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_12.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_13.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>

</message>
]]>

ID: 44597
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 44598 - Posted: 27 Jul 2012, 22:07:38 UTC

It's a waste of time and space posting long strings of "error 161" messages.

These aren't about model failures. They just mean that BOINC can't find these files when it tries to upload them. Which is obvious, as they were never created in the first place. The model crashed before getting that far.
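Along the lines suggested here, a whole message block of -161 errors can be boiled down to one line per task before posting. The Python sketch below uses a regular expression rather than an XML parser, because the block is not a single well-formed document (it starts with "upload failure:" and has many top-level elements). It treats -161 only as the code reported in these logs, which per this post means BOINC could not find the file; summarize_xfer_errors is a hypothetical name.

```python
import re
from collections import defaultdict

# One <file_xfer_error> entry as it appears in the stderr <message> blocks above.
XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<task>[^<]+)_(?P<idx>\d+)\.zip</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)</error_code>\s*"
    r"</file_xfer_error>"
)


def summarize_xfer_errors(message_text):
    """Return one summary line per (task, error code) instead of one block per file."""
    groups = defaultdict(list)
    for m in XFER_ERROR.finditer(message_text):
        groups[(m["task"], int(m["code"]))].append(int(m["idx"]))
    lines = []
    for (task, code), idxs in sorted(groups.items()):
        # Lowest and highest zip number plus a count; gaps show up as a count
        # smaller than the span.
        lines.append(
            f"{task}: {len(idxs)} files numbered {min(idxs)} to {max(idxs)}, error {code}"
        )
    return lines
```

For skgiven's first task this would yield a single line reporting 11 files numbered 2 to 12 with error -161.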


Backups: Here
ID: 44598
Patrick
Joined: 8 Sep 10
Posts: 6
Credit: 1,475,984
RAC: 0
Message 44600 - Posted: 28 Jul 2012, 1:13:05 UTC - in response to Message 44598.  

Not that I necessarily expect an answer, but I'd be curious to know why the European models are failing.
ID: 44600
Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 44601 - Posted: 28 Jul 2012, 10:56:40 UTC - in response to Message 44600.  

Only a small fraction are failing.
They fail because the downloaded files are not exactly right.
The problem either has been fixed already or will be.
So when the problem work units clear the queue, this problem will be gone.
And then, because this whole project is cutting edge and really complex, there will probably be a few more malformed work units later.

ID: 44601
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 44602 - Posted: 28 Jul 2012, 22:03:38 UTC

"REPLANCA" is an error that means a program is expecting X values but has found only X-n.

It happens when a limited number of values is used to test a program, and then everything is increased to the full range of values, except for one of the ancillary files where the list of values doesn't get increased.

So someone in one of the research groups has supplied the Oxford people with a faulty file.
The question then becomes: which file? from which research group? and for what range(s) of model dates?
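To make the failure mode concrete, here is a hypothetical Python illustration of a count check of the kind described above: a file built for a short test run carries fewer header records than the full run expects, so it passes the test configuration and fails the full one. AncillaryFile and check_headers are invented names; the real REPLANCA routine in the Met Office model is not shown here and may compare more than a simple count.

```python
from dataclasses import dataclass


@dataclass
class AncillaryFile:
    """Hypothetical stand-in: a file name plus the header records it carries."""
    name: str
    headers: list


def check_headers(anc: AncillaryFile, expected: int) -> None:
    """Raise if the file holds a different number of header records than the model expects."""
    found = len(anc.headers)
    if found != expected:
        raise ValueError(
            f"header count mismatch in {anc.name}: expected {expected}, found {found}"
        )


if __name__ == "__main__":
    # Built for a test run with 10 records and never enlarged for the full run.
    test_sized = AncillaryFile("ancil_from_test_run", headers=[f"rec{i}" for i in range(10)])
    check_headers(test_sized, expected=10)  # passes under the test setup
    try:
        check_headers(test_sized, expected=12)  # the full run expects more records
    except ValueError as err:
        print("model would stop here:", err)
```

The point of the sketch is only that the same file can look fine in testing and still fail later, which is why the question of which file and which date range matters.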

***************

I also had one SAF model fail with this error, and Nowi is reporting PNW's failing with it.


Backups: Here
ID: 44602
MarkJ
Joined: 28 Mar 09
Posts: 126
Credit: 9,825,980
RAC: 0
Message 44603 - Posted: 29 Jul 2012, 6:22:51 UTC

Yes, I got a couple. Mine are all PNW models.

REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/xaakm.pipe_dummy 2048
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 1


Link to work unit here

Les, do you want to know about these or do we just ignore them? I see there are 14,000+ PNW work units on the queue so there are bound to be more in there.
BOINC blog
ID: 44603
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 44604 - Posted: 29 Jul 2012, 6:33:13 UTC - in response to Message 44603.  

Hi Mark

I'm not sure, but I guess we should know about the PNW baddies as well.
It's going to be another 24-30 hours before anyone shows up, but I'll pass on the news.
Backups: Here
ID: 44604
Nigel Garvey
Joined: 5 May 10
Posts: 69
Credit: 1,169,103
RAC: 2,258
Message 44605 - Posted: 29 Jul 2012, 9:12:49 UTC

Yep. I've had a PNW error overnight too. Same symptoms. A few more points awarded though. :)

hadam3p_pnw_bdp4_1993_1_008097733_0


NG
ID: 44605
MarkJ
Joined: 28 Mar 09
Posts: 126
Credit: 9,825,980
RAC: 0
Message 44606 - Posted: 29 Jul 2012, 11:41:45 UTC - in response to Message 44604.  
Last modified: 29 Jul 2012, 12:00:33 UTC

> Hi Mark
>
> I'm not sure, but I guess we should know about the PNW baddies as well.
> It's going to be another 24-30 hours before anyone shows up, but I'll pass on the news.

Replanca errors:
resultid=14901620

resultid=15011909

resultid=14819189

resultid=15021473


Some others complain about files (no mention of REPLANCA, though). These crash after about 600 seconds elapsed:
Model crashed: 
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 0
Called boinc_finish


resultid=14819102

resultid=14819127


And another which might just be some weird parameters:
Model crashed: INITTIME: Atmosphere basis time mismatch tmp/xaakm.pipe_dummy 2048
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=964, selfPID=964, iMonCtr=2
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 0


resultid=14906965
BOINC blog
ID: 44606
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4345
Credit: 16,532,809
RAC: 5,899
Message 44608 - Posted: 29 Jul 2012, 17:32:09 UTC

Just in case you are still collecting details of tasks with the REPLANCA error, http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=14975475 (hadam3p_eu_ale0_2000_1_008070909_2) is one. I am suspicious, though, as this happened just after the computer had been restarted, or at least that was when I noticed it and the _13 zip uploaded.

Dave
ID: 44608
MarkJ
Joined: 28 Mar 09
Posts: 126
Credit: 9,825,980
RAC: 0
Message 44614 - Posted: 31 Jul 2012, 13:05:57 UTC
Last modified: 31 Jul 2012, 13:08:43 UTC

Some more Replanca errors...

resultid=15022759
resultid=15024598
resultid=15033209
resultid=15028563
resultid=15032537
resultid=15035539
resultid=15039466
resultid=15034026
resultid=15034029
resultid=15034537
resultid=15034564
resultid=15034565

Looks to me like they are all stuffed. Perhaps the project would be better served by cancelling the remaining ones on the queue that haven't been sent out and resubmitting them after fixing the REPLANCA issue.

What's really annoying is that they run for 18-19 hours before they commit suicide, and then, to top it off, they create the usual 32 MB _13 file to upload. It's probably useless anyway, seeing as the model only has 1 of the 12 input files.
BOINC blog
ID: 44614

©2024 climateprediction.net