Output file absent & Too many errors (may have bug)


skgiven
Joined: Jun 5 06
Posts: 27
Credit: 1,456,053
RAC: 0
Message 44562 - Posted 22 Jul 2012 10:49:56 UTC

    Last modified: 22 Jul 2012 10:52:57 UTC

    Output file absent:

    22/07/2012 10:38:50 | climateprediction.net | Computation for task hadam3p_eu_634j_2009_1_008071304_2 finished
    22/07/2012 10:38:50 | climateprediction.net | Output file hadam3p_eu_634j_2009_1_008071304_2_2.zip for task hadam3p_eu_634j_2009_1_008071304_2 absent
    22/07/2012 10:38:50 | climateprediction.net | Output file hadam3p_eu_634j_2009_1_008071304_2_3.zip for task hadam3p_eu_634j_2009_1_008071304_2 absent
    22/07/2012 10:38:50 | climateprediction.net | Output file hadam3p_eu_634j_2009_1_008071304_2_4.zip for task hadam3p_eu_634j_2009_1_008071304_2 absent
    22/07/2012 10:38:50 | climateprediction.net | Output file hadam3p_eu_634j_2009_1_008071304_2_5.zip for task hadam3p_eu_634j_2009_1_008071304_2 absent
    22/07/2012 10:38:50 | climateprediction.net | Output file hadam3p_eu_634j_2009_1_008071304_2_6.zip for task hadam3p_eu_634j_2009_1_008071304_2 absent
    22/07/2012 10:38:50 | climateprediction.net | Output file hadam3p_eu_634j_2009_1_008071304_2_7.zip for task hadam3p_eu_634j_2009_1_008071304_2 absent
    22/07/2012 10:38:50 | climateprediction.net | Output file hadam3p_eu_634j_2009_1_008071304_2_8.zip for task hadam3p_eu_634j_2009_1_008071304_2 absent
    22/07/2012 10:38:50 | climateprediction.net | Output file hadam3p_eu_634j_2009_1_008071304_2_9.zip for task hadam3p_eu_634j_2009_1_008071304_2 absent
    22/07/2012 10:38:50 | climateprediction.net | Output file hadam3p_eu_634j_2009_1_008071304_2_10.zip for task hadam3p_eu_634j_2009_1_008071304_2 absent
    22/07/2012 10:38:50 | climateprediction.net | Output file hadam3p_eu_634j_2009_1_008071304_2_11.zip for task hadam3p_eu_634j_2009_1_008071304_2 absent
    22/07/2012 10:38:50 | climateprediction.net | Output file hadam3p_eu_634j_2009_1_008071304_2_12.zip for task hadam3p_eu_634j_2009_1_008071304_2 absent
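For anyone wanting to pull the missing file names out of a longer event log, here is a minimal sketch. It assumes the log lines have the same "Output file ... for task ... absent" wording as the ones quoted above; the function name `absent_files` is made up for this example and is not part of BOINC.

```python
import re

# Matches the "Output file <file> for task <task> absent" messages quoted above.
ABSENT_RE = re.compile(r"Output file (\S+) for task (\S+) absent")

def absent_files(log_text):
    """Return {task_name: [missing file names]} found in event log text."""
    missing = {}
    for line in log_text.splitlines():
        match = ABSENT_RE.search(line)
        if match:
            file_name, task = match.groups()
            missing.setdefault(task, []).append(file_name)
    return missing

# Three lines copied from the log above (the full log lists zips 2 to 12).
sample = """\
22/07/2012 10:38:50 | climateprediction.net | Computation for task hadam3p_eu_634j_2009_1_008071304_2 finished
22/07/2012 10:38:50 | climateprediction.net | Output file hadam3p_eu_634j_2009_1_008071304_2_2.zip for task hadam3p_eu_634j_2009_1_008071304_2 absent
22/07/2012 10:38:50 | climateprediction.net | Output file hadam3p_eu_634j_2009_1_008071304_2_3.zip for task hadam3p_eu_634j_2009_1_008071304_2 absent
"""

for task, files in absent_files(sample).items():
    print(task, len(files), "missing:", ", ".join(files))
```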

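    Task 14973021 | Workunit 8226418 | Computer 1212547 | Sent 22 Jul 2012 0:47:15 UTC | Received 22 Jul 2012 10:30:11 UTC | Error while computing | Run time 26,180.15 | CPU time 25,922.02 | Claimed credit 0.00 | Granted credit --- | UK Met Office HADAM3P European Region v6.09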

    <core_client_version>7.0.28</core_client_version>
    <![CDATA[
    <stderr_txt>

    Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/xaakm.pipe_dummy 2048
    Leaving CPDN_Main::Monitor...
    Called boinc_finish

    </stderr_txt>
    <message>
    upload failure: <file_xfer_error>
    <file_name>hadam3p_eu_634j_2009_1_008071304_2_2.zip</file_name>
    <error_code>-161</error_code>
    </file_xfer_error>
    <file_xfer_error>
    <file_name>hadam3p_eu_634j_2009_1_008071304_2_3.zip</file_name>
    <error_code>-161</error_code>
    </file_xfer_error>
    <file_xfer_error>
    <file_name>hadam3p_eu_634j_2009_1_008071304_2_4.zip</file_name>
    <error_code>-161</error_code>
    </file_xfer_error>
    <file_xfer_error>
    <file_name>hadam3p_eu_634j_2009_1_008071304_2_5.zip</file_name>
    <error_code>-161</error_code>
    </file_xfer_error>
    <file_xfer_error>
    <file_name>hadam3p_eu_634j_2009_1_008071304_2_6.zip</file_name>
    <error_code>-161</error_code>
    </file_xfer_error>
    <file_xfer_error>
    <file_name>hadam3p_eu_634j_2009_1_008071304_2_7.zip</file_name>
    <error_code>-161</error_code>
    </file_xfer_error>
    <file_xfer_error>
    <file_name>hadam3p_eu_634j_2009_1_008071304_2_8.zip</file_name>
    <error_code>-161</error_code>
    </file_xfer_error>
    <file_xfer_error>
    <file_name>hadam3p_eu_634j_2009_1_008071304_2_9.zip</file_name>
    <error_code>-161</error_code>
    </file_xfer_error>
    <file_xfer_error>
    <file_name>hadam3p_eu_634j_2009_1_008071304_2_10.zip</file_name>
    <error_code>-161</error_code>
    </file_xfer_error>
    <file_xfer_error>
    <file_name>hadam3p_eu_634j_2009_1_008071304_2_11.zip</file_name>
    <error_code>-161</error_code>
    </file_xfer_error>
    <file_xfer_error>
    <file_name>hadam3p_eu_634j_2009_1_008071304_2_12.zip</file_name>
    <error_code>-161</error_code>
    </file_xfer_error>

    </message>
    ]]>

    -161 is a File Not Found error.
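The stderr block above is not a well-formed XML document on its own, so a regular expression is an easy way to list the failed uploads. This is only a sketch under that assumption; `xfer_errors` is a hypothetical helper name, and the sample text is two entries copied from the output above.

```python
import re

# One <file_xfer_error> entry: a file name followed by a numeric error code.
XFER_RE = re.compile(
    r"<file_xfer_error>\s*<file_name>([^<]+)</file_name>\s*"
    r"<error_code>(-?\d+)</error_code>\s*</file_xfer_error>"
)

def xfer_errors(stderr_text):
    """Return a list of (file_name, error_code) pairs found in the text."""
    return [(name, int(code)) for name, code in XFER_RE.findall(stderr_text)]

sample = """
upload failure: <file_xfer_error>
<file_name>hadam3p_eu_634j_2009_1_008071304_2_2.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_634j_2009_1_008071304_2_3.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
"""

errors = xfer_errors(sample)
# Keep only the "file not found" entries (code -161, as noted above).
not_found = [name for name, code in errors if code == -161]
print(not_found)
```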

    My System.

    My Task

    The WorkUnit

    Notes: The Ethernet-to-Internet connection was disconnected at the time. I was also running POEM (GPU), RNA World and yoyo tasks. Only 4 CPU threads were in use (due to POEM requirements/setup). Write to disk was set at 900 sec. No other system or BOINC issues.
    ____________

    Eirik Redd
    Joined: Aug 31 04
    Posts: 252
    Credit: 26,987,757
    RAC: 20,070
    Message 44563 - Posted 22 Jul 2012 11:26:11 UTC - in response to Message 44562.

      Yes, I've seen maybe half a dozen of these in the last few weeks: malformed tasks that have been automatically re-issued but will never work because of the
      "REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH"
      Just let them die and don't worry about it.
      ____________

      skgiven
      Joined: Jun 5 06
      Posts: 27
      Credit: 1,456,053
      RAC: 0
      Message 44564 - Posted 22 Jul 2012 13:03:15 UTC - in response to Message 44563.

        Last modified: 22 Jul 2012 13:04:01 UTC

        Thanks for the confirmation. This sort of issue occurs at other projects too, usually when the researchers make a mistake when building the tasks, but it has also been caused by deprecated clients running auto-generated tasks.

        Might it be possible, or worthwhile, to add an early trickle point or a file check routine to reduce the loss in such situations, so that tasks would fail early rather than after, say, 10 hours?
        ____________

        hagar
        Joined: Aug 6 04
        Posts: 76
        Credit: 10,439,202
        RAC: 4,449
        Message 44565 - Posted 22 Jul 2012 13:33:56 UTC

          A quick check shows that six out of 330 AM3P that I have run this year on three PCs (two XP, one Linux) have zonked out with an error, including this 'output file absent'. That's an attrition rate of less than 2%, which is very low compared to the much higher attrition rates on the longer models.

          (I lost an AM3P and a CM3 yesterday to a very short power brownout that caused one PC and the internet router to reboot. The other PC, two laptops, monitors and a printer didn't blink.)

          For an ensemble methodology, 2% attrition rate is probably not worth the effort of delving further into the reasons for the error. I simply accept there will be an attrition rate.
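As a quick check of the arithmetic behind the "less than 2%" figure, using only the counts given in this post (six failures out of 330 models):

```python
failed = 6    # models that ended with an error, as counted in the post
total = 330   # AM3P models run this year on the three PCs

rate = failed / total
print(f"attrition rate: {rate:.1%}")  # attrition rate: 1.8%
```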
          ____________

          Dave Jackson
          Joined: May 15 09
          Posts: 811
          Credit: 632,379
          RAC: 338
          Message 44566 - Posted 22 Jul 2012 17:09:41 UTC - in response to Message 44563.

            Thanks, that saves me searching for answers. I had two PNW tasks go like this yesterday, though there was a power cut involved as well, so I can't be 100% sure of the cause.

            Any typing errors are due to not being used to the tiny netbook keyboard. The Atom is slowly making its way through two EU units. I will have to get the extra GB of memory to see if it makes any difference.

            mo.v
            Forum moderator
            Joined: Sep 29 04
            Posts: 2359
            Credit: 7,024,721
            RAC: 2,973
            Message 44567 - Posted 22 Jul 2012 17:55:28 UTC

              This REPLANCA thing is an error in the model. It happened a few months ago so we need to check whether there's a new batch of models with the same problem. It looks as if the headers on ancillary files don't match:

              http://cms.ncas.ac.uk/trac/UMHelpdesk/ticket/399

              It's a real nuisance that the web pages for these regional models take ages to open up so it's not easy to see what's happening with different WUs.
              ____________
              Cpdn news

              skgiven
              Joined: Jun 5 06
              Posts: 27
              Credit: 1,456,053
              RAC: 0
              Message 44568 - Posted 22 Jul 2012 20:06:53 UTC - in response to Message 44567.

                Last modified: 22 Jul 2012 20:09:51 UTC

                From WU 8226400 to 8226430 there are 15 failed tasks; several have failed more than once and none has reported successfully.
                All are UK Met Office HADAM3P European Region and all were created at around the same time (20 Jul 2012 5:50:00 to 5:59:00 UTC).

                http://climateapps2.oerc.ox.ac.uk/cpdnboinc/workunit.php?wuid=8226418
                http://climateapps2.oerc.ox.ac.uk/cpdnboinc/workunit.php?wuid=8226422
                http://climateapps2.oerc.ox.ac.uk/cpdnboinc/workunit.php?wuid=8226419
                http://climateapps2.oerc.ox.ac.uk/cpdnboinc/workunit.php?wuid=8226418
                http://climateapps2.oerc.ox.ac.uk/cpdnboinc/workunit.php?wuid=8226417
                ____________

                mo.v
                Forum moderator
                Joined: Sep 29 04
                Posts: 2359
                Credit: 7,024,721
                RAC: 2,973
                Message 44569 - Posted 24 Jul 2012 0:26:34 UTC

                  I can't get the task pages to open for me at all, even after hours. I can only look at the WU and computer pages. So I can't see whether all the computers are crashing the models with the same error. (I'm discounting computers that can't run any climate models at all and need to have their daily quota minussed until their owners put things right.)

                  http://climateapps2.oerc.ox.ac.uk/cpdnboinc/workunit.php?wuid=8226420

                  Now why can this computer with Windows complete one of this batch of models?



                  ____________
                  Cpdn news

                  mo.v
                  Forum moderator
                  Joined: Sep 29 04
                  Posts: 2359
                  Credit: 7,024,721
                  RAC: 2,973
                  Message 44570 - Posted 24 Jul 2012 0:51:17 UTC

                    I've found some Windows machines with the error and two that have now completed their model. There's a single Mac that seems to be crunching one OK. All the other Macs I've found are crashing everything with the usual problem.


                    ____________
                    Cpdn news

                    mo.v
                    Forum moderator
                    Joined: Sep 29 04
                    Posts: 2359
                    Credit: 7,024,721
                    RAC: 2,973
                    Message 44571 - Posted 24 Jul 2012 1:25:37 UTC

                      Last modified: 24 Jul 2012 1:25:59 UTC

                      I wonder whether something else unrelated (?) to the REPLANCA error is going on with the EU models. Look at Paolo's computer and its tasks.

                      It can process Hadcm, Hadam PNW and Hadam SA nicely. But it crashes every Hadam EU in less than a minute as if the computer was misconfigured. These can't all be REPLANCA crashes.
                      ____________
                      Cpdn news

                      Belfry
                      Joined: Apr 19 08
                      Posts: 178
                      Credit: 3,527,177
                      RAC: 1,158
                      Message 44572 - Posted 24 Jul 2012 13:05:38 UTC

                        Ah yes, Replanca. I gambled away a small fortune at its beach-side casinos; where I wined and dined an Italian woman whose name I cannot remember....

                        Where was I? Oh yes, task 14903295, a PNW, just turned up this error at around 98% completion. I have another PNW finishing up shortly; we'll see what happens.

                        Belfry
                        Joined: Apr 19 08
                        Posts: 178
                        Credit: 3,527,177
                        RAC: 1,158
                        Message 44573 - Posted 24 Jul 2012 15:15:56 UTC

                          Last modified: 24 Jul 2012 15:20:02 UTC

                          No, there is no Replanca ..., nor Italian women whose names I cannot remember for that matter. Just sounded like an exotic place name, like Pollenca or Menorca ;)

                          Edit: my other PNW finished fine.

                          skgiven
                          Joined: Jun 5 06
                          Posts: 27
                          Credit: 1,456,053
                          RAC: 0
                          Message 44574 - Posted 24 Jul 2012 22:20:08 UTC - in response to Message 44571.

                            Last modified: 24 Jul 2012 23:10:24 UTC

                            Paolo's Hadam EU tasks on that computer are all crashing with an exit status of -2:

                            Outcome Client error
                            Client state Compute error
                            Exit status -2 (0xfffffffffffffffe)
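The hex value in brackets is just -2 shown as an unsigned 64-bit two's-complement number. A small sketch of the conversion (the helper name `as_unsigned_hex` is made up for illustration):

```python
def as_unsigned_hex(value, bits=64):
    """Show a signed integer as the hex of its two's-complement bit pattern."""
    mask = (1 << bits) - 1
    return hex(value & mask)

print(as_unsigned_hex(-2))  # 0xfffffffffffffffe
print(as_unsigned_hex(0))   # 0x0
```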

                            I think this is an issue with the task or app and has nothing to do with Windows, the BOINC manager or client, or other apps.


                            Some of Paolo's other computers are failing due to the REPLANCA issue with Exit status 0, error_code -161 (file_xfer_error):

                            Exit status 0 (0x0)
                            Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH

                            Some of these don't seem to run (Error while downloading) but others do run (file_xfer_error):

                            14947580 8213025 19 Jul 2012 17:57:28 UTC 21 Jul 2012 3:16:32 UTC Error while computing 102,580.61 100,825.80 399.11 399.11 UK Met Office HADAM3P European Region v6.09

                            In this case, could the trickle result in a failure (file_xfer_error) that in turn causes the task to be killed, and could all this be linked to the server's availability/responsiveness (pages not loading)?

                            - More likely one of the ranges is out!
                            ____________

                            [boinc.at] Nowi
                            Joined: Jul 16 05
                            Posts: 32
                            Credit: 2,201,277
                            RAC: 739
                            Message 44577 - Posted 25 Jul 2012 11:49:33 UTC

                              I have the Replanca problem, too. Four models in a row have a computation error after about 13000 s of computation time.
                              ____________

                              Les Bayliss
                              Forum moderator
                              Joined: Sep 5 04
                              Posts: 5348
                              Credit: 8,876,229
                              RAC: 549
                              Message 44578 - Posted 26 Jul 2012 0:28:29 UTC

                                Lots of people seem to be getting this. I'm up to my 4th or 5th failure. :(

                                Information that would be useful:
                                The actual name of the failed model.
                                Roughly when it failed.
                                Whether you have noticed that a mysterious "zip 13" file has been created.

                                e.g. For one of mine:
                                hadam3p_eu_8aow_2005_1_008058020_0
                                REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH

                                This was between zips 2 and 3, at 25 hours 47 minutes 39 seconds, and a zip 13 was created.

                                ____________
                                Backups: Here

                                [boinc.at] Nowi
                                Joined: Jul 16 05
                                Posts: 32
                                Credit: 2,201,277
                                RAC: 739
                                Message 44580 - Posted 26 Jul 2012 8:38:20 UTC - in response to Message 44578.

                                  Here are my failed WUs. All failed with REPLANCA in stderr.out:

                                  hadam3p_eu_cqxv_2000_1_008083091_2 zip1, zip13 uploaded
                                  hadam3p_eu_ctbo_2009_1_008084522_1 zip1, zip13 uploaded
                                  hadam3p_eu_ctx1_2008_1_008084858_0 zip1, zip13 uploaded
                                  hadam3p_eu_a74l_1990_1_008067608_1 crashed after 8.79 s no zips uploaded
                                  hadam3p_eu_ct79_2004_1_008084440_0 zip1, zip13 uploaded
                                  hadam3p_eu_csgf_2006_1_008083996_0 zip1, zip13 uploaded
                                  hadam3p_eu_crlu_2005_1_008083482_0 zip1, zip13 uploaded
                                  hadam3p_eu_cr5j_2001_1_008083225_0 zip1, zip13 uploaded

                                  I hope that will help.


                                  ____________

                                  Dave Roberts
                                  Joined: Jan 15 11
                                  Posts: 73
                                  Credit: 1,353,855
                                  RAC: 921
                                  Message 44581 - Posted 26 Jul 2012 9:05:53 UTC

                                    I don't know if this info is useful for comparison/investigative purposes, but just in case...
                                    One of my computers (ID: 1142892) has been running tasks of this model successfully for a while, the latest (Task ID 8210373) completing successfully yesterday. The previous run was Task ID 14734712, which completed successfully on 31st May.

                                    Nigel Garvey
                                    Joined: May 5 10
                                    Posts: 33
                                    Credit: 578,760
                                    RAC: 234
                                    Message 44582 - Posted 26 Jul 2012 9:09:45 UTC - in response to Message 44578.

                                      hadam3p_eu_cqgw_2005_1_008082804_0 and hadam3p_eu_cqgu_2003_1_008082803_0. Downloaded at 09:14 BST yesterday and run in parallel from then until they apparently "completed" within seconds of each other at getting on for 01:00 this morning. Files _2 to _12 were reported missing and there was indeed a file _13 apparently waiting to be uploaded when network activity resumed. I only remember there being one such _13 file, but I wasn't paying particular attention at the time. Although supposedly several MB in size, it disappeared instantly from the Transfers window when the BOINC client contacted the server.

                                      "REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH" error in both cases.

                                      Mac OS 10.6.8. BOINC 7.0.28.


                                      NG

                                      mo.v
                                      Forum moderator
                                      Joined: Sep 29 04
                                      Posts: 2359
                                      Credit: 7,024,721
                                      RAC: 2,973
                                      Message 44583 - Posted 26 Jul 2012 10:07:26 UTC

                                        Dave, the two models you mentioned were sent to you on 23 May and 17 July so they were from earlier batches of EU models. The batch generating so many REPLANCA errors was I think generated starting on 22 July. I still can't get any task pages for the regional models to open up though so I can't check what I say from the stderr files of crashed models.


                                        ____________
                                        Cpdn news

                                        Dave Roberts
                                        Joined: Jan 15 11
                                        Posts: 73
                                        Credit: 1,353,855
                                        RAC: 921
                                        Message 44586 - Posted 26 Jul 2012 10:14:06 UTC

                                          Last modified: 26 Jul 2012 10:18:05 UTC

                                          Re my previous post on successful completions - I've just had a look at the messages and found the following, regarding successful uploads of zip 13 files after successful uploads of zips 1-12.

                                          Wed Jul 25 22:34:37 2012 climateprediction.net Started upload of hadam3p_eu_9xz6_1991_1_008055259_0_12.zip
                                          Wed Jul 25 22:39:48 2012 climateprediction.net Finished upload of hadam3p_eu_9xz6_1991_1_008055259_0_12.zip
                                          Wed Jul 25 22:52:59 2012 climateprediction.net Started upload of hadam3p_eu_9xz6_1991_1_008055259_0_13.zip
                                          Wed Jul 25 22:53:02 2012 climateprediction.net Computation for task hadam3p_eu_9xz6_1991_1_008055259_0 finished
                                          Wed Jul 25 22:53:03 2012 climateprediction.net Starting hadam3p_eu_ctxm_2007_1_008084866_0
                                          Wed Jul 25 22:53:03 2012 climateprediction.net Starting task hadam3p_eu_ctxm_2007_1_008084866_0 using hadam3p_eu version 609
                                          Wed Jul 25 23:05:50 2012 climateprediction.net Finished upload of hadam3p_eu_9xz6_1991_1_008055259_0_13.zip

                                          mo.v - I was preparing this before I saw your post.

                                          Les Bayliss
                                          Forum moderator
                                          Joined: Sep 5 04
                                          Posts: 5348
                                          Credit: 8,876,229
                                          RAC: 549
                                          Message 44587 - Posted 26 Jul 2012 10:30:09 UTC

                                            The reason for asking for the file names of faulty models, is that the project people want to know which years have the error.
                                            And it seems like they're spread over a lot of years.
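A rough way to tally the years from reported task names is sketched below. It assumes the fourth underscore-separated field of the name is the model year, which fits every name posted in this thread but is not a documented naming rule; `model_year` is a made-up helper name. The names are the eight reported earlier in the thread by [boinc.at] Nowi.

```python
from collections import Counter

def model_year(task_name):
    """Return the year field of a task name (assumed to be the 4th field)."""
    fields = task_name.split("_")
    return int(fields[3])

names = [
    "hadam3p_eu_cqxv_2000_1_008083091_2",
    "hadam3p_eu_ctbo_2009_1_008084522_1",
    "hadam3p_eu_ctx1_2008_1_008084858_0",
    "hadam3p_eu_a74l_1990_1_008067608_1",
    "hadam3p_eu_ct79_2004_1_008084440_0",
    "hadam3p_eu_csgf_2006_1_008083996_0",
    "hadam3p_eu_crlu_2005_1_008083482_0",
    "hadam3p_eu_cr5j_2001_1_008083225_0",
]

by_year = Counter(model_year(name) for name in names)
for year in sorted(by_year):
    print(year, by_year[year])
```

For this small sample every failure falls in a different year, which matches the impression that the errors are spread over a lot of years.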


                                            ____________
                                            Backups: Here

                                            transient
                                            Joined: Oct 3 06
                                            Posts: 42
                                            Credit: 2,320,803
                                            RAC: 989
                                            Message 44588 - Posted 26 Jul 2012 15:39:32 UTC - in response to Message 44587.

                                              The reason for asking for the file names of faulty models, is that the project people want to know which years have the error.
                                              And it seems like they're spread over a lot of years.




                                              In that case, I've got one here: hadam3p_eu_8a9u_2003_1_008057882_1. Note that this one was sent to me on the 18th of July.

                                              Thyme Lawn
                                              Forum moderator
                                              Joined: Aug 5 04
                                              Posts: 1232
                                              Credit: 10,354,096
                                              RAC: 1,273
                                              Message 44589 - Posted 26 Jul 2012 17:44:12 UTC - in response to Message 44582.

                                                Files _2 to _12 were reported missing and there was indeed a file _13 apparently waiting to be uploaded when network activity resumed. I only remember there being one such _13 file, but I wasn't paying particular attention at the time. Although supposedly several MB in size, it disappeared instantly from the Transfers window when the BOINC client contacted the server.

                                                That happens because an error automatically means the BOINC client can report the task to the server. When the scheduler request that reports it is acknowledged, the BOINC client deletes all references to the task (including any pending or in-progress uploads).
                                                ____________
                                                "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer

                                                Patrick
                                                Project donor
                                                Joined: Sep 8 10
                                                Posts: 6
                                                Credit: 1,052,227
                                                RAC: 32
                                                Message 44590 - Posted 26 Jul 2012 19:10:27 UTC

                                                  This may be related. Certainly, hadam3p_eu tasks are exiting early (some almost instantly after the task's first upload), and as a result of exiting early (which I think is a symptom), the task's zip file uploads are missing:

                                                  http://climateprediction.net/board/viewtopic.php?f=4&t=10619

                                                  skgiven
                                                  Joined: Jun 5 06
                                                  Posts: 27
                                                  Credit: 1,456,053
                                                  RAC: 0
                                                  Message 44591 - Posted 26 Jul 2012 22:52:22 UTC - in response to Message 44590.

                                                    Some details from different systems:

                                                    Task 14973021
                                                    Name hadam3p_eu_634j_2009_1_008071304_2
                                                    Workunit 8226418
                                                    Created 22 Jul 2012 0:43:29 UTC
                                                    Sent 22 Jul 2012 0:47:15 UTC
                                                    Received 22 Jul 2012 10:30:11 UTC
                                                    Server state Over
                                                    Outcome Client error
                                                    Client state Compute error
                                                    Exit status 0 (0x0)
                                                    Computer ID 1212547
                                                    Report deadline 4 Jul 2013 6:07:15 UTC
                                                    Run time 26,180.15
                                                    CPU time 25,922.02
                                                    Validate state Invalid
                                                    Claimed credit 200.38
                                                    Granted credit 200.38
                                                    application version UK Met Office HADAM3P European Region v6.09
                                                    Stderr

                                                    <core_client_version>7.0.28</core_client_version>
                                                    <![CDATA[
                                                    <stderr_txt>

                                                    Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/xaakm.pipe_dummy 2048
                                                    Leaving CPDN_Main::Monitor...
                                                    Called boinc_finish

                                                    </stderr_txt>
                                                    <message>
                                                    upload failure: <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_2_2.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_2_3.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_2_4.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_2_5.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_2_6.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_2_7.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_2_8.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_2_9.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_2_10.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_2_11.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_2_12.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>

                                                    </message>
                                                    ]]>


                                                    Name hadam3p_eu_2j5d_1987_1_008071308_1
                                                    Workunit 8226422
                                                    Created 20 Jul 2012 7:01:50 UTC
                                                    Sent 20 Jul 2012 7:52:01 UTC
                                                    Received 21 Jul 2012 8:19:10 UTC
                                                    Server state Over
                                                    Outcome Client error
                                                    Client state Compute error
                                                    Exit status 0 (0x0)
                                                    Computer ID 1126062
                                                    Report deadline 2 Jul 2013 13:12:01 UTC
                                                    Run time 13,805.54
                                                    CPU time 13,678.24
                                                    Validate state Invalid
                                                    Claimed credit 0.00
                                                    Granted credit 0.00
                                                    application version UK Met Office HADAM3P European Region v6.09
                                                    Stderr

                                                    <core_client_version>6.10.58</core_client_version>
                                                    <![CDATA[
                                                    <stderr_txt>
                                                    Signal 15 received, exiting...
                                                    Called boinc_finish
                                                    Signal 15 received, exiting...
                                                    Called boinc_finish
                                                    Signal 15 received, exiting...
                                                    Called boinc_finish
                                                    SIGSEGV: segmentation violation
                                                    Stack trace (14 frames):
                                                    /home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu(boinc_catch_signal+0x6f)[0x836e1cf]
                                                    [0xf0f87400]
                                                    /home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x8136129]
                                                    /home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x813c074]
                                                    /home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x8131c87]
                                                    /home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x813d6aa]
                                                    /home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x8133fca]
                                                    /home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x8078e6f]
                                                    /home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x82d73ae]
                                                    /home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x82f8867]
                                                    /home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x82f14bb]
                                                    /home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x82f97f6]
                                                    /lib32/libc.so.6(__libc_start_main+0xe5)[0xf0df342d]
                                                    /home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x804caf1]

                                                    Exiting...
                                                    Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3708, selfPID=3695, iMonCtr=1
                                                    Model crash detected, will try to restart...
                                                    Leaving CPDN_Main::Monitor...
                                                    Called boinc_finish

                                                    </stderr_txt>
                                                    <message>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_1.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_2.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_3.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_4.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_5.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_6.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_7.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_8.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_9.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_10.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_11.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_12.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_13.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>

                                                    </message>
                                                    ]]>

                                                    Name hadam3p_eu_60t3_2009_1_008071305_0
                                                    Workunit 8226419
                                                    Created 20 Jul 2012 5:56:54 UTC
                                                    Sent 20 Jul 2012 6:02:06 UTC
                                                    Received 22 Jul 2012 3:45:28 UTC
                                                    Server state Over
                                                    Outcome Client error
                                                    Client state Compute error
                                                    Exit status 0 (0x0)
                                                    Computer ID 1192477
                                                    Report deadline 2 Jul 2013 11:22:06 UTC
                                                    Run time 74,050.46
                                                    CPU time 72,651.55
                                                    Validate state Invalid
                                                    Claimed credit 200.38
                                                    Granted credit 200.38
                                                    application version UK Met Office HADAM3P European Region v6.09
                                                    Stderr

                                                    <core_client_version>6.12.34</core_client_version>
                                                    <![CDATA[
                                                    <stderr_txt>
                                                    CPDN Monitor - Quit request from BOINC...
                                                    CPDN Monitor - Quit request from BOINC...
                                                    CPDN Monitor - Quit request from BOINC...
                                                    CPDN Monitor - Quit request from BOINC...

                                                    Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/xaakm.pipe_dummy 2048
                                                    Leaving CPDN_Main::Monitor...
                                                    Called boinc_finish

                                                    </stderr_txt>
                                                    <message>
                                                    upload failure: <file_xfer_error>
                                                    <file_name>hadam3p_eu_60t3_2009_1_008071305_0_2.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_60t3_2009_1_008071305_0_3.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_60t3_2009_1_008071305_0_4.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_60t3_2009_1_008071305_0_5.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_60t3_2009_1_008071305_0_6.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_60t3_2009_1_008071305_0_7.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_60t3_2009_1_008071305_0_8.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_60t3_2009_1_008071305_0_9.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_60t3_2009_1_008071305_0_10.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_60t3_2009_1_008071305_0_11.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_60t3_2009_1_008071305_0_12.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>

                                                    </message>
                                                    ]]>

                                                    Name hadam3p_eu_634j_2009_1_008071304_2
                                                    Workunit 8226418
                                                    Created 22 Jul 2012 0:43:29 UTC
                                                    Sent 22 Jul 2012 0:47:15 UTC
                                                    Received 22 Jul 2012 10:30:11 UTC
                                                    Server state Over
                                                    Outcome Client error
                                                    Client state Compute error
                                                    Exit status 0 (0x0)
                                                    Computer ID 1212547
                                                    Report deadline 4 Jul 2013 6:07:15 UTC
                                                    Run time 26,180.15
                                                    CPU time 25,922.02
                                                    Validate state Invalid
                                                    Claimed credit 200.38
                                                    Granted credit 200.38
                                                    application version UK Met Office HADAM3P European Region v6.09
                                                    Stderr

                                                    <core_client_version>7.0.28</core_client_version>
                                                    <![CDATA[
                                                    <stderr_txt>

                                                    Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/xaakm.pipe_dummy 2048
                                                    Leaving CPDN_Main::Monitor...
                                                    Called boinc_finish

                                                    </stderr_txt>
                                                    <message>
                                                    upload failure: <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_2_2.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_2_3.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_2_4.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_2_5.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_2_6.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_2_7.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_2_8.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_2_9.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_2_10.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_2_11.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_2_12.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>

                                                    </message>
                                                    ]]>

                                                    Name hadam3p_eu_634j_2009_1_008071304_1
                                                    Workunit 8226418
                                                    Created 21 Jul 2012 5:03:17 UTC
                                                    Sent 21 Jul 2012 5:11:11 UTC
                                                    Received 22 Jul 2012 0:43:28 UTC
                                                    Server state Over
                                                    Outcome Client error
                                                    Client state Compute error
                                                    Exit status 0 (0x0)
                                                    Computer ID 1221572
                                                    Report deadline 3 Jul 2013 10:31:11 UTC
                                                    Run time 54,671.36
                                                    CPU time 54,503.55
                                                    Validate state Invalid
                                                    Claimed credit 200.38
                                                    Granted credit 200.38
                                                    application version UK Met Office HADAM3P European Region v6.09
                                                    Stderr

                                                    <core_client_version>7.0.25</core_client_version>
                                                    <![CDATA[
                                                    <stderr_txt>
                                                    Suspended CPDN Monitor - Suspend request from BOINC...
                                                    Suspended CPDN Monitor - Suspend request from BOINC...
                                                    Suspended CPDN Monitor - Suspend request from BOINC...
                                                    Suspended CPDN Monitor - Suspend request from BOINC...

                                                    Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/xaakm.pipe_dummy 2048
                                                    Leaving CPDN_Main::Monitor...
                                                    Called boinc_finish

                                                    </stderr_txt>
                                                    <message>
                                                    upload failure: <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_1_2.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_1_3.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_1_4.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_1_5.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_1_6.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_1_7.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_1_8.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_1_9.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_1_10.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_1_11.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_634j_2009_1_008071304_1_12.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>

                                                    </message>
                                                    ]]>


                                                    Name hadam3p_eu_6c44_2009_1_008071303_0
                                                    Workunit 8226417
                                                    Created 20 Jul 2012 5:56:29 UTC
                                                    Sent 20 Jul 2012 6:01:45 UTC
                                                    Received 21 Jul 2012 1:04:09 UTC
                                                    Server state Over
                                                    Outcome Client error
                                                    Client state Compute error
                                                    Exit status 0 (0x0)
                                                    Computer ID 915051
                                                    Report deadline 2 Jul 2013 11:21:45 UTC
                                                    Run time 47,264.36
                                                    CPU time 46,751.77
                                                    Validate state Invalid
                                                    Claimed credit 200.38
                                                    Granted credit 200.38
                                                    application version UK Met Office HADAM3P European Region v6.09
                                                    Stderr

                                                    <core_client_version>7.0.28</core_client_version>
                                                    <![CDATA[
                                                    <stderr_txt>

                                                    Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/xaakm.pipe_dummy 2048
                                                    Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4524, selfPID=4524, iMonCtr=2
                                                    Leaving CPDN_Main::Monitor...
                                                    Called boinc_finish

                                                    </stderr_txt>
                                                    <message>
                                                    upload failure: <file_xfer_error>
                                                    <file_name>hadam3p_eu_6c44_2009_1_008071303_0_2.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_6c44_2009_1_008071303_0_3.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_6c44_2009_1_008071303_0_4.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_6c44_2009_1_008071303_0_5.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_6c44_2009_1_008071303_0_6.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_6c44_2009_1_008071303_0_7.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_6c44_2009_1_008071303_0_8.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_6c44_2009_1_008071303_0_9.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_6c44_2009_1_008071303_0_10.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_6c44_2009_1_008071303_0_11.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>
                                                    <file_xfer_error>
                                                    <file_name>hadam3p_eu_6c44_2009_1_008071303_0_12.zip</file_name>
                                                    <error_code>-161</error_code>
                                                    </file_xfer_error>

                                                    </message>
                                                    ]]>



                                                    ____________

                                                    mo.v
                                                    Forum moderator
                                                    Joined: Sep 29 04
                                                    Posts: 2359
                                                    Credit: 7,024,721
                                                    RAC: 2,973
                                                    Message 44592 - Posted 27 Jul 2012 0:14:12 UTC

                                                      Thanks for the details, skgiven. I was mistaken in thinking that the REPLANCA batches started on 22 July. There were batches created on 21 and 20 July too.
                                                      ____________
                                                      Cpdn news

                                                      Eirik Redd
                                                      Joined: Aug 31 04
                                                      Posts: 252
                                                      Credit: 26,987,757
                                                      RAC: 20,070
                                                      Message 44593 - Posted 27 Jul 2012 0:21:39 UTC - in response to Message 44592.

                                                        Last modified: 27 Jul 2012 0:23:36 UTC

                                                        Possibly a few more were created more recently:

                                                        hadam3p_eu_cryy_2004_1_008083704_1 Sent 25 Jul 2012 3:03:18 UTC
                                                        Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH

                                                        hadam3p_eu_cu52_2000_1_008084996_0 Sent 24 Jul 2012 14:17:28 UTC
                                                        Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH

                                                        hadam3p_eu_cssi_2001_1_008084199_0 Sent 24 Jul 2012 20:21:54 UTC
                                                        Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH

                                                        hadam3p_eu_cqol_2007_1_008082936_0 Sent 25 Jul 2012 7:12:34 UTC
                                                        Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH

                                                        hadam3p_eu_colq_2007_1_008081725_0 Sent 25 Jul 2012 17:37:05 UTC
                                                        Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH


                                                        But this is only a small percentage of the WUs from the last few days.

                                                        Most of what my machines downloaded in the last 3 days has no problems at all.
                                                        ____________

                                                        Dave Roberts
                                                        Joined: Jan 15 11
                                                        Posts: 73
                                                        Credit: 1,353,855
                                                        RAC: 921
                                                        Message 44594 - Posted 27 Jul 2012 16:20:52 UTC

                                                          Should we report all instances of REPLANCA failures? I've just had my 1st.
                                                          Messages :-
                                                          Fri Jul 27 06:02:13 2012 Started upload of hadam3p_eu_cq3s_2006_1_008082615_2_1.zip
                                                          Fri Jul 27 06:07:08 2012 Finished upload of hadam3p_eu_cq3s_2006_1_008082615_2_1.zip
                                                          Fri Jul 27 07:58:26 2012 Started upload of hadam3p_eu_cq3s_2006_1_008082615_2_13.zip
                                                          Fri Jul 27 07:58:29 2012 Computation for task hadam3p_eu_cq3s_2006_1_008082615_2 finished
                                                          Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_2.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent
                                                          Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_3.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent
                                                          Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_4.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent
                                                          Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_5.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent
                                                          Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_6.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent
                                                          Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_7.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent
                                                          Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_8.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent
                                                          Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_9.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent
                                                          Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_10.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent
                                                          Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_11.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent
                                                          Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_12.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent

                                                          Stderror :-
                                                          Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/xaakm.pipe_dummy 2048
                                                          Leaving CPDN_Main::Monitor...
                                                          Called boinc_finish
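For anyone who would rather summarise the "Output file ... absent" messages than paste them in full, here is a minimal Python sketch. It assumes the log wording shown in this thread; the function name `absent_uploads` is made up for illustration and is not part of BOINC.

```python
import re
from collections import defaultdict

# Matches the event-log wording seen in this thread:
#   "Output file <task>_<n>.zip for task <task> absent"
ABSENT_RE = re.compile(r"Output file (\S+)_(\d+)\.zip for task (\S+) absent")


def absent_uploads(log_text):
    """Return {task_name: sorted list of zip numbers reported absent}."""
    missing = defaultdict(list)
    for match in ABSENT_RE.finditer(log_text):
        prefix, number, task = match.groups()
        # Only count files whose name really starts with the task name.
        if prefix == task:
            missing[task].append(int(number))
    return {task: sorted(numbers) for task, numbers in missing.items()}


sample = (
    "Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_2.zip "
    "for task hadam3p_eu_cq3s_2006_1_008082615_2 absent\n"
    "Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_3.zip "
    "for task hadam3p_eu_cq3s_2006_1_008082615_2 absent\n"
)
print(absent_uploads(sample))
```

One line per task with the missing zip numbers is usually enough for a report.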

                                                          Les Bayliss
                                                          Forum moderator
                                                          Joined: Sep 5 04
                                                          Posts: 5348
                                                          Credit: 8,876,229
                                                          RAC: 549
                                                          Message 44596 - Posted 27 Jul 2012 20:04:46 UTC

                                                            I think we've worked out that it's EU models that have the fault.
                                                            Set your prefs for only PNW, and you should be OK.
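To check which regions are affected on your own machines, a rough tally by region can help. This sketch assumes the region is the second underscore-separated field of the task name (as in `hadam3p_eu_...` and `hadam3p_pnw_...`); the helper names are hypothetical, and the sample names are tasks already reported in this thread.

```python
from collections import Counter


def region_of(task_name):
    """Return the second underscore-separated field, e.g. 'eu' or 'pnw'."""
    parts = task_name.split("_")
    return parts[1] if len(parts) > 1 else "unknown"


def failures_by_region(task_names):
    """Count failed tasks per region code."""
    return Counter(region_of(name) for name in task_names)


failed = [
    "hadam3p_eu_cryy_2004_1_008083704_1",
    "hadam3p_eu_cu52_2000_1_008084996_0",
    "hadam3p_eu_cssi_2001_1_008084199_0",
    "hadam3p_eu_cqol_2007_1_008082936_0",
    "hadam3p_eu_colq_2007_1_008081725_0",
]
print(failures_by_region(failed))
```

A count like this only shows where failures were seen, not the failure rate; for that you would also need the number of tasks of each region that completed.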


                                                            ____________
                                                            Backups: Here

                                                            [boinc.at] Nowi
                                                            Joined: Jul 16 05
                                                            Posts: 32
                                                            Credit: 2,201,277
                                                            RAC: 739
                                                            Message 44597 - Posted 27 Jul 2012 21:55:29 UTC

                                                              I have failing PNW models, too:

                                                              hadam3p_pnw_bdmc_1973_1_008097714_0
                                                              hadam3p_pnw_b9zc_1977_1_008097176_0

                                                              They failed after 10 s of runtime!

                                                              stderr shows:


                                                              <core_client_version>7.0.28</core_client_version>
                                                              <![CDATA[
                                                              <stderr_txt>

                                                              GCM: BUFFIN : Read Failed: No such file or directory
                                                              GCM : BUFFIN: C I/O Error feof - Unit 30 - Return code = 16
                                                              GCM : BUFFIN: C I/O Error feof - Unit 30 - Return code = 16


                                                              Model crashed: REPLANCA :I/O ERROR tmp/xaakm.pipe_dummy 2048
                                                              Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=15304, selfPID=15304, iMonCtr=2
                                                              Leaving CPDN_Main::Monitor...
                                                              Regional yearly means requires 12 input files got 0
                                                              Called boinc_finish

                                                              </stderr_txt>
                                                              <message>
                                                              upload failure: <file_xfer_error>
                                                              <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_1.zip</file_name>
                                                              <error_code>-161</error_code>
                                                              </file_xfer_error>
                                                              <file_xfer_error>
                                                              <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_2.zip</file_name>
                                                              <error_code>-161</error_code>
                                                              </file_xfer_error>
                                                              <file_xfer_error>
                                                              <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_3.zip</file_name>
                                                              <error_code>-161</error_code>
                                                              </file_xfer_error>
                                                              <file_xfer_error>
                                                              <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_4.zip</file_name>
                                                              <error_code>-161</error_code>
                                                              </file_xfer_error>
                                                              <file_xfer_error>
                                                              <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_5.zip</file_name>
                                                              <error_code>-161</error_code>
                                                              </file_xfer_error>
                                                              <file_xfer_error>
                                                              <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_6.zip</file_name>
                                                              <error_code>-161</error_code>
                                                              </file_xfer_error>
                                                              <file_xfer_error>
                                                              <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_7.zip</file_name>
                                                              <error_code>-161</error_code>
                                                              </file_xfer_error>
                                                              <file_xfer_error>
                                                              <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_8.zip</file_name>
                                                              <error_code>-161</error_code>
                                                              </file_xfer_error>
                                                              <file_xfer_error>
                                                              <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_9.zip</file_name>
                                                              <error_code>-161</error_code>
                                                              </file_xfer_error>
                                                              <file_xfer_error>
                                                              <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_10.zip</file_name>
                                                              <error_code>-161</error_code>
                                                              </file_xfer_error>
                                                              <file_xfer_error>
                                                              <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_11.zip</file_name>
                                                              <error_code>-161</error_code>
                                                              </file_xfer_error>
                                                              <file_xfer_error>
                                                              <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_12.zip</file_name>
                                                              <error_code>-161</error_code>
                                                              </file_xfer_error>
                                                              <file_xfer_error>
                                                              <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_13.zip</file_name>
                                                              <error_code>-161</error_code>
                                                              </file_xfer_error>

                                                              </message>
                                                              ]]>

                                                              ____________

                                                              Les Bayliss
                                                              Forum moderator
                                                              Joined: Sep 5 04
                                                              Posts: 5348
                                                              Credit: 8,876,229
                                                              RAC: 549
                                                              Message 44598 - Posted 27 Jul 2012 22:07:38 UTC

                                                                It's a waste of time and space posting long strings of "error 161" messages.

                                                                These aren't about model failures. They just mean that BOINC can't find these files when it tries to upload them, which is to be expected, as they were never created in the first place: the model crashed before getting that far.
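As a rough illustration of separating the useful line from the upload noise, here is a small Python sketch. It assumes the stderr layout pasted earlier in this thread (a "Model crashed:" line followed by `<file_xfer_error>` blocks); `summarize_report` is a made-up helper name, not a BOINC tool.

```python
import re
from collections import Counter

# One <file_xfer_error> block: a file name followed by an error code.
XFER_RE = re.compile(
    r"<file_xfer_error>\s*<file_name>(.*?)</file_name>\s*"
    r"<error_code>(-?\d+)</error_code>\s*</file_xfer_error>"
)


def summarize_report(stderr_text):
    """Return (crash lines, {error_code: number of affected files})."""
    crash_lines = [
        line.strip()
        for line in stderr_text.splitlines()
        if line.strip().startswith("Model crashed:")
    ]
    codes = Counter(int(code) for _name, code in XFER_RE.findall(stderr_text))
    return crash_lines, dict(codes)
```

Posting the crash line plus a count such as "12 files with error -161" carries the same information as the full listing.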


                                                                ____________
                                                                Backups: Here

                                                                Patrick
                                                                Project donor
                                                                Joined: Sep 8 10
                                                                Posts: 6
                                                                Credit: 1,052,227
                                                                RAC: 32
                                                                Message 44600 - Posted 28 Jul 2012 1:13:05 UTC - in response to Message 44598.

                                                                  Not that I necessarily expect an answer, but I'd be curious to know why the European models are failing.

                                                                  Eirik Redd
                                                                  Joined: Aug 31 04
                                                                  Posts: 252
                                                                  Credit: 26,987,757
                                                                  RAC: 20,070
                                                                  Message 44601 - Posted 28 Jul 2012 10:56:40 UTC - in response to Message 44600.

                                                                    Only a small fraction are failing.
                                                                    Because the download files are not exactly right.
                                                                    And the problem will be or has been fixed already.
                                                                    So when the problem work units clear the queue this problem will be gone.
                                                                    And then, because this whole project is cutting edge and really complex, there will probably be a few more malformed work units later.

                                                                    ____________

                                                                    Les Bayliss
                                                                    Forum moderator
                                                                    Joined: Sep 5 04
                                                                    Posts: 5348
                                                                    Credit: 8,876,229
                                                                    RAC: 549
                                                                    Message 44602 - Posted 28 Jul 2012 22:03:38 UTC

                                                                      "REPLANCA" is an error that means a program is expecting X number of values, but only found X-n.

                                                                      It happens when a limited number of values is used to test a program, and then everything is increased to the full range of values, except for one of the ancillary files where the list of values doesn't get increased.
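The kind of mismatch described above can be illustrated with a toy check in Python. This is only a sketch of the general idea (expecting X values and finding fewer); it is not the model's actual code, and the function name, file label and counts are all invented for the example.

```python
def check_value_count(expected_count, values, name="ancillary"):
    """Raise ValueError if an input list is not the length the model expects."""
    found = len(values)
    if found != expected_count:
        raise ValueError(f"{name}: expected {expected_count} values, found {found}")
    return values


# A short test list passes when the program is also configured for 3 values...
check_value_count(3, [0.1, 0.2, 0.3], name="hypothetical_ancillary")

# ...but the same list fails once the program expects the full range.
try:
    check_value_count(12, [0.1, 0.2, 0.3], name="hypothetical_ancillary")
except ValueError as err:
    print(err)
```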

                                                                      So someone in one of the research groups has supplied the Oxford people with a faulty file.
                                                                      The question then becomes: which file, from which research group, and for what range(s) of model dates?

                                                                      ***************

                                                                      I also had one SAF model fail with this error, and Nowi is reporting PNW models failing with it.


                                                                      ____________
                                                                      Backups: Here

                                                                      MarkJ
                                                                      Joined: Mar 28 09
                                                                      Posts: 102
                                                                      Credit: 5,075,426
                                                                      RAC: 6
                                                                      Message 44603 - Posted 29 Jul 2012 6:22:51 UTC

                                                                        Yes I got a couple. Mine are all PNW models

                                                                        REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/xaakm.pipe_dummy 2048
                                                                        Leaving CPDN_Main::Monitor...
                                                                        Regional yearly means requires 12 input files got 1


                                                                        Link to work unit here

                                                                        Les, do you want to know about these or do we just ignore them? I see there are 14,000+ PNW work units on the queue so there are bound to be more in there.
                                                                        ____________
                                                                        BOINC blog

                                                                        Les Bayliss
                                                                        Forum moderator
                                                                        Joined: Sep 5 04
                                                                        Posts: 5348
                                                                        Credit: 8,876,229
                                                                        RAC: 549
                                                                        Message 44604 - Posted 29 Jul 2012 6:33:13 UTC - in response to Message 44603.

                                                                          Hi Mark

                                                                          I'm not sure, but I guess we should know about the PNW baddies as well.
                                                                          It's going to be another 24-30 hours before anyone shows up, but I'll pass on the news.
                                                                          ____________
                                                                          Backups: Here

                                                                          Nigel Garvey
                                                                          Joined: May 5 10
                                                                          Posts: 33
                                                                          Credit: 578,760
                                                                          RAC: 234
                                                                          Message 44605 - Posted 29 Jul 2012 9:12:49 UTC

                                                                            Yep. I've had a PNW error overnight too. Same symptoms. A few more points awarded though. :)

                                                                            hadam3p_pnw_bdp4_1993_1_008097733_0


                                                                            NG

                                                                            MarkJ
                                                                            Joined: Mar 28 09
                                                                            Posts: 102
                                                                            Credit: 5,075,426
                                                                            RAC: 6
                                                                            Message 44606 - Posted 29 Jul 2012 11:41:45 UTC - in response to Message 44604.

                                                                              Last modified: 29 Jul 2012 12:00:33 UTC


                                                                              Replanca errors:
                                                                              resultid=14901620

                                                                              resultid=15011909

                                                                              resultid=14819189

                                                                              resultid=15021473


                                                                              Some others complain about files (with no mention of Replanca, though). These crash after about 600 seconds elapsed:
                                                                              Model crashed: 
                                                                              Leaving CPDN_Main::Monitor...
                                                                              Regional yearly means requires 12 input files got 0
                                                                              Called boinc_finish


                                                                              resultid=14819102

                                                                              resultid=14819127


                                                                              And another which might just be some weird parameters:
                                                                              Model crashed: INITTIME: Atmosphere basis time mismatch tmp/xaakm.pipe_dummy 2048
                                                                              Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=964, selfPID=964, iMonCtr=2
                                                                              Leaving CPDN_Main::Monitor...
                                                                              Regional yearly means requires 12 input files got 0


                                                                              resultid=14906965
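The three groups above can be sorted automatically from the stderr text. The following Python sketch assumes the "Model crashed:" wording seen in this thread; the category labels and the function name `classify_crash` are my own, not anything defined by the project.

```python
import re

# Capture whatever follows "Model crashed:" on the same line (may be empty).
CRASH_RE = re.compile(r"Model crashed:[ \t]*(.*)")


def classify_crash(stderr_text):
    """Return a rough category for the first 'Model crashed:' line."""
    match = CRASH_RE.search(stderr_text)
    if match is None:
        return "no crash line"
    reason = match.group(1).strip()
    if not reason:
        return "unspecified"
    if reason.startswith("REPLANCA"):
        if "PP HEADERS" in reason:
            return "REPLANCA header mismatch"
        if "I/O ERROR" in reason:
            return "REPLANCA I/O error"
        return "REPLANCA other"
    if reason.startswith("INITTIME"):
        return "INITTIME"
    return "other"


print(classify_crash("Model crashed: INITTIME: Atmosphere basis time mismatch"))
```

Grouping results this way makes it easier to report one example per failure type instead of every result ID.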
                                                                              ____________
                                                                              BOINC blog
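For anyone sorting through their own failed results, the grouping above can be sketched as a small stderr classifier. This is only an illustration: the category labels are invented here, and it assumes the word REPLANCA actually appears in the stderr of the affected tasks, which the excerpts quoted in this thread do not show directly.

```python
import re

# Signatures taken from the stderr excerpts quoted in this thread.
# The labels on the left are hypothetical names made up for this sketch.
SIGNATURES = [
    # Assumption: affected tasks mention REPLANCA somewhere in stderr.
    ("replanca", re.compile(r"REPLANCA", re.IGNORECASE)),
    ("basis_time_mismatch",
     re.compile(r"INITTIME: Atmosphere basis time mismatch")),
    ("missing_yearly_means",
     re.compile(r"Regional yearly means requires 12 input files got 0")),
]


def classify_stderr(text):
    """Return the label of the first signature found in text, else 'unknown'."""
    for label, pattern in SIGNATURES:
        if pattern.search(text):
            return label
    return "unknown"


sample = """Model crashed: INITTIME: Atmosphere basis time mismatch tmp/xaakm.pipe_dummy 2048
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 0
"""
print(classify_stderr(sample))  # prints "basis_time_mismatch"
```

The order of the list matters: the "yearly means" line shows up in several kinds of crash, so the more specific signatures are checked first.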

                                                                              Profile Dave Jackson
                                                                              Joined: May 15 09
                                                                              Posts: 811
                                                                              Credit: 632,379
                                                                              RAC: 338
                                                                              Message 44608 - Posted 29 Jul 2012 17:32:09 UTC

                                                                                Just in case you are still collecting details of tasks with the replanca error: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=14975475 (hadam3p_eu_ale0_2000_1_008070909_2) is one. I am suspicious, though, as this happened just after the computer had been restarted, or at least that was when I noticed it, and the zip13 uploaded.

                                                                                Dave

                                                                                MarkJ
                                                                                Joined: Mar 28 09
                                                                                Posts: 102
                                                                                Credit: 5,075,426
                                                                                RAC: 6
                                                                                Message 44614 - Posted 31 Jul 2012 13:05:57 UTC

                                                                                  Last modified: 31 Jul 2012 13:08:43 UTC

                                                                                  Some more Replanca errors...

                                                                                  resultid=15022759
                                                                                  resultid=15024598
                                                                                  resultid=15033209
                                                                                  resultid=15028563
                                                                                  resultid=15032537
                                                                                  resultid=15035539
                                                                                  resultid=15039466
                                                                                  resultid=15034026
                                                                                  resultid=15034029
                                                                                  resultid=15034537
                                                                                  resultid=15034564
                                                                                  resultid=15034565

                                                                                  Looks to me like they are all stuffed. Perhaps the project would be better served by cancelling the remaining ones on the queue that haven't been sent out and resubmitting them after fixing the replanca issue.

                                                                                  What's really annoying is that they run for 18-19 hours before they commit suicide and then, to top it off, they create the usual 32 MB _13 file to upload. It's probably useless anyway, seeing as the model only has 1 of the 12 input files.
                                                                                  ____________
                                                                                  BOINC blog

                                                                                  Profile Dave Jackson
                                                                                  Joined: May 15 09
                                                                                  Posts: 811
                                                                                  Credit: 632,379
                                                                                  RAC: 338
                                                                                  Message 44616 - Posted 31 Jul 2012 21:06:55 UTC - in response to Message 44614.

                                                                                    My latest one to crash with the replanca error did so after about 40 hours, which on my machine is 4 or 5 zip files' worth. This was after a restart, but the model had been suspended and File - Exit used to shut BOINC down before hibernating the computer. Has anyone else had them go this far before crashing?

                                                                                    Dave

                                                                                    Profile Dave Jackson
                                                                                    Joined: May 15 09
                                                                                    Posts: 811
                                                                                    Credit: 632,379
                                                                                    RAC: 338
                                                                                    Message 44617 - Posted 31 Jul 2012 21:08:40 UTC - in response to Message 44616.

                                                                                      I see the (presumably offending) tasks have gone from the server.

                                                                                      Dave

                                                                                      MarkJ
                                                                                      Joined: Mar 28 09
                                                                                      Posts: 102
                                                                                      Credit: 5,075,426
                                                                                      RAC: 6
                                                                                      Message 44618 - Posted 1 Aug 2012 8:13:17 UTC - in response to Message 44616.

                                                                                        Last modified: 1 Aug 2012 8:15:37 UTC

                                                                                        My latest one to crash with the replanca error did so after about 40 hours, which on my machine is 4 or 5 zip files' worth. This was after a restart, but the model had been suspended and File - Exit used to shut BOINC down before hibernating the computer. Has anyone else had them go this far before crashing?

                                                                                        Dave


                                                                                        They usually die straight after the first trickle/zip for me
                                                                                        ____________
                                                                                        BOINC blog

                                                                                        Profile Dave Jackson
                                                                                        Joined: May 15 09
                                                                                        Posts: 811
                                                                                        Credit: 632,379
                                                                                        RAC: 338
                                                                                        Message 44619 - Posted 1 Aug 2012 16:35:56 UTC

                                                                                          The rate at which the number of tasks in progress is going down on the server page indicates there are still a lot of units falling over.

                                                                                          Dave

                                                                                          Fred Bloggs
                                                                                          Joined: Sep 4 04
                                                                                          Posts: 1
                                                                                          Credit: 3,023,849
                                                                                          RAC: 1,327
                                                                                          Message 44620 - Posted 1 Aug 2012 16:48:12 UTC - in response to Message 44619.

                                                                                            All the recent ones I have had have failed, for a few days now.

                                                                                            Would be nice to have one not fail around the _2.zip point.
                                                                                            ____________

                                                                                            MarkJ
                                                                                            Joined: Mar 28 09
                                                                                            Posts: 102
                                                                                            Credit: 5,075,426
                                                                                            RAC: 6
                                                                                            Message 44621 - Posted 3 Aug 2012 11:07:07 UTC - in response to Message 44619.

                                                                                              The rate at which the number of tasks in progress is going down on the server page indicates there are still a lot of units falling over.

                                                                                              Dave


                                                                                              Once they've been sent out, there probably isn't a lot the project can do. While it is possible for a project to abort in-progress tasks, the version of the BOINC server software that CPDN is running may not support it. GPUgrid used to do it, but then people complained that their tasks got aborted after many hours of crunching. The tasks will fail anyway, so it's probably better just to let them die on their own.
                                                                                              ____________
                                                                                              BOINC blog

                                                                                              nedsram-cdl
                                                                                              Joined: Apr 14 05
                                                                                              Posts: 21
                                                                                              Credit: 9,344,735
                                                                                              RAC: 3,305
                                                                                              Message 44624 - Posted 4 Aug 2012 10:03:06 UTC

                                                                                                Every task I have had on my laptop for the last week or so has also failed. The ones I have checked seem to be of the "replanca" variety. However I am unable to obtain any new tasks, so it has been effectively idle for several days now.

                                                                                                Is there a problem with the supply of new tasks - possibly as a result of this issue?
                                                                                                ____________
                                                                                                Brian

                                                                                                Profile Iain Inglis
                                                                                                Forum moderator
                                                                                                Joined: Jan 16 10
                                                                                                Posts: 495
                                                                                                Credit: 9,532
                                                                                                RAC: 0
                                                                                                Message 44625 - Posted 4 Aug 2012 23:12:10 UTC - in response to Message 44624.

                                                                                                  [nedsram-cdl wrote:]Every task I have had on my laptop for the last week or so has also failed. The ones I have checked seem to be of the "replanca" variety. However I am unable to obtain any new tasks, so it has been effectively idle for several days now.

                                                                                                  Is there a problem with the supply of new tasks - possibly as a result of this issue?

                                                                                                  The work units in the queue affected by the REPLANCA problem have been withdrawn and results that are running are failing quickly, so the supply of new units has declined to zero and the total number of running results has reduced somewhat as well. No doubt someone is working on a new set of work units with a correct set of ancillary files and the queue will fill accordingly when that is done. We'll know it's fixed when that happens!

                                                                                                  Profile JIM
                                                                                                  Joined: Dec 31 07
                                                                                                  Posts: 676
                                                                                                  Credit: 3,957,635
                                                                                                  RAC: 2,780
                                                                                                  Message 44626 - Posted 5 Aug 2012 15:51:03 UTC

                                                                                                    Last modified: 5 Aug 2012 15:51:53 UTC

                                                                                                    I just lost a hadam3p_eu WU after the first zip file, probably due to the replanca error. There are 2 hadam3p_eu WUs (hadam3_eu_ctvq_2005_1_008084837_0 and hadam3p_eu_cum6_2000_1_008085302_1) sitting on my machine, most likely from the same bad batch.

                                                                                                    Should I abort them before they start or let them run till they crash? Are they from the same bad batch? How do I tell?
                                                                                                    ____________

                                                                                                    Profile geophi
                                                                                                    Forum moderator
                                                                                                    Joined: Aug 7 04
                                                                                                    Posts: 1475
                                                                                                    Credit: 22,606,103
                                                                                                    RAC: 2,242
                                                                                                    Message 44627 - Posted 5 Aug 2012 18:57:22 UTC - in response to Message 44626.

                                                                                                      I just lost a hadam3p_eu WU after the first zip file, probably due to the replanca error. There are 2 hadam3p_eu WUs (hadam3_eu_ctvq_2005_1_008084837_0 and hadam3p_eu_cum6_2000_1_008085302_1) sitting on my machine, most likely from the same bad batch.

                                                                                                      Should I abort them before they start or let them run till they crash? Are they from the same bad batch? How do I tell?


                                                                                                      It looks like the 2 you mention were downloaded July 24th. Thus, they are likely bad. One of the work units that the tasks belong to has already had a task crash with a REPLANCA error. I'd abort them.
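The second check above (has another task from the same work unit already crashed?) can be sketched in a few lines. It relies on the task names in this thread being the work unit name plus a trailing _<n>. The failed sibling used in the example is hypothetical, made up purely for illustration; real failures would have to be looked up on the work unit's page on the project site.

```python
import re


def workunit_name(task_name):
    """Strip the trailing _<n> that turns a work unit name into a task name."""
    return re.sub(r"_\d+$", "", task_name)


def suspect_tasks(queued, known_failed):
    """Return queued tasks whose work unit already has a failed task."""
    bad_workunits = {workunit_name(t) for t in known_failed}
    return [t for t in queued if workunit_name(t) in bad_workunits]


# The two queued tasks named in the question above.
queued = [
    "hadam3_eu_ctvq_2005_1_008084837_0",
    "hadam3p_eu_cum6_2000_1_008085302_1",
]
# Hypothetical: pretend the _0 sibling of the second task crashed with REPLANCA.
known_failed = ["hadam3p_eu_cum6_2000_1_008085302_0"]

print(suspect_tasks(queued, known_failed))
# prints ['hadam3p_eu_cum6_2000_1_008085302_1']
```

A match here only says the work unit has failed once before; it does not prove the queued task will fail as well.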

                                                                                                      Profile Byron Leigh Hatch @ team Carl Sagan
                                                                                                      Joined: Aug 17 04
                                                                                                      Posts: 169
                                                                                                      Credit: 3,993,009
                                                                                                      RAC: 3,239
                                                                                                      Message 44628 - Posted 6 Aug 2012 3:18:02 UTC

                                                                                                        hello everyone,

                                                                                                        sorry but I have not had time to read this whole thread.

                                                                                                        I'm crunching the following 4 wu and they seem to be returning zip files ok.

                                                                                                        and I was wondering if it is ok to let them continue to run ?

                                                                                                        hadam3p_pnw_c6nd_1993_1_008091178 - - Sent - - 26 Jul 2012 14:03:18 UTC
                                                                                                        hadam3p_pnw_c75k_1968_1_008091170 - - Sent - - 26 Jul 2012 14:03:18 UTC
                                                                                                        hadcm3n_o44o_2100_40_008085978 - - - - - Sent - - 25 Jul 2012 20:48:43 UTC
                                                                                                        hadam3p_eu_alis_1998_1_008068421 - - - - Sent - - 19 Jul 2012 18:02:52 UTC

                                                                                                        my computer id 948812
                                                                                                        my account userid=910

                                                                                                        thanks ,
                                                                                                        Byron

                                                                                                        Les Bayliss
                                                                                                        Forum moderator
                                                                                                        Joined: Sep 5 04
                                                                                                        Posts: 5348
                                                                                                        Credit: 8,876,229
                                                                                                        RAC: 549
                                                                                                        Message 44629 - Posted 6 Aug 2012 5:43:33 UTC - in response to Message 44628.

                                                                                                          There are 3 separate problems, all from around the time that your models were sent.
                                                                                                          In order of when they happened to mine:

                                                                                                          Some will fail at around 9-10 hours, between zips 1 & 2
                                                                                                          Some will fail at around 19-20 hours
                                                                                                          Some will have files that "can't be found", and cause download failures
                                                                                                          And there were also models that ran OK.

                                                                                                          The first 2 were due to REPLANCA errors: an auxiliary file did not contain the correct amount of data. The 3rd was an error in the path to a mirror server.

                                                                                                          All models were deleted from the download pool, but there are still re-sends, caused by people not starting work that they received back then.

                                                                                                          If you're running any of the failures you'll soon find out.


                                                                                                          ____________
                                                                                                          Backups: Here

                                                                                                          Profile geophi
                                                                                                          Forum moderator
                                                                                                          Joined: Aug 7 04
                                                                                                          Posts: 1475
                                                                                                          Credit: 22,606,103
                                                                                                          RAC: 2,242
                                                                                                          Message 44630 - Posted 6 Aug 2012 14:33:19 UTC - in response to Message 44628.

                                                                                                            and I was wondering if it is ok to let them continue to run ?

                                                                                                            hadam3p_pnw_c6nd_1993_1_008091178 - - Sent - - 26 Jul 2012 14:03:18 UTC
                                                                                                            hadam3p_pnw_c75k_1968_1_008091170 - - Sent - - 26 Jul 2012 14:03:18 UTC
                                                                                                            hadcm3n_o44o_2100_40_008085978 - - - - - Sent - - 25 Jul 2012 20:48:43 UTC
                                                                                                            hadam3p_eu_alis_1998_1_008068421 - - - - Sent - - 19 Jul 2012 18:02:52 UTC

                                                                                                            my computer id 948812
                                                                                                            my account userid=910


                                                                                                            Looks like all 4 of them should continue on okay. None look to be in the bad batches. You've already made enough progress on them that they've gotten past the typical failure points for EU and PNW models.

                                                                                                            Profile Byron Leigh Hatch @ team Carl Sagan
                                                                                                            Joined: Aug 17 04
                                                                                                            Posts: 169
                                                                                                            Credit: 3,993,009
                                                                                                            RAC: 3,239
                                                                                                            Message 44633 - Posted 7 Aug 2012 11:49:53 UTC - in response to Message 44630.

                                                                                                              Thank you geophi and Les Bayliss for your reply

                                                                                                              Yes all 4 seem to be continuing ok with no problems.

                                                                                                              So I will let them continue to run to the end.

                                                                                                              thanks,
                                                                                                              Byron

                                                                                                              AlphaLaser
                                                                                                              Joined: Oct 21 06
                                                                                                              Posts: 5
                                                                                                              Credit: 558,849
                                                                                                              RAC: 250
                                                                                                              Message 44690 - Posted 13 Aug 2012 3:38:01 UTC

                                                                                                                I just recently got a result error with the following stderr output:


                                                                                                                <core_client_version>6.10.58</core_client_version>
                                                                                                                <![CDATA[
                                                                                                                <stderr_txt>

                                                                                                                Model crashed: INITTIME: Atmosphere basis time mismatch tmp/xaakm.pipe_dummy 2048
                                                                                                                Leaving CPDN_Main::Monitor...
                                                                                                                Called boinc_finish

                                                                                                                </stderr_txt>
                                                                                                                <message>
                                                                                                                <file_xfer_error>
                                                                                                                <file_name>hadam3p_eu_69wa_2000_1_008138105_0_1.zip</file_name>
                                                                                                                <error_code>-161</error_code>
                                                                                                                </file_xfer_error>
                                                                                                                <file_xfer_error>
                                                                                                                <file_name>hadam3p_eu_69wa_2000_1_008138105_0_2.zip</file_name>
                                                                                                                <error_code>-161</error_code>
                                                                                                                </file_xfer_error>
                                                                                                                <file_xfer_error>
                                                                                                                <file_name>hadam3p_eu_69wa_2000_1_008138105_0_3.zip</file_name>
                                                                                                                <error_code>-161</error_code>
                                                                                                                </file_xfer_error>
                                                                                                                <file_xfer_error>
                                                                                                                <file_name>hadam3p_eu_69wa_2000_1_008138105_0_4.zip</file_name>
                                                                                                                <error_code>-161</error_code>
                                                                                                                </file_xfer_error>
                                                                                                                <file_xfer_error>
                                                                                                                <file_name>hadam3p_eu_69wa_2000_1_008138105_0_5.zip</file_name>
                                                                                                                <error_code>-161</error_code>
                                                                                                                </file_xfer_error>
                                                                                                                <file_xfer_error>
                                                                                                                <file_name>hadam3p_eu_69wa_2000_1_008138105_0_6.zip</file_name>
                                                                                                                <error_code>-161</error_code>
                                                                                                                </file_xfer_error>
                                                                                                                <file_xfer_error>
                                                                                                                <file_name>hadam3p_eu_69wa_2000_1_008138105_0_7.zip</file_name>
                                                                                                                <error_code>-161</error_code>
                                                                                                                </file_xfer_error>
                                                                                                                <file_xfer_error>
                                                                                                                <file_name>hadam3p_eu_69wa_2000_1_008138105_0_8.zip</file_name>
                                                                                                                <error_code>-161</error_code>
                                                                                                                </file_xfer_error>
                                                                                                                <file_xfer_error>
                                                                                                                <file_name>hadam3p_eu_69wa_2000_1_008138105_0_9.zip</file_name>
                                                                                                                <error_code>-161</error_code>
                                                                                                                </file_xfer_error>
                                                                                                                <file_xfer_error>
                                                                                                                <file_name>hadam3p_eu_69wa_2000_1_008138105_0_10.zip</file_name>
                                                                                                                <error_code>-161</error_code>
                                                                                                                </file_xfer_error>
                                                                                                                <file_xfer_error>
                                                                                                                <file_name>hadam3p_eu_69wa_2000_1_008138105_0_11.zip</file_name>
                                                                                                                <error_code>-161</error_code>
                                                                                                                </file_xfer_error>
                                                                                                                <file_xfer_error>
                                                                                                                <file_name>hadam3p_eu_69wa_2000_1_008138105_0_12.zip</file_name>
                                                                                                                <error_code>-161</error_code>
                                                                                                                </file_xfer_error>
                                                                                                                <file_xfer_error>
                                                                                                                <file_name>hadam3p_eu_69wa_2000_1_008138105_0_13.zip</file_name>
                                                                                                                <error_code>-161</error_code>
                                                                                                                </file_xfer_error>

                                                                                                                </message>
                                                                                                                ]]>


                                                                                                                And also the following messages in the client:



                                                                                                                8/12/2012 9:26:31 PM climateprediction.net Computation for task hadam3p_eu_69wa_2000_1_008138105_0 finished
                                                                                                                8/12/2012 9:26:31 PM climateprediction.net Output file hadam3p_eu_69wa_2000_1_008138105_0_1.zip for task hadam3p_eu_69wa_2000_1_008138105_0 absent
                                                                                                                8/12/2012 9:26:31 PM climateprediction.net Output file hadam3p_eu_69wa_2000_1_008138105_0_2.zip for task hadam3p_eu_69wa_2000_1_008138105_0 absent
                                                                                                                8/12/2012 9:26:31 PM climateprediction.net Output file hadam3p_eu_69wa_2000_1_008138105_0_3.zip for task hadam3p_eu_69wa_2000_1_008138105_0 absent
                                                                                                                8/12/2012 9:26:31 PM climateprediction.net Output file hadam3p_eu_69wa_2000_1_008138105_0_4.zip for task hadam3p_eu_69wa_2000_1_008138105_0 absent
                                                                                                                8/12/2012 9:26:31 PM climateprediction.net Output file hadam3p_eu_69wa_2000_1_008138105_0_5.zip for task hadam3p_eu_69wa_2000_1_008138105_0 absent
                                                                                                                8/12/2012 9:26:31 PM climateprediction.net Output file hadam3p_eu_69wa_2000_1_008138105_0_6.zip for task hadam3p_eu_69wa_2000_1_008138105_0 absent
                                                                                                                8/12/2012 9:26:31 PM climateprediction.net Output file hadam3p_eu_69wa_2000_1_008138105_0_7.zip for task hadam3p_eu_69wa_2000_1_008138105_0 absent
                                                                                                                8/12/2012 9:26:31 PM climateprediction.net Output file hadam3p_eu_69wa_2000_1_008138105_0_8.zip for task hadam3p_eu_69wa_2000_1_008138105_0 absent
                                                                                                                8/12/2012 9:26:31 PM climateprediction.net Output file hadam3p_eu_69wa_2000_1_008138105_0_9.zip for task hadam3p_eu_69wa_2000_1_008138105_0 absent
                                                                                                                8/12/2012 9:26:31 PM climateprediction.net Output file hadam3p_eu_69wa_2000_1_008138105_0_10.zip for task hadam3p_eu_69wa_2000_1_008138105_0 absent
                                                                                                                8/12/2012 9:26:31 PM climateprediction.net Output file hadam3p_eu_69wa_2000_1_008138105_0_11.zip for task hadam3p_eu_69wa_2000_1_008138105_0 absent
                                                                                                                8/12/2012 9:26:31 PM climateprediction.net Output file hadam3p_eu_69wa_2000_1_008138105_0_12.zip for task hadam3p_eu_69wa_2000_1_008138105_0 absent
                                                                                                                8/12/2012 9:26:31 PM climateprediction.net Output file hadam3p_eu_69wa_2000_1_008138105_0_13.zip for task hadam3p_eu_69wa_2000_1_008138105_0 absent


                                                                                                                Is it another kind of problem with the WUs?

                                                                                                                Les Bayliss
                                                                                                                Forum moderator
                                                                                                                Joined: Sep 5 04
                                                                                                                Posts: 5348
                                                                                                                Credit: 8,876,229
                                                                                                                RAC: 549
                                                                                                                Message 44691 - Posted 13 Aug 2012 5:04:27 UTC - in response to Message 44690.

                                                                                                                  The files are missing because the model crashed soon after starting, so none of the output data files were created. It's just BOINC complaining that it can't find them.
                                                                                                                  Only the first couple of lines of the STDERR file are relevant.
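
To illustrate the point about which lines matter, here is a minimal sketch that pulls the "Model crashed" line out of a saved stderr dump and only counts the file_xfer_error entries instead of reading them. It is not part of BOINC or CPDN. The function names `crash_reasons` and `count_xfer_errors` are made up for this example, and the regular expression assumes the "Model crashed: ROUTINE: message" layout seen in the logs quoted in this thread.

```python
import re

# Assumed layout, taken from the logs in this thread:
#   Model crashed: <ROUTINE>: <free-text message>
CRASH_RE = re.compile(r"Model crashed:\s*(?P<routine>[A-Z0-9_]+):\s*(?P<detail>.+)")


def crash_reasons(stderr_text):
    """Return a list of (routine, detail) pairs, one per 'Model crashed' line."""
    reasons = []
    for line in stderr_text.splitlines():
        match = CRASH_RE.search(line)
        if match:
            reasons.append((match.group("routine"), match.group("detail").strip()))
    return reasons


def count_xfer_errors(stderr_text):
    """Count the <file_xfer_error> blocks, which only echo the missing uploads."""
    return stderr_text.count("<file_xfer_error>")


if __name__ == "__main__":
    # Shortened copy of the stderr posted above (two of the thirteen upload errors).
    SAMPLE = """\
Model crashed: INITTIME: Atmosphere basis time mismatch tmp/xaakm.pipe_dummy 2048
Leaving CPDN_Main::Monitor...
Called boinc_finish
<file_xfer_error>
<file_name>hadam3p_eu_69wa_2000_1_008138105_0_1.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_69wa_2000_1_008138105_0_2.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
"""
    for routine, detail in crash_reasons(SAMPLE):
        print("crash in", routine, "->", detail)
    print("upload errors that can be ignored:", count_xfer_errors(SAMPLE))
```

The crash reason is the part worth reporting. The upload errors follow from the crash and add nothing new.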


                                                                                                                  ____________
                                                                                                                  Backups: Here

                                                                                                                  MarkJ
                                                                                                                  Joined: Mar 28 09
                                                                                                                  Posts: 102
                                                                                                                  Credit: 5,075,426
                                                                                                                  RAC: 6
                                                                                                                  Message 44747 - Posted 20 Aug 2012 11:44:54 UTC

                                                                                                                    Last modified: 20 Aug 2012 11:45:58 UTC

                                                                                                                    Another one that crashed...


                                                                                                                    Model crashed: INITTIME: Atmosphere basis time mismatch tmp/xaakm.pipe_dummy 2048
                                                                                                                    Leaving CPDN_Main::Monitor...
                                                                                                                    Regional yearly means requires 12 input files got 0


                                                                                                                    Wu name: hadam3p_pnw_2yuc_1975_1_008145549_1
                                                                                                                    Created: 15 Aug 2012

                                                                                                                    I would link to it, but your Akismet anti-spam system thinks your own URLs are spam. The wuid is 8300673
                                                                                                                    ____________
                                                                                                                    BOINC blog

                                                                                                                    Professor Desty Nova
                                                                                                                    Joined: Sep 19 04
                                                                                                                    Posts: 65
                                                                                                                    Credit: 662,939
                                                                                                                    RAC: 275
                                                                                                                    Message 45193 - Posted 28 Oct 2012 23:03:51 UTC

                                                                                                                      More of these REPLANCA errors in this UK Met Office Coupled Model Full Resolution Ocean WU created Friday http://climateapps2.oerc.ox.ac.uk/cpdnboinc/workunit.php?wuid=8395212

                                                                                                                      <core_client_version>7.0.28</core_client_version>
                                                                                                                      <![CDATA[
                                                                                                                      <message>
                                                                                                                      The device does not recognize the command. (0x16) - exit code 22 (0x16)
                                                                                                                      </message>
                                                                                                                      <stderr_txt>
                                                                                                                      Suspended CPDN Monitor - Suspend request from BOINC...
                                                                                                                      Suspended CPDN Monitor - Suspend request from BOINC...

                                                                                                                      Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/pipe_dummy 2048

                                                                                                                      Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/pipe_dummy 2048

                                                                                                                      Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/pipe_dummy 2048

                                                                                                                      Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/pipe_dummy 2048

                                                                                                                      Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/pipe_dummy 2048

                                                                                                                      Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/pipe_dummy 2048
                                                                                                                      Sorry, too many model crashes! :-(
                                                                                                                      Called boinc_finish

                                                                                                                      </stderr_txt>
                                                                                                                      ]]>


                                                                                                                      ____________


                                                                                                                      Professor Desty Nova
                                                                                                                      Researching Karma the Hard Way

                                                                                                                      Profile mo.v
                                                                                                                      Forum moderator
                                                                                                                      Joined: Sep 29 04
                                                                                                                      Posts: 2359
                                                                                                                      Credit: 7,024,721
                                                                                                                      RAC: 2,973
                                                                                                                      Message 45201 - Posted 31 Oct 2012 2:52:27 UTC

                                                                                                                        Thanks, Professor. The REPLANCA errors have been reported to Andy and Jonathan. If one task in a WU crashes with REPLANCA, all the tasks in that WU will, and on all OSs.
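
For anyone keeping their own notes on failed tasks, the rule above can be turned into a rough triage helper: if any task in a WU shows a REPLANCA crash, flag the whole WU as not worth retrying. This is only a sketch under assumptions. The function name `workunits_with_replanca` and the (wuid, task name, stderr text) tuple layout are invented for the example, and the task names in the demo are placeholders rather than real ones.

```python
from collections import defaultdict

# Marker text as it appears in the stderr quoted earlier in this thread.
REPLANCA_MARKER = "Model crashed: REPLANCA"


def workunits_with_replanca(reports):
    """Group REPLANCA failures by workunit.

    reports: iterable of (wuid, task_name, stderr_text) tuples (hypothetical layout).
    Returns {wuid: [task names whose stderr contains a REPLANCA crash]}.
    """
    hits = defaultdict(list)
    for wuid, task_name, stderr_text in reports:
        if REPLANCA_MARKER in stderr_text:
            hits[wuid].append(task_name)
    return dict(hits)


if __name__ == "__main__":
    # Placeholder task names; only the wuid values come from this thread.
    reports = [
        (8395212, "task_a", "Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/pipe_dummy 2048"),
        (8300673, "task_b", "Model crashed: INITTIME: Atmosphere basis time mismatch tmp/xaakm.pipe_dummy 2048"),
    ]
    for wuid, tasks in workunits_with_replanca(reports).items():
        print("WU", wuid, "hit REPLANCA in", tasks, "- expect the other tasks to fail too")
```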
                                                                                                                        ____________
                                                                                                                        Cpdn news




                                                                                                                        Copyright © 2002-2014 climateprediction.net