Several jobs uploads in project backoff


Message boards : Number crunching : Several jobs uploads in project backoff

Flashawk
Joined: Jan 30 12
Posts: 38
Credit: 10,197,388
RAC: 0
Message 46055 - Posted 26 Apr 2013 23:00:16 UTC

    I'm sure this has been discussed before, but I have uploads from 4 different WUs that go to 100% and then either start over or go into "Project backoff". This is what the log said:

    4/26/2013 8:41:12 AM | climateprediction.net | [error] Error reported by file upload server: can't open file /storage/incoming/uploader/hadam3p_eu_qfqb_2009_1_008346176_1_2.zip: No such file or directory

    4/26/2013 8:41:12 AM | climateprediction.net | Temporarily failed upload of hadam3p_eu_qfqb_2009_1_008346176_1_2.zip: transient upload error

    4/26/2013 8:41:12 AM | climateprediction.net | Backing off 3 min 54 sec on upload of hadam3p_eu_qfqb_2009_1_008346176_1_2.zip

    4/26/2013 8:17:33 AM | climateprediction.net | [error] Error reported by file upload server: can't open file /storage/incoming/uploader/hadam3p_eu_qf8n_2010_1_008345540_1_12.zip: No such file or directory

    4/26/2013 8:17:33 AM | climateprediction.net | Temporarily failed upload of hadam3p_eu_qf8n_2010_1_008345540_1_12.zip: transient upload error

    4/26/2013 8:17:33 AM | climateprediction.net | Backing off 24 min 56 sec on upload of hadam3p_eu_qf8n_2010_1_008345540_1_12.zip

    All this happened right around the same time, which is why I'm hoping it's a server issue, but no one else has complained about it yet. If it's the work units, do I delete everything?

    TYA
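For anyone who wants to total up how long their uploads are being held back, the "Backing off ..." messages quoted above can be parsed mechanically. The following is only a minimal sketch: the regular expression is inferred from the log lines shown in this thread (including the "4 hr 9 min 39 sec" form seen later), and the function name `parse_backoff` is made up for illustration, not part of BOINC.

```python
import re
from datetime import timedelta

# Pattern inferred from the event-log lines quoted in this thread, e.g.
# "Backing off 3 min 54 sec on upload of <file>". Other BOINC versions
# may word the message differently (assumption).
BACKOFF_RE = re.compile(
    r"Backing off (?:(?P<hr>\d+) hr ?)?(?:(?P<min>\d+) min ?)?"
    r"(?:(?P<sec>\d+) sec ?)?on upload of (?P<name>\S+)"
)


def parse_backoff(line):
    """Return (file_name, delay) for a back-off message, or None otherwise."""
    m = BACKOFF_RE.search(line)
    if m is None:
        return None
    delay = timedelta(
        hours=int(m["hr"] or 0),
        minutes=int(m["min"] or 0),
        seconds=int(m["sec"] or 0),
    )
    return m["name"], delay


if __name__ == "__main__":
    sample = (
        "4/26/2013 8:41:12 AM | climateprediction.net | Backing off 3 min 54 sec "
        "on upload of hadam3p_eu_qfqb_2009_1_008346176_1_2.zip"
    )
    print(parse_backoff(sample))
```

Lines that are not back-off messages (for example the "[error] Error reported by file upload server" lines) simply return None, so the function can be applied to a whole log.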

    Profile MikeMarsUK
    Forum moderator
    Joined: Jan 13 06
    Posts: 1498
    Credit: 7,610,036
    RAC: 3,558
    Message 46056 - Posted 27 Apr 2013 0:02:42 UTC - in response to Message 46055.

      ... All this happened right around the same time, which is why I'm hoping it's a server issue, but no one else has complained about it yet. ...


      ...Error reported by file upload server...


      Yes, a server issue. This sort of thing typically happens at the weekend. The client will keep retrying ('project backoff') for 2 weeks, which is usually enough LOL. And if 2 weeks is not enough time for the staff at Oxford to fix it, you can give it more time by editing the task config files.


      ____________
      I'm a volunteer and my views are my own.
      News and Announcements and FAQ

      Flashawk
      Joined: Jan 30 12
      Posts: 38
      Credit: 10,197,388
      RAC: 0
      Message 46057 - Posted 27 Apr 2013 1:04:13 UTC - in response to Message 46056.

        Yes, a server issue. This sort of thing typically happens at the weekend.


        Boy, isn't that the truth. Well, I'm relieved that it is a server issue rather than a model that I would have to abort for the 100th time. I take it that others are experiencing this problem? This seems to be happening only with the shorter regional models.

        Les Bayliss
        Forum moderator
        Joined: Sep 5 04
        Posts: 5348
        Credit: 8,876,229
        RAC: 549
        Message 46058 - Posted 27 Apr 2013 1:40:25 UTC

          There was a problem a day ago in one of the server rooms. It only affected the final 13th zip of the regional models. It was reported as being fixed.

          As Mike said, it's the weekend, so, here we go again. :(


          Flashawk
          Joined: Jan 30 12
          Posts: 38
          Credit: 10,197,388
          RAC: 0
          Message 46059 - Posted 27 Apr 2013 2:28:06 UTC - in response to Message 46058.

            I can't believe it's just me having problems. About 45 minutes ago I noticed that a very small upload made it through (a 7.54MB regional upload), so I tried to see if it would take a 31MB upload, and it didn't work. Anyway, thanks guys.

            Flashawk
            Joined: Jan 30 12
            Posts: 38
            Credit: 10,197,388
            RAC: 0
            Message 46076 - Posted 27 Apr 2013 18:40:18 UTC

              Les, you said there was a problem like this not long ago; where is that thread? I have looked and can't find it. I have close to 500MB of uploads waiting, and they keep trying to upload over and over, sucking up bandwidth from the other project. I can't get more work for GPU-Grid until I upload results, and my connection's being choked by CPDN results.

              Les Bayliss
              Forum moderator
              Joined: Sep 5 04
              Posts: 5348
              Credit: 8,876,229
              RAC: 549
              Message 46077 - Posted 27 Apr 2013 19:02:08 UTC - in response to Message 46076.

                It was mentioned in two emails from Andy to the moderators.
                The first one said that there was a problem and the IT people who look after that equipment room were looking into it.
                The 2nd one a few hours later said that the problem had been fixed.

                Whatever is wrong at the moment will NOT get looked at until business hours on Monday.
                The University of Oxford IS the City of Oxford. And vice versa. There are departments all over, most with their own IT section and equipment rooms, and this project has servers in several of them, wherever they could get space.

                The only cure for your problem is to turn off Network access and wait it out.
                Setting the project to No new work, and then Suspending climate models before they finish will minimise the transfer backlog, but it looks like that's too late for you.
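The two steps described above (turning off network access and setting the project to No new work) are normally done from BOINC Manager, but they can also be scripted through the `boinccmd` command-line tool. The sketch below only builds the command lines and does not run anything unless `run()` is called. It assumes `boinccmd` is on the PATH and that the project URL shown is the one your client is attached with; both are assumptions to check against your own installation.

```python
import subprocess

# Assumption: the master URL your client uses for this project.
# Check the Projects tab in BOINC Manager for the exact value.
PROJECT_URL = "http://climateprediction.net/"


def network_off_cmd():
    """Command that suspends all network activity for the client."""
    return ["boinccmd", "--set_network_mode", "never"]


def network_auto_cmd():
    """Command that returns network activity to preference-based control."""
    return ["boinccmd", "--set_network_mode", "auto"]


def no_new_work_cmd(project_url=PROJECT_URL):
    """Command that stops the client from fetching new tasks for one project."""
    return ["boinccmd", "--project", project_url, "nomorework"]


def run(cmd):
    """Execute one of the commands above; raises if boinccmd reports failure."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands rather than executing them, so nothing changes
    # on the machine unless run() is called deliberately.
    for cmd in (no_new_work_cmd(), network_off_cmd()):
        print(" ".join(cmd))
```

Note that the network switch is client-wide: it holds back uploads for every project, which is exactly the limitation discussed further down the thread.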

                Flashawk
                Joined: Jan 30 12
                Posts: 38
                Credit: 10,197,388
                RAC: 0
                Message 46078 - Posted 27 Apr 2013 19:21:17 UTC - in response to Message 46077.

                  Okay, thanks, sorry to bother you. I'll just keep on keeping on.

                  Profile mo.v
                  Forum moderator
                  Joined: Sep 29 04
                  Posts: 2359
                  Credit: 7,024,721
                  RAC: 2,973
                  Message 46099 - Posted 28 Apr 2013 23:12:33 UTC

                    The time limit for uploading files from any project was extended. I can't remember whether the limit is now two or three months, but in any case it's far longer than we need.

                    But, but, but... each file is still only allowed 100 upload attempts, after which it expires. That's the BOINC rule. 100 is plenty but please don't use up the files' lives by repeatedly pressing the Retry now button in the Transfers tab. The files come to no harm while they wait.
                    ____________
                    Cpdn news

                    Art Masson
                    Joined: Oct 16 11
                    Posts: 46
                    Credit: 2,468,272
                    RAC: 1,292
                    Message 46102 - Posted 29 Apr 2013 1:11:18 UTC - in response to Message 46099.

                      Thanks. Yes, my job in back-off is the 13th zip result file for a Pacific North West Regional Model.

                      Trotador
                      Joined: Aug 21 11
                      Posts: 6
                      Credit: 3,654,702
                      RAC: 156
                      Message 46115 - Posted 29 Apr 2013 18:26:18 UTC

                        Yeah, here too, with two WUs in back-off mode...

                        Flashawk
                        Joined: Jan 30 12
                        Posts: 38
                        Credit: 10,197,388
                        RAC: 0
                        Message 46116 - Posted 29 Apr 2013 19:29:08 UTC

                          Last modified: 29 Apr 2013 19:31:31 UTC

                          There's nothing I can do about that. Every time I re-enable my internet connection to upload GPUGrid WUs, the CPDN files try to upload too and slow my connection. I wish someone had the foresight to give us an option to stop certain results from uploading while allowing others to go through.

                          Les Bayliss
                          Forum moderator
                          Joined: Sep 5 04
                          Posts: 5348
                          Credit: 8,876,229
                          RAC: 549
                          Message 46117 - Posted 29 Apr 2013 20:43:39 UTC - in response to Message 46116.

                            I wish someone had the foresight to give us an option to stop certain results from uploading while allowing others to go through.

                            That option was asked for at BOINC/dev and refused.


                            ____________
                            Backups: Here

                            Flashawk
                            Joined: Jan 30 12
                            Posts: 38
                            Credit: 10,197,388
                            RAC: 0
                            Message 46118 - Posted 29 Apr 2013 21:31:57 UTC - in response to Message 46117.

                              Last modified: 29 Apr 2013 21:32:33 UTC

                              That option was asked for at BOINC/dev and refused.


                              I wonder why that is? They must not trust us enough to use it correctly; that really, really bothers me. I have 4 purpose-built machines, built by me just for BOINC. I have about $15,000 tied up in these computers plus a $350.00 a month electric bill, and they won't let us have a feature like that, which I'm sure 90% of the other crunchers would want. It just doesn't make sense; I'm sure the benefits would far outweigh their reasons for not wanting it.

                              candido
                              Joined: Nov 15 10
                              Posts: 22
                              Credit: 503,840
                              RAC: 0
                              Message 46121 - Posted 29 Apr 2013 22:15:35 UTC

                                I have the same problem, with one WU trying to upload since Friday night.

                                ____________

                                Profile [B@H] Ray
                                Joined: Aug 19 05
                                Posts: 103
                                Credit: 1,742,148
                                RAC: 169
                                Message 46123 - Posted 29 Apr 2013 23:15:56 UTC

                                  Flashawk,
                                  Many of us would like that, but they will not build it in. It could be that if someone wrote it for them, it would go to production.

                                  If you know how to write that and have a compiler you can download the code to put it in.

                                  Profile mo.v
                                  Forum moderator
                                  Joined: Sep 29 04
                                  Posts: 2359
                                  Credit: 7,024,721
                                  RAC: 2,973
                                  Message 46124 - Posted 30 Apr 2013 0:23:46 UTC

                                    Thyme Lawn, who is one of the CPDN moderators, provided a patch that could have been incorporated to do the job of project-specific network suspend. He added it to a ticket that had been initiated by MikeMarsUK, who posted in this thread a few days ago. He also added a couple of extra patches, which may have been for the Linux and Mac versions of BOINC.

                                    Dr David Anderson, who is our BOINC boss, refused the request on the grounds that the transfer backoff system renders it unnecessary. I know he's also keen to keep the buttons in BOINC Manager as few and as simple as possible.

                                    I've had some tickets accepted and some refused. For example, I've always thought it's confusing to have two folders with different contents both called BOINC. I asked for the BOINC Data folder to be renamed BOINC Data. My request was refused on the grounds that giving the same name to both was standard industry practice. Hmmm...

                                    BOINC is open-source but we still have our boss in Berkeley.
                                    ____________
                                    Cpdn news

                                    Art Masson
                                    Joined: Oct 16 11
                                    Posts: 46
                                    Credit: 2,468,272
                                    RAC: 1,292
                                    Message 46127 - Posted 30 Apr 2013 3:23:46 UTC

                                      Thanks Mo. Agree with your comments on BOINC, but is there a known problem with the CPDN uploads that needs to be fixed?

                                      Les Bayliss
                                      Forum moderator
                                      Joined: Sep 5 04
                                      Posts: 5348
                                      Credit: 8,876,229
                                      RAC: 549
                                      Message 46128 - Posted 30 Apr 2013 3:37:53 UTC - in response to Message 46127.

                                        is there a known problem with the CPDN uploads that needs to be fixed?

                                        That is the suspicion. It's under discussion.



                                        ____________
                                        Backups: Here

                                        Les Bayliss
                                        Forum moderator
                                        Joined: Sep 5 04
                                        Posts: 5348
                                        Credit: 8,876,229
                                        RAC: 549
                                        Message 46133 - Posted 30 Apr 2013 20:20:03 UTC

                                          Last modified: 1 May 2013 3:37:07 UTC

                                          Nearly 12 hours ago, Jonathon said that the upload server was accepting uploads normally.
                                          My solitary PNW has just finished uploading, which confirms it.

                                          So the servers are OK.
                                          ____________
                                          Backups: Here

                                          Art Masson
                                          Joined: Oct 16 11
                                          Posts: 46
                                          Credit: 2,468,272
                                          RAC: 1,292
                                          Message 46138 - Posted 1 May 2013 1:33:41 UTC - in response to Message 46133.

                                            My jobs uploaded sometime in the last 12 hours also....something got "fixed" ???

                                            Les Bayliss
                                            Forum moderator
                                            Joined: Sep 5 04
                                            Posts: 5348
                                            Credit: 8,876,229
                                            RAC: 549
                                            Message 46140 - Posted 1 May 2013 3:42:38 UTC - in response to Message 46138.

                                              Apache was restarted.


                                              ____________
                                              Backups: Here

                                              Profile Dave Jackson
                                              Joined: May 15 09
                                              Posts: 811
                                              Credit: 632,379
                                              RAC: 338
                                              Message 46141 - Posted 1 May 2013 7:22:05 UTC

                                                Just to say that I still have the problem on 2 EU uploads from computer ID: 1253464. Uploads get to 100% before I get the error message in the event log about being unable to open the file. I have again suspended network activity for this machine.

                                                Professor Desty Nova
                                                Joined: Sep 19 04
                                                Posts: 65
                                                Credit: 662,939
                                                RAC: 275
                                                Message 46142 - Posted 1 May 2013 7:33:14 UTC

                                                  It seems the upload server for this WU http://climateapps2.oerc.ox.ac.uk/cpdnboinc/workunit.php?wuid=8480442 is not working well. The upload goes to 100%, then it gives the following error:

                                                  01/05/2013 08:09:15 | climateprediction.net | Started upload of hadam3p_eu_q4z2_2004_1_008329581_1_1.zip
                                                  01/05/2013 08:14:53 | climateprediction.net | [error] Error reported by file upload server: can't open file /storage/incoming/uploader/hadam3p_eu_q4z2_2004_1_008329581_1_1.zip: No such file or directory
                                                  01/05/2013 08:14:53 | climateprediction.net | Temporarily failed upload of hadam3p_eu_q4z2_2004_1_008329581_1_1.zip: transient upload error
                                                  01/05/2013 08:14:53 | climateprediction.net | Backing off 4 hr 9 min 39 sec on upload of hadam3p_eu_q4z2_2004_1_008329581_1_1.zip

                                                  I've suspended the WU until this upload goes through.

                                                  PS: From the BOINC files the upload server is http://cpdn-upload2.oerc.ox.ac.uk

                                                  ____________


                                                  Professor Desty Nova
                                                  Researching Karma the Hard Way

                                                  Profile Dave Jackson
                                                  Joined: May 15 09
                                                  Posts: 811
                                                  Credit: 632,379
                                                  RAC: 338
                                                  Message 46143 - Posted 1 May 2013 10:36:38 UTC

                                                    No need to suspend the work unit, just network activity, to save bandwidth and avoid the remote possibility of too many upload attempts. It will, I think, still try to upload periodically even if the work unit is suspended. If you have other projects that need to upload, you can just limit the hours when network activity is allowed, which will stop it making too many attempts, and the others can then upload during that window.
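Limiting the hours when network activity is allowed is usually set in the computing preferences, but as a sketch, the same window can be written to a local override file. The example below assumes the client reads a `global_prefs_override.xml` file from its data directory and honours the `net_start_hour` and `net_end_hour` elements; check the preferences documentation for your BOINC version before relying on it. The function name `build_prefs_override` is hypothetical, and the function only returns the XML text without writing any file.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(net_start_hour, net_end_hour):
    """Return override XML allowing network use only between the given hours.

    Hours are local time on a 24-hour clock, e.g. 2 and 4 for 02:00 to 04:00.
    Element names are an assumption based on BOINC's preferences override file.
    """
    if not (0 <= net_start_hour < 24 and 0 <= net_end_hour < 24):
        raise ValueError("hours must be in the range [0, 24)")
    root = ET.Element("global_preferences")
    ET.SubElement(root, "net_start_hour").text = f"{net_start_hour:.2f}"
    ET.SubElement(root, "net_end_hour").text = f"{net_end_hour:.2f}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Example: a two-hour window in the small hours for all transfers.
    print(build_prefs_override(2, 4))
```

After saving the output to the override file, the client needs to re-read it, for example with `boinccmd --read_global_prefs_override` or by restarting the client. As with suspending network activity, the window applies to every project, not just CPDN.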

                                                    Profile Dave Jackson
                                                    Joined: May 15 09
                                                    Posts: 811
                                                    Credit: 632,379
                                                    RAC: 338
                                                    Message 46146 - Posted 2 May 2013 16:26:58 UTC - in response to Message 46143.

                                                      They didn't go this morning, but both plus one other have gone this afternoon, so all is back to normal. Thanks to the mods and techies who sorted it all out.

                                                      Profile [B@H] Ray
                                                      Joined: Aug 19 05
                                                      Posts: 103
                                                      Credit: 1,742,148
                                                      RAC: 169
                                                      Message 46147 - Posted 2 May 2013 17:37:27 UTC - in response to Message 46146.

                                                        I was traveling, so I did not check for 3 days, but I see that my units uploaded all trickles this morning. It must have been the restart of Apache that did it.
                                                        ____________
                                                        Keep on crunching Pizza@Home

                                                        Professor Desty Nova
                                                        Joined: Sep 19 04
                                                        Posts: 65
                                                        Credit: 662,939
                                                        RAC: 275
                                                        Message 46148 - Posted 2 May 2013 18:57:34 UTC

                                                          The zip file of my WU uploaded fine just now.
                                                          ____________


                                                          Professor Desty Nova
                                                          Researching Karma the Hard Way

                                                          Ingleside
                                                          Joined: Aug 5 04
                                                          Posts: 92
                                                          Credit: 7,982,232
                                                          RAC: 8,702
                                                          Message 46151 - Posted 3 May 2013 15:56:32 UTC - in response to Message 46099.

                                                            Last modified: 3 May 2013 15:57:01 UTC

                                                            The time limit for uploading files from any project was extended. I can't remember whether the limit is now two or three months, but in any case it's far longer than we need.

                                                            It's 90 days.

                                                            But, but, but... each file is still only allowed 100 upload attempts, after which it expires. That's the BOINC rule. 100 is plenty but please don't use up the files' lives by repeatedly pressing the Retry now button in the Transfers tab. The files come to no harm while they wait.

                                                            I've never seen any evidence of a "100 upload attempts" rule, and seeing how a file could easily reach this limit in about 4 days (assuming retries once per hour), it wouldn't make any sense to increase the limit from 14 days to 90 days if such a rule existed.

                                                            As a little test, I blocked the internet connection and hit "retry" on a SIMAP upload 110 times... no problem. I then did a little editing and, as BoincTasks happily shows, it has now retried... 1234567 times; hitting retry again gives 1234568 times, 1234569 times, 1234570 times, 1234571 times...

                                                            Since 1234567 >> 100, I see no sign of any 100-retry limit on uploads...
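The arithmetic behind the "about 4 days" figure is easy to check. The sketch below uses the same simplifying assumption as the post, a steady one retry per hour; the real client uses variable back-off intervals, so the numbers are only a rough guide, and the function names are made up for illustration.

```python
HOURS_PER_DAY = 24


def days_to_reach(attempts, retries_per_hour=1.0):
    """Days needed to accumulate the given number of upload attempts."""
    return attempts / (retries_per_hour * HOURS_PER_DAY)


def attempts_within(days, retries_per_hour=1.0):
    """Upload attempts that fit into a deadline of the given length."""
    return int(days * HOURS_PER_DAY * retries_per_hour)


if __name__ == "__main__":
    print(round(days_to_reach(100), 2))  # days until a 100-attempt cap would hit
    print(attempts_within(14))           # attempts possible in the old 14-day window
    print(attempts_within(90))           # attempts possible in the 90-day window
```

At one retry per hour, a 100-attempt cap would be reached after roughly 4.2 days, while the 14-day and 90-day windows leave room for 336 and 2160 attempts respectively, which is why a fixed cap of 100 would have made the longer deadline pointless.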

                                                            Profile mo.v
                                                            Forum moderator
                                                            Joined: Sep 29 04
                                                            Posts: 2359
                                                            Credit: 7,024,721
                                                            RAC: 2,973
                                                            Message 46152 - Posted 3 May 2013 17:24:52 UTC

                                                              I'm sure you're right. Apparently Nicolás also thought the limit was 100 and looked in the BOINC code but couldn't find any limiting number of attempts. There are certain things that BOINC only allows 100 times and some of us must have assumed that the same limit applied to uploads without ever seeing the code.

                                                              You have exploded an urban myth!


                                                              ____________
                                                              Cpdn news

                                                              Ingleside
                                                              Joined: Aug 5 04
                                                              Posts: 92
                                                              Credit: 7,982,232
                                                              RAC: 8,702
                                                              Message 46153 - Posted 3 May 2013 22:17:46 UTC - in response to Message 46152.

                                                                You have exploded an urban myth!

                                                                Well, some myths are easy to bust...






                                                                Copyright © 2002-2014 climateprediction.net