Uploads not working


Nuadormrac
Joined: Oct 14 05
Posts: 40
Credit: 200,900
RAC: 0
Message 44773 - Posted 31 Aug 2012 4:42:49 UTC

    Getting the following:

    8/31/2012 12:31:24 AM | climateprediction.net | Started upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip
    8/31/2012 12:31:26 AM | climateprediction.net | Temporarily failed upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip: transient HTTP error
    8/31/2012 12:31:26 AM | climateprediction.net | Backing off 3 min 9 sec on upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip
    8/31/2012 12:33:33 AM | climateprediction.net | Started upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip
    8/31/2012 12:33:35 AM | climateprediction.net | Temporarily failed upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip: transient HTTP error
    8/31/2012 12:33:35 AM | climateprediction.net | Backing off 6 min 7 sec on upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip
    8/31/2012 12:33:40 AM | climateprediction.net | Started upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip
    8/31/2012 12:33:41 AM | climateprediction.net | Temporarily failed upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip: transient HTTP error
    8/31/2012 12:33:41 AM | climateprediction.net | Backing off 8 min 25 sec on upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip


    Checking the server status page, one of the upload servers shows as not running; the other two upload servers are up. Things are trickling, but data can't upload.
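
A minimal Python sketch for anyone who wants to total up the back-off delays from event-log lines like the ones above. It assumes the `Backing off N min M sec on upload of <file>` wording shown in this log; the `backoff_seconds` helper and the regular expression are my own illustration, not part of BOINC, and other client versions may word the message differently (longer delays with an hours field are not handled).

```python
import re

# The three back-off lines copied from the log above.
LOG = """\
8/31/2012 12:31:26 AM | climateprediction.net | Backing off 3 min 9 sec on upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip
8/31/2012 12:33:35 AM | climateprediction.net | Backing off 6 min 7 sec on upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip
8/31/2012 12:33:41 AM | climateprediction.net | Backing off 8 min 25 sec on upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip
"""

# Assumed message shape: optional minutes, optional seconds, then the file name.
BACKOFF = re.compile(r"Backing off (?:(\d+) min )?(?:(\d+) sec )?on upload of (\S+)")


def backoff_seconds(log_text):
    """Return a list of (file name, delay in seconds) for each back-off line."""
    delays = []
    for line in log_text.splitlines():
        match = BACKOFF.search(line)
        if match:
            minutes = int(match.group(1) or 0)
            seconds = int(match.group(2) or 0)
            delays.append((match.group(3), minutes * 60 + seconds))
    return delays


if __name__ == "__main__":
    for name, delay in backoff_seconds(LOG):
        print(f"{name}: retry delayed by {delay} s")
```

For the three back-off lines quoted above this gives delays of 189, 367 and 505 seconds.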
    ____________

    Eirik Redd
    Joined: Aug 31 04
    Posts: 242
    Credit: 26,472,483
    RAC: 20,534
    Message 44778 - Posted 31 Aug 2012 9:25:26 UTC - in response to Message 44773.

      Getting similar errors but not many waiting uploads so far.
      Server status page shows "uploader1.atm" as down.
      Staff probably aware already - it's already after 9AM in the prime time zone.
      ____________

      Richard Haselgrove
      Joined: Jan 1 07
      Posts: 275
      Credit: 6,861,830
      RAC: 193
      Message 44783 - Posted 31 Aug 2012 11:38:44 UTC

        Information from staff:

        The hard disk running the operating system on uploader1.atm has failed and needs to be replaced. We have ordered a new disk which will arrive on Monday and be installed on that day. So at the moment this machine is shut down and won't be up-and-running until Monday, I am afraid.

        That will affect, at least, the intermediate (_1 to _12) file uploads for EU regional models, possibly others too.

        Eirik Redd
        Joined: Aug 31 04
        Posts: 242
        Credit: 26,472,483
        RAC: 20,534
        Message 44791 - Posted 1 Sep 2012 13:18:03 UTC - in response to Message 44783.

          "The hard disk running the Operating System" WTF?

          This is one of the looniest postings I've ever seen here.

          Any serious server installation has at least a mirror of the OS for backup or alternative boot and OS on whatever of several physical drives -- whether IBM mainframe or my local mini-cluster or the cloud we are all expected to trust, or a lousy backup boot partition on Linux.
          "The hard disk running the OS" what could that hard disk possibly be?
          Are we trusting all this compute power to the power of the "C: drive"?

          And how would replacing the bare disk fix the loss of the OS?

          Sorry for the rant, but the explanation makes no sense whatsoever at all - and makes the support team there look like total idiots - which I know they are not.

          Yes - the complete explanation would cover a lot of techie stuff that would bore most of us to tears -- but the nonsensical explanation posted is so dumb.

          Me -- sometimes the project has problems - as far as I can see the problems get fixed within a week -- no data ever lost. Last 6 years or so. I keep on contributing -- no regrets.

          But "need an OS disk to keep running" - sorry about that, but it is so idiotic -- could have been a totally uninformed politician posting that.

          Please don't BS us who contribute.

          Maybe - "the team waits for hardware to fix the problem"

          might be plausible --

          "Need an OS disk" obviously makes fools of us all.

          In any case- keep on crunching - the crew have done wonders - and keep on doing so --
          But - nonsensical pretend explanations of problems are losers in the long run.

          ____________

          Dave Jackson
          Joined: May 15 09
          Posts: 747
          Credit: 623,678
          RAC: 240
          Message 44793 - Posted 1 Sep 2012 16:03:31 UTC

            Assuming a RAID system, if one of the hard disks had failed, it might well shut down as a precaution. If the second disk in a two-disk RAID system also went, that might cause data loss, so they would be awaiting a new disk to rebuild the array.

            I have never used RAID, just been rather paranoid about backing up important stuff, so this is purely based on my reading, not experience lol.

            Dave

            Flashawk
            Joined: Jan 30 12
            Posts: 38
            Credit: 10,197,388
            RAC: 1
            Message 44794 - Posted 2 Sep 2012 0:34:26 UTC

              Now the South African download server is down; why doesn't that surprise me? The techs at Oxford couldn't care less about this project. The whole world's watching them; I hope they never put it on their résumé.

              Eirik Redd
              Joined: Aug 31 04
              Posts: 242
              Credit: 26,472,483
              RAC: 20,534
              Message 44795 - Posted 2 Sep 2012 5:04:25 UTC - in response to Message 44794.

                Actually, I believe that the techs on this project are doing a very good job.

                The limited funding for the research puts them in a position where they can't have what most of us "techies" just assume is normal. They have to do the best they can with what they've got, and that's not a lot.
                Mirrored drives for the OS - we see that's not true. Spare disc drives just lying around or online already waiting for a problem - obviously not so. Redundant SAN with no SPOF anywhere and automatic failover to a backup system, with at least a year or two worth of storage waiting online already? Don't think so.
                Maintenance contract with a big database company that will fix any problems in 24 hours, provided that you have enough spare backup hardware pre-certified?

                Heh- all that could be fixed with less than 25 million euros - rough guess. Maybe 50. (not counting the service contracts with the vendors)

                The tech support at the project are supporting - not only the hardware - but more important and invisible to us volunteers - they are supporting the access to the work we have done - the database - for researchers worldwide.

                Understaffed, overworked, with more job demands than anything I ever did as a techie. (Hardware, software, database, application expertise - that would be at least 8 FTEs at even the cheapest shop I ever worked in)

                My earlier rant about the ongoing problems with servers should be interpreted as me venting my frustration with the whole situation -

                NOT as an accusation of the understaffed and underfunded crew.

                ____________

                Dave Jackson
                Joined: May 15 09
                Posts: 747
                Credit: 623,678
                RAC: 240
                Message 44796 - Posted 2 Sep 2012 6:28:38 UTC - in response to Message 44795.

                  Totally agree, Eirik! Two techies there to do the job. If they had your estimate of eight, of the same quality as those they have, and those eight had the money to buy the hardware they wanted... I don't think we would see many of the problems we do... Or maybe they would just try to do four times as much, succeed, and still get as many complaints?

                  Dave

                  Eirik Redd
                  Joined: Aug 31 04
                  Posts: 242
                  Credit: 26,472,483
                  RAC: 20,534
                  Message 44807 - Posted 4 Sep 2012 12:04:46 UTC - in response to Message 44796.

                    Last modified: 4 Sep 2012 12:25:34 UTC

                    Uploads are working slowly - I expect they will catch up in the next 3-4 hours.
                    Thx Dave - yeah, I've volunteered here a few years. The temporary failures of hardware are annoying but no big deal - wait a few days, or a week at worst, and all the work gets uploaded and distributed eventually. Nothing ever lost.
                    It once happened that a misconfig and a load of crap WUs got my goat by wasting my limited bandwidth, but that was a while ago.

                    Main point is - most contributors never notice a week's downtime on the upload server. Last time I looked the "top -- whatever" - computers - they were wasting wu's a mile a minute -

                    So - thanks - let's keep the osmolality of the effluent minimal when we post here, and keep on crunching -- it's worth doing. Apologize for any flaming I've done.

                    And - to all - complain, bitch and worry -- if there's ever a problem -- it might be an old moldy problem - but it might be a new problem - and reporting such a problem might very well save all of us volunteers a lot of wasted effort -

                    So - If you read this board - all complaints are welcome !! :):) - the Mods welcome the chance to help all problems !! :):

                    Actually, they do help a lot -- thanks

                    PS - I am not MOD, never will be, but thanks to them all
                    ____________

                    Bob
                    Joined: Dec 20 04
                    Posts: 6
                    Credit: 3,720,874
                    RAC: 788
                    Message 44811 - Posted 7 Sep 2012 5:44:03 UTC

                      7 Sept 2012, 05:36 UTC;

                      upload disk full error message started to appear 6 Sept 2012 at 22:23 UTC

                      Server status page indicates server is up and running

                      Just thought you would like to know.
                      ____________

                      Eirik Redd
                      Joined: Aug 31 04
                      Posts: 242
                      Credit: 26,472,483
                      RAC: 20,534
                      Message 44812 - Posted 7 Sep 2012 8:04:27 UTC - in response to Message 44811.

                        Thanks. Confirming what you reported. Same here.
                        ____________

                        Dave Jackson
                        Joined: May 15 09
                        Posts: 747
                        Credit: 623,678
                        RAC: 240
                        Message 44813 - Posted 7 Sep 2012 8:09:49 UTC - in response to Message 44812.

                          I am getting the same on an eu model. A saf model, which goes to a different server, is fine. They should be starting work about now in Oxford, so I assume we will see some action this morning.

                          Dave

                          ggrinton
                          Joined: Jan 24 06
                          Posts: 5
                          Credit: 371,295
                          RAC: 400
                          Message 44814 - Posted 7 Sep 2012 10:10:07 UTC - in response to Message 44813.

                            Confirming that I am also getting upload failures repeatedly. In itself that does not worry me, but it does chew through my upload quota at a great rate. Is there any way to disable the upload for a while? (This is a completed task, and I have other tasks running, so I do not want to just disable network traffic.)
                            ____________

                            Eirik Redd
                            Joined: Aug 31 04
                            Posts: 242
                            Credit: 26,472,483
                            RAC: 20,534
                            Message 44818 - Posted 7 Sep 2012 10:31:58 UTC - in response to Message 44814.

                              Last modified: 7 Sep 2012 10:51:58 UTC

                              Confirming that I am also getting upload failures repeatedly. In itself that does not worry me, but it does chew through my upload quota at a great rate. Is there any way to disable the upload for a while? (This is a completed task, and I have other tasks running, so I do not want to just disable network traffic.)


                              You could "disable network activity" on one of the tabs in the manager --
                              BUT -- seems that uploads are working again, so try that option later.

                              OH gorgonzola and other cheeses -- so overwhelmed with backlog uploads now -- just wait a few hours.
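
If the graphical manager is not handy, the same switch can be flipped from a script. A minimal Python sketch, assuming the `boinccmd` command-line tool that ships with the BOINC client is on the PATH and can reach the local client; `network_mode_command` and `set_network_mode` are hypothetical helper names. Note that this suspends all BOINC network traffic, not just the one stuck upload; as far as I know, tasks that are already downloaded keep computing while the network is suspended.

```python
import subprocess

VALID_MODES = ("always", "auto", "never")


def network_mode_command(mode, duration_seconds=None):
    """Build the boinccmd argument list for changing the network mode.

    mode: "never" suspends network activity, "auto" follows preferences,
    "always" forces it on. duration_seconds is optional; when given, the
    mode is applied only for that long.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}")
    command = ["boinccmd", "--set_network_mode", mode]
    if duration_seconds is not None:
        command.append(str(int(duration_seconds)))
    return command


def set_network_mode(mode, duration_seconds=None):
    """Run boinccmd; raises CalledProcessError if the client rejects the call."""
    subprocess.run(network_mode_command(mode, duration_seconds), check=True)


if __name__ == "__main__":
    # Show the command that would suspend network activity for three hours.
    print(" ".join(network_mode_command("never", 3 * 3600)))
```

Calling `set_network_mode("auto")` afterwards would hand control back to the normal preferences.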
                              ____________

                              Dave Jackson
                              Joined: May 15 09
                              Posts: 747
                              Credit: 623,678
                              RAC: 240
                              Message 44819 - Posted 7 Sep 2012 13:28:31 UTC - in response to Message 44818.

                                Just to confirm that an eu zip file went through at 10:54 on one machine and two more have gone through since, so the issue seems resolved, apart from my curiosity: in the past when the disk has filled up, it has taken several hours to transfer the data before the disk has come back online again. This seems suspiciously quick for it to have really filled up.

                                Dave

                                transient
                                Joined: Oct 3 06
                                Posts: 42
                                Credit: 2,297,149
                                RAC: 2,321
                                Message 44820 - Posted 7 Sep 2012 15:50:13 UTC

                                  Could redirecting the URL for the upload handler in the hosts file to, say, 127.0.0.0 be an option?
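
For reference, a hosts-file entry is just an address followed by a host name, so such a redirect would be a single line. A minimal Python sketch that builds one; the host name `upload-handler.example.invalid` is a placeholder (the real upload handler's name is not given in this thread), `hosts_entry` is a hypothetical helper, and 127.0.0.1 is used because it is the conventional loopback address. Whether the client copes gracefully is untested here: each attempt would presumably just fail locally and back off as before, and the line has to be removed again before uploads can succeed.

```python
import ipaddress


def hosts_entry(hostname, address="127.0.0.1"):
    """Return one hosts-file line mapping hostname to a loopback address."""
    ip = ipaddress.ip_address(address)  # raises ValueError on a malformed address
    if not ip.is_loopback:
        raise ValueError(f"{address} is not a loopback address")
    if not hostname or any(ch.isspace() for ch in hostname):
        raise ValueError("hostname must be a single token without whitespace")
    return f"{ip}\t{hostname}"


if __name__ == "__main__":
    # Placeholder host name; substitute the real upload handler by hand.
    print(hosts_entry("upload-handler.example.invalid"))
```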

                                  Eirik Redd
                                  Joined: Aug 31 04
                                  Posts: 242
                                  Credit: 26,472,483
                                  RAC: 20,534
                                  Message 44832 - Posted 14 Sep 2012 14:39:10 UTC

                                    Problems with uploader1, both up and down. Friday, of course.
                                    ____________

                                    geophi
                                    Forum moderator
                                    Joined: Aug 7 04
                                    Posts: 1475
                                    Credit: 22,537,215
                                    RAC: 1,211
                                    Message 44834 - Posted 14 Sep 2012 21:24:08 UTC - in response to Message 44832.

                                      Problems with uploader1, both up and down. Friday, of course.


                                      I let the project people know, but like you say it's Friday. Hopefully it'll get fixed early next week.

                                      Dave Jackson
                                      Joined: May 15 09
                                      Posts: 747
                                      Credit: 623,678
                                      RAC: 240
                                      Message 44837 - Posted 17 Sep 2012 13:47:08 UTC - in response to Message 44834.

                                        My three waiting uploads have all gone, however the server keeps going back to red every so often on the server status page.

                                        Dave.

                                        Eirik Redd
                                        Joined: Aug 31 04
                                        Posts: 242
                                        Credit: 26,472,483
                                        RAC: 20,534
                                        Message 44838 - Posted 17 Sep 2012 14:51:16 UTC - in response to Message 44837.

                                          Yup - the server goes on and off. Has uploaded a few dozen files from here.
                                          All I worry about is whether the uploads get lost - however many days it takes to get the job done is not a problem. Losing data is the possible problem - but that has never happened as far as I know - long delays happen when the server is catching up.
                                          I run 6 machines - right now 3 have network disabled - the other 3 are uploading slowly from time to time. Won't enable network for the other 3 until the online ones clear their queues. Might be a while.

                                          The important thing is not to lose the uploads. Patience is a virtue.


                                          ____________

                                          Les Bayliss
                                          Forum moderator
                                          Joined: Sep 5 04
                                          Posts: 5290
                                          Credit: 8,867,520
                                          RAC: 1,315
                                          Message 44839 - Posted 17 Sep 2012 20:08:12 UTC

                                            Jonathan only cleared 750 Gigs of space over the weekend. He's currently looking for a cupboard with some spare shelf space to store some more. Data is stacked up everywhere. Probably have to buy some buckets for it. :)


                                            ____________
                                            Backups: Here

                                            Dave Jackson
                                            Joined: May 15 09
                                            Posts: 747
                                            Credit: 623,678
                                            RAC: 240
                                            Message 44840 - Posted 17 Sep 2012 21:01:05 UTC - in response to Message 44839.

                                              Buckets that size don't come cheap.

                                              tullio
                                              Joined: Aug 6 04
                                              Posts: 183
                                              Credit: 289,711
                                              RAC: 947
                                              Message 44841 - Posted 18 Sep 2012 15:32:08 UTC - in response to Message 44840.

                                                At SETI@home volunteers are donating dozens of 1TB and 2TB disks to store data.
                                                Tullio
                                                ____________

                                                JIM
                                                Joined: Dec 31 07
                                                Posts: 665
                                                Credit: 3,883,239
                                                RAC: 1,869
                                                Message 44842 - Posted 18 Sep 2012 17:08:35 UTC

                                                  At SETI@home volunteers are donating dozens of 1TB and 2TB disks to store data.


                                                  I don’t know if this would work with this project. How do you guard against data loss?

                                                  When you say that they “donate” I assume that the drives remain in the homes of the donors. Home, non-commercial quality HDs are not known for their overwhelming reliability. I had a 2TB external backup drive fail only a few months ago. One moment it worked, a few hours later it didn't. No warning. Also, what happens if a person who has project data just suddenly stops participating?

                                                  One thing that can be said for CP is that despite all our server problems we have NEVER LOST DATA!

                                                  ____________

                                                  transient
                                                  Joined: Oct 3 06
                                                  Posts: 42
                                                  Credit: 2,297,149
                                                  RAC: 2,321
                                                  Message 44843 - Posted 19 Sep 2012 4:02:18 UTC

                                                    No, by donating Tullio means buying them and sending them on to Berkeley.

                                                    Les Bayliss
                                                    Forum moderator
                                                    Joined: Sep 5 04
                                                    Posts: 5290
                                                    Credit: 8,867,520
                                                    RAC: 1,315
                                                    Message 44846 - Posted 19 Sep 2012 5:55:41 UTC

                                                      All servers at Oxford, and there are many different departments, with server rooms and IT sections, would most likely be under a service contract.
                                                      And crunchers don't need to know all the 'behind the scenes' details and plans.


                                                      ____________
                                                      Backups: Here

                                                      Eirik Redd
                                                      Joined: Aug 31 04
                                                      Posts: 242
                                                      Credit: 26,472,483
                                                      RAC: 20,534
                                                      Message 44847 - Posted 19 Sep 2012 6:42:28 UTC - in response to Message 44846.

                                                        All servers at Oxford, and there are many different departments, with server rooms and IT sections, would most likely be under a service contract.
                                                        And crunchers don't need to know all the 'behind the scenes' details and plans.



                                                        Also, there are various upload (and download) servers worldwide both for current wu and for the database of completed results.

                                                        So it's not just "distributed computing" - it's "distributed database"

                                                        Like JIM posted - no uploaded results lost in 8 years.

                                                        It's possible to build fairly reliable systems from consumer-grade discs - but takes a lot of planning, design and maintenance. Donating cheap hardware to the project might help, probably not - don't know what SETI is doing. There's lots of information on the web on how to do it - but the devil is in the details. And the work-hours of maintaining such a thing is -- done that- don't want to again - retired.

                                                        Like Les said -- I don't want to know the details -- because I've been there - and second-guessing future storage improvements and estimated total costs and all is a total brain-bender and management always complains anyhow no matter how hard you work to design and build a thing that will be obsolete before the Board of Directors signs off.

                                                        Thanks to the crew for keeping things going mostly, and for not losing any uploaded data.

                                                        ____________

                                                        Dave Jackson
                                                        Joined: May 15 09
                                                        Posts: 747
                                                        Credit: 623,678
                                                        RAC: 240
                                                        Message 44848 - Posted 19 Sep 2012 7:00:14 UTC - in response to Message 44847.

                                                          There is a donate button http://climateprediction.net/content/donations on the main project page that could probably do with more publicity. I suspect that more people working for the project might be a higher priority than extra hard drives but as Les says, us crunchers don't need to know all the details and if we did we would probably be overwhelmed!

                                                          Dave

                                                          Eirik Redd
                                                          Joined: Aug 31 04
                                                          Posts: 242
                                                          Credit: 26,472,483
                                                          RAC: 20,534
                                                          Message 44849 - Posted 19 Sep 2012 8:04:09 UTC - in response to Message 44848.

                                                            There is a donate button http://climateprediction.net/content/donations on the main project page that could probably do with more publicity. I suspect that more people working for the project might be a higher priority than extra hard drives but as Les says, us crunchers don't need to know all the details and if we did we would probably be overwhelmed!

                                                            Dave


                                                            Overwhelmed - no way

                                                            If the project could pay me a lousy USD 120000 per year and give me another few million for hardware I could fix all their problems (add a few consultants on the database side) I'd even come out of retirement!

                                                            Might try the "donate" button


                                                            ____________

                                                            Dave Jackson
                                                            Joined: May 15 09
                                                            Posts: 747
                                                            Credit: 623,678
                                                            RAC: 240
                                                            Message 44851 - Posted 19 Sep 2012 9:23:19 UTC - in response to Message 44847.

                                                              Last modified: 19 Sep 2012 9:28:41 UTC

                                                              Totally understand!

                                                              Dave

                                                              tullio
                                                              Joined: Aug 6 04
                                                              Posts: 183
                                                              Credit: 289,711
                                                              RAC: 947
                                                              Message 44852 - Posted 19 Sep 2012 10:31:24 UTC - in response to Message 44843.

                                                                No, by donating Tullio means buying them and sending them on to Berkeley.

                                                                Correct. It is the GPU User Group, that is, those using graphics cards to accelerate their processing, that sponsors donations, orders disks and also servers, and sends them to the Space Sciences Laboratory. I think it is the only BOINC project where this happens.
                                                                Tullio
                                                                ____________

                                                                Eirik Redd
                                                                Joined: Aug 31 04
                                                                Posts: 242
                                                                Credit: 26,472,483
                                                                RAC: 20,534
                                                                Message 44853 - Posted 19 Sep 2012 11:30:59 UTC - in response to Message 44852.

                                                                  No, by donating Tullio means buying them and sending them on to Berkeley.

                                                                  Correct. It is the GPU User Group, that is, those using graphics cards to accelerate their processing, that sponsors donations, orders disks and also servers, and sends them to the Space Sciences Laboratory. I think it is the only BOINC project where this happens.
                                                                  Tullio


                                                                  Got any more info - or link? sounds possibly useful.
                                                                  ____________

                                                                  MarkJ
                                                                  Joined: Mar 28 09
                                                                  Posts: 102
                                                                  Credit: 5,075,426
                                                                  RAC: 84
                                                                  Message 44854 - Posted 19 Sep 2012 13:04:03 UTC - in response to Message 44853.

                                                                     No, by donating Tullio means buying them and sending them on to Berkeley.

                                                                    Correct. It is the GPU User Group, that is those using graphic cards to accelerate their processing, that sponsors donations, orders disks and also servers, and sends them to the Space Sciences Laboratory. I think it is the only BOINC project where this happens.
                                                                    Tullio


                                                                    Got any more info - or link? sounds possibly useful.


                                                                    GPU User Group - www.gpuug.org

                                                                    Users can donate towards a specific purpose, or they can buy a drive, or even donate directly to the project. The last one is done via Paypal and at the end of the month the project gets a payment less Paypal fees. UC Berkeley aren't allowed a Paypal account themselves.

                                                                     I did ask Jonathan if he could tell me how much a 2 TB drive costs in the UK so I could work out a suitable donation, but he hasn't provided any information yet (probably rather busy, I expect). If he or someone else could tell us that, and how many drives they need, we could work towards that goal. The idea is to have smallish goals for specific items, something achievable.
                                                                    ____________
                                                                    BOINC blog

                                                                    Les Bayliss
                                                                    Forum moderator
                                                                    Send message
                                                                    Joined: Sep 5 04
                                                                    Posts: 5290
                                                                    Credit: 8,867,520
                                                                    RAC: 1,315
                                                                    Message 44857 - Posted 19 Sep 2012 20:05:13 UTC - in response to Message 44854.

                                                                       It's probably university policy to not discuss money matters with people not working for the uni. A commercial-in-confidence type of thing.

                                                                      And a hard disk isn't of much use without a server to run it.
                                                                      And servers need rack space, and power.


                                                                      ____________
                                                                      Backups: Here

                                                                      Profile tullio
                                                                      Send message
                                                                      Joined: Aug 6 04
                                                                      Posts: 183
                                                                      Credit: 289,711
                                                                      RAC: 947
                                                                      Message 44860 - Posted 20 Sep 2012 2:03:18 UTC - in response to Message 44857.

                                                                         It's probably university policy to not discuss money matters with people not working for the uni. A commercial-in-confidence type of thing.

                                                                        And a hard disk isn't of much use without a server to run it.
                                                                        And servers need rack space, and power.


                                                                        All very true. Servers paddym and georgem were also donated. Rack space and power were not. But the SETI@home devs/admins provided them. They are also volunteers for SETI@home besides doing work for UC Berkeley.
                                                                        Tullio

                                                                        ____________

                                                                        MarkJ
                                                                        Avatar
                                                                        Send message
                                                                        Joined: Mar 28 09
                                                                        Posts: 102
                                                                        Credit: 5,075,426
                                                                        RAC: 84
                                                                        Message 44862 - Posted 20 Sep 2012 11:16:42 UTC - in response to Message 44857.

                                                                          Last modified: 20 Sep 2012 11:24:09 UTC

                                                                           It's probably university policy to not discuss money matters with people not working for the uni. A commercial-in-confidence type of thing.

                                                                          And a hard disk isn't of much use without a server to run it.
                                                                          And servers need rack space, and power.



                                                                          I was assuming they would be replacing existing drives with something that would be newer (and theoretically more reliable) as well as possibly giving them more space, depending on what size drives they are replacing. A rough idea of how much drives cost (recommended retail price) and how many they want/need to replace would have been helpful. If that is too much information then how can they expect us to help?
                                                                          ____________
                                                                          BOINC blog

                                                                          Eirik Redd
                                                                          Send message
                                                                          Joined: Aug 31 04
                                                                          Posts: 242
                                                                          Credit: 26,472,483
                                                                          RAC: 20,534
                                                                          Message 44864 - Posted 20 Sep 2012 14:02:57 UTC - in response to Message 44862.

                                                                             It's probably university policy to not discuss money matters with people not working for the uni. A commercial-in-confidence type of thing.

                                                                            And a hard disk isn't of much use without a server to run it.
                                                                            And servers need rack space, and power.



                                                                            I was assuming they would be replacing existing drives with something that would be newer (and theoretically more reliable) as well as possibly giving them more space, depending on what size drives they are replacing. A rough idea of how much drives cost (recommended retail price) and how many they want/need to replace would have been helpful. If that is too much information then how can they expect us to help?


                                                                             Searching the web -- SAS drives in the 300 GB capacity range at 15k rpm are running a few hundred dollars each - a bit less if you buy case lots. 600 GB drives in this speed and reliability range are a bit more expensive per TB.

                                                                            Consumer grade 1-2 TB drives are cheaper by far per TB but need much more expertise and connectivity and replication to make them competitive for enterprise reliability needs. And less than half the read speed and even less seek speed. So you need a database analyst and some serious testing to compare the multi-redundant SAS to the even-more-redundant cheap disks you would need to mimic the speed and reliability of the "server-grade" disks.


                                                                            What I'm saying is -- speed, reliability, redundancy -- takes a lot of work to figure what's best for any particular application.

                                                                            Not to mention connectivity - you want dual-port SAS drives so when one network server fails the backup system works -- another couple hundred per drive - and consumer-grade 2 TB drives don't even offer this option.


                                                                            It's not about replacing a few drives with newer cheaper ones.
                                                                            It's about building a reliable replacement system or 2 or 3
                                                                            ____________

                                                                            Profile Greg van Paassen
                                                                            Send message
                                                                            Joined: Nov 17 07
                                                                            Posts: 142
                                                                            Credit: 4,270,126
                                                                            RAC: 979
                                                                            Message 44866 - Posted 21 Sep 2012 6:11:42 UTC - in response to Message 44864.

                                                                              What you say is true for data that must remain readily accessible, Eirik. But it seems to me (from the outside) that CPDN's main requirement is for somewhere to put data that no-one has wanted during the last few months, and that is unlikely to be wanted for the next few months or years -- but it might be wanted sometime. Most likely, when a scientist does want it, they'll be able to give plenty of notice.

                                                                              Back in the day, IBM used to sell the concept of tiered storage: on-line, near-line and off-line. The idea was that 'hot' data would stay on the on-line storage, and when people stopped accessing it it would migrate to progressively less responsive (but cheaper) storage.

                                                                              Of course IBM sold fancy systems to 'migrate' unneeded data automatically. But I don't think CPDN needs that. It does need some kind of systematic archiving process, though.

                                                                              I'd caution that archiving is an ongoing process, not a one-time event, and resources should be allocated and processes set up accordingly.

                                                                              For non-critical data such as CPDN run results, two copies on consumer-grade storage, kept in separate file store-rooms in separate buildings in separate campuses and tested annually, should provide enough of a guarantee of future accessibility.

                                                                              100 TB of non-critical offline storage is then some checksum files, a hard-back book, a label maker, 100+ 2TB disks and a USB3 dock, and two cupboards -- plus a high-school student volunteer for a few weeks each year (to stock-take and checksum the archives, replace any failed disks and archive new data). And the instructions for the student.
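
                                                                               A minimal sketch of the "checksum the archives" step described above, assuming each archive disk is mounted as an ordinary directory. The function names (sha256_of, build_manifest, verify) are made up for illustration, and SHA-256 is just one reasonable choice of checksum; nothing here is CPDN's actual procedure.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return a list of (relative_path, problem) for files that fail the check."""
    root = Path(root)
    problems = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file():
            problems.append((rel, "missing"))
        elif sha256_of(target) != expected:
            problems.append((rel, "checksum mismatch"))
    return problems
```

                                                                               The manifest would be built once when a disk is written and kept somewhere other than the disk itself (the "checksum files" above). The yearly check is then a call to verify() against each copy; any disk that reports problems gets replaced from the second copy.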

                                                                              Les Bayliss
                                                                              Forum moderator
                                                                              Send message
                                                                              Joined: Sep 5 04
                                                                              Posts: 5290
                                                                              Credit: 8,867,520
                                                                              RAC: 1,315
                                                                              Message 44867 - Posted 21 Sep 2012 7:53:03 UTC

                                                                                The data is permanently on line, and can be accessed via this page.
                                                                                 Each completed model is also linked to the results pages via a line at the bottom of that model's page.

                                                                                It needs to also be remembered that there's no 'cpdn section' at Oxford uni.
                                                                                The research is a research project of the Atmospheric, Oceanic and Planetary Physics department.
                                                                                The 2 "programmers" are IT specialists / programmers who, with others, work for the Oxford e-Research Centre.

                                                                                The Oxford e-Research Centre works with research units across the whole of Oxford University to enable the use and development of innovative computational and information technology in multidisciplinary collaborations.

                                                                                 As such, it can be assumed that they know about many things, including large databases and various storage schemes. In fact, I can vaguely recall reading the job specs for one of these positions a few years ago, which talked about these same things as part of the job requirements.


                                                                                ____________
                                                                                Backups: Here

                                                                                MarkJ
                                                                                Avatar
                                                                                Send message
                                                                                Joined: Mar 28 09
                                                                                Posts: 102
                                                                                Credit: 5,075,426
                                                                                RAC: 84
                                                                                Message 44882 - Posted 23 Sep 2012 11:59:49 UTC

                                                                                   I wonder if the BOINC volunteer storage, if they ever get it completed, would be useful here. I would post a link, but the Akismet anti-spam is so paranoid I can't link to it. Suffice it to say it's at:

                                                                                  boinc dot berkeley dot edu slash trac slash wiki slash VolunteerStorage#
                                                                                  ____________
                                                                                  BOINC blog

                                                                                  Profile Dave Jackson
                                                                                  Send message
                                                                                  Joined: May 15 09
                                                                                  Posts: 747
                                                                                  Credit: 623,678
                                                                                  RAC: 240
                                                                                  Message 44886 - Posted 24 Sep 2012 12:08:03 UTC

                                                                                    I presume the server filled up again over the weekend?

                                                                                    Les Bayliss
                                                                                    Forum moderator
                                                                                    Send message
                                                                                    Joined: Sep 5 04
                                                                                    Posts: 5290
                                                                                    Credit: 8,867,520
                                                                                    RAC: 1,315
                                                                                    Message 44889 - Posted 24 Sep 2012 15:01:36 UTC - in response to Message 44886.

                                                                                      Yes. Message in the News thread a couple of days ago.

                                                                                      ____________
                                                                                      Backups: Here

                                                                                      Profile Dave Jackson
                                                                                      Send message
                                                                                      Joined: May 15 09
                                                                                      Posts: 747
                                                                                      Credit: 623,678
                                                                                      RAC: 240
                                                                                      Message 44890 - Posted 24 Sep 2012 15:07:15 UTC - in response to Message 44889.

                                                                                        Thanks, sorry, not paying attention! Normally I spot the news posts.

                                                                                        Dave

                                                                                        Les Bayliss
                                                                                        Forum moderator
                                                                                        Send message
                                                                                        Joined: Sep 5 04
                                                                                        Posts: 5290
                                                                                        Credit: 8,867,520
                                                                                        RAC: 1,315
                                                                                        Message 44899 - Posted 25 Sep 2012 22:02:41 UTC

                                                                                          Jonathan has been working on it.
                                                                                           But the biggest server that the data gets moved to has a disk problem now. And it takes a long time to 'chunk' the data for moving, verify that it's OK after the move, and then re-link each model to the research area.
                                                                                          There's terabytes to move, the university net isn't particularly fast, there was a network failure, and most of the IT people from all over Oxford took off for 'more interesting places' as soon as Long Vacation started.


                                                                                          ____________
                                                                                          Backups: Here

                                                                                          BarryAZ
                                                                                          Send message
                                                                                          Joined: Jul 13 05
                                                                                          Posts: 118
                                                                                          Credit: 11,297,248
                                                                                          RAC: 194
                                                                                          Message 44900 - Posted 25 Sep 2012 22:13:52 UTC - in response to Message 44899.

                                                                                             Les, I understand -- happens often enough over here. As soon as I spotted the upload problem, I suspended my Climate apps and let other applications cycle along. I long ago learned that one should have two or three applications running on a workstation for each of the CPU apps and the GPU apps.

                                                                                            Profile Byron Leigh Hatch @ team Carl Sagan
                                                                                            Avatar
                                                                                            Send message
                                                                                            Joined: Aug 17 04
                                                                                            Posts: 169
                                                                                            Credit: 3,902,400
                                                                                            RAC: 3,174
                                                                                            Message 44902 - Posted 25 Sep 2012 22:40:42 UTC - in response to Message 44899.



                                                                                              thanks Les,

                                                                                              is there anything we Crunchers should do with our BOINC client ?

                                                                                              I wish Jonathan well.

                                                                                               for those who missed Jonathan's post:

                                                                                              <quote>

                                                                                              We suffered a brief network outage today, which prevented connections to or from various CPDN servers.
                                                                                              The fault developed at approximately 2 pm BST and continued for two hours.
                                                                                              The hardware responsible is due to be replaced imminently, but the project is 'at risk' until that has been done (probably for another 12 hours).

                                                                                              Jonathan Miller
                                                                                              CPDN SysAdmin

                                                                                              </quote>

                                                                                              http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=5447&nowrap=true#44898

                                                                                              25/09/2012 12:51:19 PM | climateprediction.net | Started upload of hadam3p_eu_wi36_1978_1_007215635_2_9.zip
                                                                                              25/09/2012 12:51:22 PM | climateprediction.net | [error] Error reported by file upload server: can't open file
                                                                                              25/09/2012 12:51:22 PM | climateprediction.net | Temporarily failed upload of hadam3p_eu_wi36_1978_1_007215635_2_9.zip: transient upload error
                                                                                              25/09/2012 12:51:22 PM | climateprediction.net | Backing off 3 min 1 sec on upload of hadam3p_eu_wi36_1978_1_007215635_2_9.zip
                                                                                              25/09/2012 12:54:18 PM | climateprediction.net | Started upload of hadam3p_eu_wkar_1963_1_007216497_1_9.zip
                                                                                              25/09/2012 12:54:20 PM | climateprediction.net | [error] Error reported by file upload server: can't open file

                                                                                              Les Bayliss
                                                                                              Forum moderator
                                                                                              Send message
                                                                                              Joined: Sep 5 04
                                                                                              Posts: 5290
                                                                                              Credit: 8,867,520
                                                                                              RAC: 1,315
                                                                                              Message 44903 - Posted 26 Sep 2012 5:12:15 UTC

                                                                                                is there anything we Crunchers should do with our BOINC client ?

                                                                                                Set the project to No new tasks to stop polling for work
                                                                                                Suspend all climate models so as not to add more zips that won't upload.
                                                                                                Set Network activity suspended if possible to completely stop talking to the project.
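
                                                                                                 For anyone who prefers the command line to the BOINC Manager menus, here is a rough sketch of the same three steps using the boinccmd tool that ships with the client. The project URL below is a placeholder assumption (use whatever URL your client shows for the project), and the helper names are made up for illustration. Suspending the project is used here as a stand-in for suspending each model by hand.

```python
import subprocess

# Assumption: replace with the project URL exactly as your own client lists it.
PROJECT_URL = "http://climateprediction.net/"


def outage_commands(project_url=PROJECT_URL, suspend_network=False):
    """Build the boinccmd invocations for riding out an upload outage."""
    cmds = [
        # "No new tasks": stop asking the project for more work.
        ["boinccmd", "--project", project_url, "nomorework"],
        # Suspend the project so its models stop producing new zips.
        ["boinccmd", "--project", project_url, "suspend"],
    ]
    if suspend_network:
        # Note: this stops network activity for ALL projects, not just this one.
        cmds.append(["boinccmd", "--set_network_mode", "never"])
    return cmds


def apply(cmds, dry_run=True):
    """Print each command; only run it when dry_run is False."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

                                                                                                 Calling apply(outage_commands()) only prints the commands; pass dry_run=False to actually run them. Once uploads are working again, the reverse operations are "allowmorework", "resume", and "--set_network_mode auto".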


                                                                                                ____________
                                                                                                Backups: Here

                                                                                                Profile Leprechaun
                                                                                                Send message
                                                                                                Joined: Aug 5 04
                                                                                                Posts: 6
                                                                                                Credit: 554,572
                                                                                                RAC: 0
                                                                                                Message 44904 - Posted 26 Sep 2012 5:28:04 UTC

                                                                                                   It's no fun any more. There are constantly problems with the servers.
                                                                                                   The team also still hasn't managed to provide a GPU application.
                                                                                                   Climate was once my favorite project.
                                                                                                   Luckily there are still other scientific projects.
                                                                                                  ____________
                                                                                                  Wiki German Language, Wiki in deutscher Sprache
                                                                                                  View

                                                                                                  Profile Jonathan Miller
                                                                                                  Forum moderator
                                                                                                  Project administrator
                                                                                                  Project developer
                                                                                                  Volunteer developer
                                                                                                  Send message
                                                                                                  Joined: Mar 28 11
                                                                                                  Posts: 35
                                                                                                  Credit: 82,588
                                                                                                  RAC: 0
                                                                                                  Message 44906 - Posted 26 Sep 2012 9:09:45 UTC

                                                                                                    Hi,

                                                                                                    We have issues on all three of our storage servers at the moment.

                                                                                                    Currently Uploader1.atm is full, and the two machines that would normally receive her excess files are suffering from disk issues.

                                                                                                    cpdn-upload2.oerc is one of the machines above, so she cannot currently receive uploads.

                                                                                                    We are waiting on a fix - I suspect it is to do with the network outage that OeRC suffered yesterday afternoon (2 - 4 pm BST, 25 Sept 2012).

                                                                                                    Profile Byron Leigh Hatch @ team Carl Sagan
                                                                                                    Avatar
                                                                                                    Send message
                                                                                                    Joined: Aug 17 04
                                                                                                    Posts: 169
                                                                                                    Credit: 3,902,400
                                                                                                    RAC: 3,174
                                                                                                    Message 44909 - Posted 26 Sep 2012 12:07:24 UTC - in response to Message 44906.




                                                                                                      thanks for the update Jonathan. Best Wishes Byron.

                                                                                                      Profile Byron Leigh Hatch @ team Carl Sagan
                                                                                                      Avatar
                                                                                                      Send message
                                                                                                      Joined: Aug 17 04
                                                                                                      Posts: 169
                                                                                                      Credit: 3,902,400
                                                                                                      RAC: 3,174
                                                                                                      Message 44910 - Posted 26 Sep 2012 12:10:04 UTC - in response to Message 44903.



                                                                                                        Set the project to No new tasks to stop polling for work.
                                                                                                        Done
                                                                                                        Set Network activity suspended if possible to completely stop talking to the project.
                                                                                                        Done
                                                                                                        Suspend all climate models so as not to add more zips that won't upload.

                                                                                                        I'm not sure how to do this.
                                                                                                        Could you please provide details on how I do this?

                                                                                                        Lockleys
                                                                                                        Send message
                                                                                                        Joined: Jan 13 07
                                                                                                        Posts: 123
                                                                                                        Credit: 4,167,771
                                                                                                        RAC: 369
                                                                                                        Message 44911 - Posted 26 Sep 2012 12:58:23 UTC - in response to Message 44910.

                                                                                                          One way would be:
                                                                                                          In BOINC Manager, select Projects tab
                                                                                                          Select climateprediction.net project
                                                                                                          Click Suspend button

                                                                                                          Profile Iain Inglis
                                                                                                          Forum moderator
                                                                                                          Send message
                                                                                                          Joined: Jan 16 10
                                                                                                          Posts: 484
                                                                                                          Credit: 9,532
                                                                                                          RAC: 0
                                                                                                          Message 44912 - Posted 26 Sep 2012 13:02:52 UTC - in response to Message 44910.

                                                                                                            Suspend all climate models so as not to add more zips that won't upload.

                                                                                                            I'm not sure how to do this.
                                                                                                            Could you please provide details on how I do this?

                                                                                                            If you're content to turn off network activity, as you have already done, then there is no need to suspend the models themselves, since the Zip files will simply accumulate until network activity is turned on again. Accumulation of Zip files is not normally a problem; it's thousands of machines trying and failing to upload them to the affected server that's the problem.

                                                                                                            If, however, you didn't want to turn network activity off because, for example, you are running other projects, then it might be a good idea to suspend the CPDN models in order to stop more Zips being generated and failing to upload. To do that, just select the model in the BOINC Manager 'Tasks' tab and press the 'Suspend' button; or select climateprediction.net in the 'Projects' tab and press the 'Suspend' button. The latter option will stop any CPDN tasks running, which may not be what you want, as it's only the HADAM3P EU models that are having upload problems: my PNW models have cleared without any problems.

                                                                                                            Profile tullio
                                                                                                            Send message
                                                                                                            Joined: Aug 6 04
                                                                                                            Posts: 183
                                                                                                            Credit: 289,711
                                                                                                            RAC: 947
                                                                                                            Message 44913 - Posted 26 Sep 2012 14:25:31 UTC

                                                                                                              I cannot suspend network activity, as I have 6 other BOINC projects. I've set No new tasks (NNT).
                                                                                                              Tullio
                                                                                                              ____________

                                                                                                              Profile Byron Leigh Hatch @ team Carl Sagan
                                                                                                              Avatar
                                                                                                              Send message
                                                                                                              Joined: Aug 17 04
                                                                                                              Posts: 169
                                                                                                              Credit: 3,902,400
                                                                                                              RAC: 3,174
                                                                                                              Message 44914 - Posted 26 Sep 2012 16:11:31 UTC - in response to Message 44912.




                                                                                                                Thanks Iain and thanks Lockleys for responding to my post.

                                                                                                                But just a few minutes ago:

                                                                                                                it looks like things are back up and running?

                                                                                                                26/09/2012 5:54:28 AM | climateprediction.net | Sending scheduler request: To send trickle-up message.
                                                                                                                26/09/2012 5:54:28 AM | climateprediction.net | Not reporting or requesting tasks
                                                                                                                26/09/2012 5:54:31 AM | climateprediction.net | Scheduler request completed
                                                                                                                26/09/2012 5:54:34 AM | climateprediction.net | Started upload of hadam3p_eu_wkar_1963_1_007216497_1_11.zip
                                                                                                                26/09/2012 5:56:04 AM | climateprediction.net | Finished upload of hadam3p_eu_wkar_1963_1_007216497_1_11.zip
                                                                                                                26/09/2012 5:56:05 AM | climateprediction.net | Started upload of hadam3p_eu_wi36_1978_1_007215635_2_9.zip
                                                                                                                26/09/2012 5:56:05 AM | climateprediction.net | Started upload of hadam3p_eu_wkar_1963_1_007216497_1_9.zip
                                                                                                                26/09/2012 5:57:39 AM | climateprediction.net | Finished upload of hadam3p_eu_wi36_1978_1_007215635_2_9.zip
                                                                                                                26/09/2012 5:57:39 AM | climateprediction.net | Finished upload of hadam3p_eu_wkar_1963_1_007216497_1_9.zip
                                                                                                                26/09/2012 5:57:39 AM | climateprediction.net | Started upload of hadam3p_eu_wjrj_1991_1_007208963_2_9.zip
                                                                                                                26/09/2012 5:59:06 AM | climateprediction.net | Finished upload of hadam3p_eu_wjrj_1991_1_007208963_2_9.zip
                                                                                                                26/09/2012 6:00:10 AM | climateprediction.net | Started upload of hadam3p_eu_wi36_1978_1_007215635_2_11.zip
                                                                                                                26/09/2012 6:01:43 AM | climateprediction.net | Finished upload of hadam3p_eu_wi36_1978_1_007215635_2_11.zip
                                                                                                                26/09/2012 6:05:44 AM | climateprediction.net | Started upload of hadam3p_eu_wi36_1978_1_007215635_2_10.zip
                                                                                                                26/09/2012 6:07:16 AM | climateprediction.net | Finished upload of hadam3p_eu_wi36_1978_1_007215635_2_10.zip
                                                                                                                26/09/2012 6:29:14 AM | climateprediction.net | Started upload of hadam3p_eu_wjrj_1991_1_007208963_2_11.zip
                                                                                                                26/09/2012 6:30:46 AM | climateprediction.net | Finished upload of hadam3p_eu_wjrj_1991_1_007208963_2_11.zip
                                                                                                                26/09/2012 6:55:10 AM | climateprediction.net | Sending scheduler request: To send trickle-up message.
                                                                                                                26/09/2012 6:55:10 AM | climateprediction.net | Not reporting or requesting tasks
                                                                                                                26/09/2012 6:55:13 AM | climateprediction.net | Scheduler request completed
                                                                                                                26/09/2012 6:56:44 AM | climateprediction.net | Started upload of hadam3p_eu_wkar_1963_1_007216497_1_10.zip
                                                                                                                26/09/2012 6:58:14 AM | climateprediction.net | Finished upload of hadam3p_eu_wkar_1963_1_007216497_1_10.zip
                                                                                                                26/09/2012 7:43:59 AM | climateprediction.net | Started upload of hadam3p_saf_0xoa_1969_1_006876818_2_12.zip
                                                                                                                26/09/2012 7:44:22 AM | climateprediction.net | Finished upload of hadam3p_saf_0xoa_1969_1_006876818_2_12.zip
                                                                                                                26/09/2012 7:53:30 AM | climateprediction.net | Started upload of hadam3p_saf_0xoa_1969_1_006876818_2_13.zip
                                                                                                                26/09/2012 7:53:33 AM | climateprediction.net | Computation for task hadam3p_saf_0xoa_1969_1_006876818_2 finished
                                                                                                                26/09/2012 7:53:33 AM | climateprediction.net | Starting task hadam3p_eu_w4nd_1985_1_007212256_2 using hadam3p_eu version 609 in slot 1
                                                                                                                26/09/2012 7:55:50 AM | climateprediction.net | Sending scheduler request: To send trickle-up message.
                                                                                                                26/09/2012 7:55:50 AM | climateprediction.net | Not reporting or requesting tasks
                                                                                                                26/09/2012 7:55:56 AM | climateprediction.net | Scheduler request completed
                                                                                                                26/09/2012 7:57:05 AM | climateprediction.net | Finished upload of hadam3p_saf_0xoa_1969_1_006876818_2_13.zip
                                                                                                                26/09/2012 8:00:05 AM | climateprediction.net | Started upload of hadam3p_eu_wjrj_1991_1_007208963_2_10.zip
                                                                                                                26/09/2012 8:01:34 AM | climateprediction.net | Finished upload of hadam3p_eu_wjrj_1991_1_007208963_2_10.zip
                                                                                                                26/09/2012 8:03:44 AM | climateprediction.net | Started upload of hadam3p_saf_0z6f_1998_1_006888367_2_12.zip
                                                                                                                26/09/2012 8:04:07 AM | climateprediction.net | Finished upload of hadam3p_saf_0z6f_1998_1_006888367_2_12.zip
                                                                                                                26/09/2012 8:13:08 AM | climateprediction.net | Started upload of hadam3p_saf_0z6f_1998_1_006888367_2_13.zip
                                                                                                                26/09/2012 8:13:12 AM | climateprediction.net | Computation for task hadam3p_saf_0z6f_1998_1_006888367_2 finished
                                                                                                                26/09/2012 8:13:12 AM | climateprediction.net | Starting task hadam3p_pnw_z862_1985_1_006941106_2 using hadam3p_pnw version 609 in slot 3
                                                                                                                26/09/2012 8:17:01 AM | climateprediction.net | Finished upload of hadam3p_saf_0z6f_1998_1_006888367_2_13.zip
                                                                                                                26/09/2012 8:34:43 AM | climateprediction.net | Started upload of hadam3p_saf_13xn_1970_1_006904131_1_12.zip
                                                                                                                26/09/2012 8:35:06 AM | climateprediction.net | Finished upload of hadam3p_saf_13xn_1970_1_006904131_1_12.zip
                                                                                                                26/09/2012 8:44:08 AM | climateprediction.net | Started upload of hadam3p_saf_13xn_1970_1_006904131_1_13.zip
                                                                                                                26/09/2012 8:44:12 AM | climateprediction.net | Computation for task hadam3p_saf_13xn_1970_1_006904131_1 finished
                                                                                                                26/09/2012 8:44:12 AM | climateprediction.net | Starting task hadam3p_saf_110z_1994_1_006890763_1 using hadam3p_saf version 609 in slot 2
                                                                                                                26/09/2012 8:44:39 AM | climateprediction.net | update requested by user
                                                                                                                26/09/2012 8:44:44 AM | climateprediction.net | Sending scheduler request: Requested by user.
                                                                                                                26/09/2012 8:44:44 AM | climateprediction.net | Reporting 2 completed tasks, requesting new tasks for CPU and NVIDIA, sending trickle-up message
                                                                                                                26/09/2012 8:44:46 AM | climateprediction.net | Scheduler request completed: got 0 new tasks
                                                                                                                26/09/2012 8:44:46 AM | climateprediction.net | Project has no tasks available
                                                                                                                26/09/2012 8:47:57 AM | climateprediction.net | Finished upload of hadam3p_saf_13xn_1970_1_006904131_1_13.zip
                                                                                                                26/09/2012 8:52:48 AM | climateprediction.net | update requested by user
                                                                                                                26/09/2012 8:52:49 AM | climateprediction.net | Sending scheduler request: Requested by user.
                                                                                                                26/09/2012 8:52:49 AM | climateprediction.net | Reporting 1 completed tasks, requesting new tasks for CPU and NVIDIA
                                                                                                                26/09/2012 8:52:51 AM | climateprediction.net | Scheduler request completed: got 0 new tasks
                                                                                                                26/09/2012 8:52:51 AM | climateprediction.net | Project has no tasks available
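A log like the one above can be summarised by pairing each "Started upload" line with its matching "Finished upload" line. The following is only a minimal sketch, not part of BOINC: the function name `upload_durations` and the `SAMPLE` list are made up for illustration, and it assumes the day/month/year, 12-hour timestamp format shown in this post (other clients in this thread log dates differently).

```python
import re
from datetime import datetime

# Matches event-log lines of the form seen in this post, e.g.
# 26/09/2012 5:54:34 AM | climateprediction.net | Started upload of <file>.zip
# Assumption: dd/mm/yyyy dates with a 12-hour clock.
LINE_RE = re.compile(
    r"^\s*(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) \| [^|]+ \| "
    r"(Started|Finished) upload of (\S+)\s*$"
)


def upload_durations(lines):
    """Return {file name: seconds between 'Started' and 'Finished'}."""
    started = {}
    durations = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # not an upload start/finish line
        stamp = datetime.strptime(match.group(1), "%d/%m/%Y %I:%M:%S %p")
        kind, name = match.group(2), match.group(3)
        if kind == "Started":
            started[name] = stamp
        elif name in started:
            durations[name] = (stamp - started.pop(name)).total_seconds()
    return durations


# Two lines copied from the log above.
SAMPLE = [
    "26/09/2012 5:54:34 AM | climateprediction.net | Started upload of hadam3p_eu_wkar_1963_1_007216497_1_11.zip",
    "26/09/2012 5:56:04 AM | climateprediction.net | Finished upload of hadam3p_eu_wkar_1963_1_007216497_1_11.zip",
]

print(upload_durations(SAMPLE))
```

Uploads that start but never finish (for example, ones that end in "Temporarily failed upload") are simply left out of the result.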



                                                                                                                Profile Dave Jackson
                                                                                                                Send message
                                                                                                                Joined: May 15 09
                                                                                                                Posts: 747
                                                                                                                Credit: 623,678
                                                                                                                RAC: 240
                                                                                                                Message 44915 - Posted 26 Sep 2012 16:51:28 UTC

                                                                                                                  I think, Byron, that you have filled up the server again with that lot.

                                                                                                                  Wed 26 Sep 2012 17:25:57 BST | climateprediction.net | [error] Error reported by file upload server: Server is out of disk space
                                                                                                                  Wed 26 Sep 2012 17:25:57 BST | climateprediction.net | Temporarily failed upload of hadam3p_eu_2qf2_1971_1_008173014_1_12.zip: transient upload error
                                                                                                                  Wed 26 Sep 2012 17:25:57 BST | climateprediction.net | Backing off 5 hr 43 min 9 sec on upload of hadam3p_eu_2qf2_1971_1_008173014_1_12.zip
                                                                                                                  Wed 26 Sep 2012 17:26:05 BST | climateprediction.net | Started upload of hadam3p_eu_2kj0_1962_1_008189170_0_3.zip
                                                                                                                  Wed 26 Sep 2012 17:26:06 BST | climateprediction.net | [error] Error reported by file upload server: Server is out of disk space


                                                                                                                  Dave
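For anyone wanting to see how long the client intends to wait, the "Backing off ..." lines quoted in this thread can be converted to seconds. This is a rough sketch only: `backoff_seconds` is a hypothetical helper, not a BOINC function, and it assumes the delay is written with the `hr`, `min` and `sec` units seen in the messages above.

```python
import re

# Seconds per unit. Assumption: only the units that appear in this
# thread's log lines (hr, min, sec) need to be handled.
UNIT_SECONDS = {"hr": 3600, "min": 60, "sec": 1}


def backoff_seconds(line):
    """Return the back-off delay in seconds, or None if the line has none."""
    match = re.search(r"Backing off (.+?) on upload of", line)
    if match is None:
        return None
    parts = re.findall(r"(\d+) (hr|min|sec)\b", match.group(1))
    return sum(int(value) * UNIT_SECONDS[unit] for value, unit in parts)


line = (
    "Wed 26 Sep 2012 17:25:57 BST | climateprediction.net | "
    "Backing off 5 hr 43 min 9 sec on upload of "
    "hadam3p_eu_2qf2_1971_1_008173014_1_12.zip"
)
print(backoff_seconds(line))  # 5*3600 + 43*60 + 9 = 20589
```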

                                                                                                                  ggrinton
                                                                                                                  Send message
                                                                                                                  Joined: Jan 24 06
                                                                                                                  Posts: 5
                                                                                                                  Credit: 371,295
                                                                                                                  RAC: 400
                                                                                                                  Message 44918 - Posted 28 Sep 2012 10:14:13 UTC - in response to Message 44915.

                                                                                                                    I am still getting these messages. Any word on when it might be resolved?
                                                                                                                    ____________

                                                                                                                    Profile Iain Inglis
                                                                                                                    Forum moderator
                                                                                                                    Send message
                                                                                                                    Joined: Jan 16 10
                                                                                                                    Posts: 484
                                                                                                                    Credit: 9,532
                                                                                                                    RAC: 0
                                                                                                                    Message 44922 - Posted 28 Sep 2012 12:30:51 UTC - in response to Message 44918.

                                                                                                                      I am still getting these messages. Any word on when it might be resolved?
                                                                                                                      There is a problem with the server to which the data would normally be moved. No doubt when that problem is fixed the moving process will resume.

                                                                                                                      Profile PatrickProject donor
                                                                                                                      Send message
                                                                                                                      Joined: Sep 8 10
                                                                                                                      Posts: 6
                                                                                                                      Credit: 1,051,233
                                                                                                                      RAC: 261
                                                                                                                      Message 44925 - Posted 28 Sep 2012 19:48:13 UTC

                                                                                                                        I notice that some models are able to upload. My hadcm3n models seem to be uploading fine. From the 'other' board I think I read that the pnw models also upload because they're going directly to the server at the Univ of WA where the project is located.

                                                                                                                        EU models, on the other hand, are completely backed up. I have 16 such files currently in the queue. However, they're only 13 MB apiece and I have plenty of disk space, so I'm going to let those models continue to run.

                                                                                                                        CPDN seems clearly to be a 'set it and forget it' project. Where the contradiction comes in is that, on average, the people participating in the project are technical and it's natural that many of them would want to know more of what's going on. Of course, we do know that CPDN is chronically short-handed.

                                                                                                                        Even though I've tried to keep these remarks 'neutral', I expect someone will find something to take issue with. Such is human nature.

                                                                                                                        Profile Dave Jackson
                                                                                                                        Send message
                                                                                                                        Joined: May 15 09
                                                                                                                        Posts: 747
                                                                                                                        Credit: 623,678
                                                                                                                        RAC: 240
                                                                                                                        Message 44926 - Posted 29 Sep 2012 7:16:57 UTC - in response to Message 44925.

                                                                                                                          I notice that cpdnupload2.oerc is red now. That may mean it has been taken off line while the data is transferred however that in itself will take a while as it is several TB.

                                                                                                                          Note that I am not suggesting they buy some, as what little I know about how things are set up at Oxford comes from my reading here, but I saw on Tom's Hardware the other day that someone is now selling a reasonably speedy 4TB drive.

                                                                                                                          While it might not mean it at Oxford, it would certainly solve all my space problems for a while if it were a bit cheaper.

                                                                                                                          Profile Byron Leigh Hatch @ team Carl Sagan
                                                                                                                          Avatar
                                                                                                                          Send message
                                                                                                                          Joined: Aug 17 04
                                                                                                                          Posts: 169
                                                                                                                          Credit: 3,902,400
                                                                                                                          RAC: 3,174
                                                                                                                          Message 44927 - Posted 29 Sep 2012 21:56:49 UTC

                                                                                                                            Just reporting some good news. zip files seem to be uploading.

                                                                                                                            Profile [B@H] Ray
                                                                                                                            Avatar
                                                                                                                            Send message
                                                                                                                            Joined: Aug 19 05
                                                                                                                            Posts: 103
                                                                                                                            Credit: 1,738,895
                                                                                                                            RAC: 29
                                                                                                                            Message 44928 - Posted 30 Sep 2012 1:06:16 UTC

                                                                                                                              For my units, the PNW units are uploading fine, but the EU units have just been sitting here. One system is working on its last model; I hope there is new work this week.
                                                                                                                              ____________
                                                                                                                              Keep on crunching Pizza@Home

                                                                                                                              Les Bayliss
                                                                                                                              Forum moderator
                                                                                                                              Send message
                                                                                                                              Joined: Sep 5 04
                                                                                                                              Posts: 5290
                                                                                                                              Credit: 8,867,520
                                                                                                                              RAC: 1,315
                                                                                                                              Message 44929 - Posted 30 Sep 2012 4:10:10 UTC - in response to Message 44928.

                                                                                                                                PNW uploads go directly to the University of Oregon, USA, so they don't count.

                                                                                                                                New work won't even be considered until ALL of the server problems are sorted, which may be another week yet.

                                                                                                                                Michaelmas Term starts in a week's time, or thereabouts, so the Long Vacation will finish in a few days, and all of the IT people who scarpered as soon as it started should be back soon and dealing with problems in their various areas.


                                                                                                                                ____________
                                                                                                                                Backups: Here

                                                                                                                                Profile [AF>Le_Pommier] Jerome_C2005
                                                                                                                                Send message
                                                                                                                                Joined: Oct 21 10
                                                                                                                                Posts: 22
                                                                                                                                Credit: 774,572
                                                                                                                                RAC: 690
                                                                                                                                Message 44931 - Posted 30 Sep 2012 10:39:14 UTC

                                                                                                                                  Last modified: 30 Sep 2012 10:52:44 UTC

                                                                                                                                  Just when I was about to write "it's stuck with me too" it started to upload again :)

                                                                                                                                  edit : well then it gets stuck again, then it restarts again... so I guess we'll have to wait for the return of the Jedi...

                                                                                                                                  Profile JIM
                                                                                                                                  Send message
                                                                                                                                  Joined: Dec 31 07
                                                                                                                                  Posts: 665
                                                                                                                                  Credit: 3,883,239
                                                                                                                                  RAC: 1,869
                                                                                                                                  Message 44934 - Posted 30 Sep 2012 16:17:31 UTC

                                                                                                                                    I don’t believe that there is much that you can do to speed this up. The only real solution is to wait for the server problems to be fixed. You might suspend network activity so that the stuck zip file doesn’t keep trying to upload. If you are running other types of WUs or other BOINC projects, you can re-enable network activity about once a day to let those upload, and then suspend it again.

                                                                                                                                    ____________

                                                                                                                                    Les Bayliss
                                                                                                                                    Forum moderator
                                                                                                                                    Send message
                                                                                                                                    Joined: Sep 5 04
                                                                                                                                    Posts: 5290
                                                                                                                                    Credit: 8,867,520
                                                                                                                                    RAC: 1,315
                                                                                                                                    Message 44935 - Posted 30 Sep 2012 20:11:10 UTC

                                                                                                                                      To: J. Patrick Malone

                                                                                                                                      I've hidden your post to stop spammers from getting your email address.

                                                                                                                                      As for mailing results back to the project, this isn't how BOINC projects work.
                                                                                                                                      You'll just have to wait patiently like all of us.
                                                                                                                                      If you read back through this thread, you'll find one of my earlier posts, where I listed the only steps that can be taken.


                                                                                                                                      ____________
                                                                                                                                      Backups: Here

                                                                                                                                      Profile Patrick
                                                                                                                                      Project donor
                                                                                                                                      Send message
                                                                                                                                      Joined: Sep 8 10
                                                                                                                                      Posts: 6
                                                                                                                                      Credit: 1,051,233
                                                                                                                                      RAC: 261
                                                                                                                                      Message 44995 - Posted 2 Oct 2012 19:15:49 UTC

                                                                                                                                        FWIW, my long queue of EU model uploads has decreased and some of the uploads are now getting through.

                                                                                                                                        pioneer1
                                                                                                                                        Send message
                                                                                                                                        Joined: May 16 07
                                                                                                                                        Posts: 10
                                                                                                                                        Credit: 776,387
                                                                                                                                        RAC: 410
                                                                                                                                        Message 44999 - Posted 3 Oct 2012 9:09:11 UTC

                                                                                                                                          Here we go again :(

                                                                                                                                          03/10/2012 12:01:08 | climateprediction.net | [error] Error reported by file upload server: can't write file /storage/incoming/uploader//hadam3p_eu_2r82_1972_1_008189180_0_7.zip: No space left on server
                                                                                                                                          03/10/2012 12:01:08 | climateprediction.net | Temporarily failed upload of hadam3p_eu_2r82_1972_1_008189180_0_7.zip: transient upload error
                                                                                                                                          03/10/2012 12:01:08 | climateprediction.net | Backing off 9 hr 1 min 29 sec on upload of hadam3p_eu_2r82_1972_1_008189180_0_7.zip
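Editor's note: the lengthening "Backing off" intervals in logs like the one above are characteristic of randomized exponential backoff. The sketch below illustrates that general technique only. It is not BOINC's actual retry algorithm, and the function name, base delay, and cap are assumptions chosen for illustration.

```python
import random


def backoff_delay(failures, base=60.0, cap=12 * 3600.0, rng=random):
    """Return a retry delay in seconds after `failures` consecutive failures.

    Hypothetical helper (not taken from BOINC): the ceiling doubles with each
    failure until it reaches `cap`, and the delay is drawn at random from the
    upper half of that window so that many clients do not retry in lockstep.
    """
    if failures < 1:
        raise ValueError("failures must be >= 1")
    # Clamp the exponent so very long failure streaks cannot overflow a float.
    exponent = min(failures - 1, 30)
    ceiling = min(cap, base * 2 ** exponent)
    return rng.uniform(ceiling / 2, ceiling)


if __name__ == "__main__":
    for n in range(1, 12):
        print(n, round(backoff_delay(n)), "seconds")
```

The random component matters most after a long outage: without it, every client that failed at the same moment would come back at the same moment.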

                                                                                                                                          Les Bayliss
                                                                                                                                          Forum moderator
                                                                                                                                          Send message
                                                                                                                                          Joined: Sep 5 04
                                                                                                                                          Posts: 5290
                                                                                                                                          Credit: 8,867,520
                                                                                                                                          RAC: 1,315
                                                                                                                                          Message 45000 - Posted 3 Oct 2012 9:36:56 UTC - in response to Message 44999.

                                                                                                                                            It's more a matter of "still" rather than "again".

                                                                                                                                            Have you read the News thread?
                                                                                                                                            It could be next week before the bulk of the uploads get through.


                                                                                                                                            ____________
                                                                                                                                            Backups: Here

                                                                                                                                            pioneer1
                                                                                                                                            Send message
                                                                                                                                            Joined: May 16 07
                                                                                                                                            Posts: 10
                                                                                                                                            Credit: 776,387
                                                                                                                                            RAC: 410
                                                                                                                                            Message 45001 - Posted 3 Oct 2012 9:49:47 UTC - in response to Message 45000.


                                                                                                                                              But all my uploads, and those of others, went through, so I thought the problem(s) were fixed; that's why I said "again". Never mind.

                                                                                                                                              And, of course, I've read both the News and Announcements plus the other threads (Uploads not working, Server out of disk space,...) not to mention my own topic "Permanent HTTP Error".

                                                                                                                                              Profile Dave Jackson
                                                                                                                                              Send message
                                                                                                                                              Joined: May 15 09
                                                                                                                                              Posts: 747
                                                                                                                                              Credit: 623,678
                                                                                                                                              RAC: 240
                                                                                                                                              Message 45002 - Posted 3 Oct 2012 10:12:29 UTC - in response to Message 45001.

                                                                                                                                                If everyone were to pick a day of the week and a time to enable internet activity, it would reduce the load on the server after outages. Even if quite a few people chose the same day, it would reduce the hammering when first back on line. Perhaps the information where people sign up should suggest this? I know it is nice to look at stats and see how you are doing but I am sure most of us could cope with getting our fix once a week rather than several times a day?...........
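Editor's note: the staggering idea above can be sketched as a deterministic slot assignment. This is only an illustration of spreading clients across the week; nothing like it is described as existing in BOINC or CPDN, and the function name and host identifier are hypothetical.

```python
import hashlib

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def upload_slot(host_id):
    """Map a host identifier to a (day, hour) slot in a 7 x 24 weekly grid.

    Hypothetical helper: hashing the identifier spreads hosts roughly evenly
    over the 168 hourly slots, and the same host always gets the same slot.
    """
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:4], "big") % (7 * 24)
    return DAYS[slot // 24], slot % 24


if __name__ == "__main__":
    day, hour = upload_slot("example-host")  # "example-host" is a made-up name
    print(f"Enable network activity on {day} at {hour:02d}:00 UTC")
```

Hashing rather than letting people choose avoids everyone picking the same convenient evening, which would recreate the hammering the post is trying to prevent.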

                                                                                                                                                nedsram-cdl
                                                                                                                                                Send message
                                                                                                                                                Joined: Apr 14 05
                                                                                                                                                Posts: 19
                                                                                                                                                Credit: 9,258,079
                                                                                                                                                RAC: 2,988
                                                                                                                                                Message 45003 - Posted 3 Oct 2012 12:16:49 UTC - in response to Message 45002.

                                                                                                                                                  Possibly, but the point is that this issue has been present for some time. I currently have 20 eu zip files unable to upload.

                                                                                                                                                  Hopefully we will be told when this has been resolved, on the news thread - although they did say "next week" about a week ago...
                                                                                                                                                  ____________
                                                                                                                                                  Brian

                                                                                                                                                  Profile Byron Leigh Hatch @ team Carl Sagan
                                                                                                                                                  Avatar
                                                                                                                                                  Send message
                                                                                                                                                  Joined: Aug 17 04
                                                                                                                                                  Posts: 169
                                                                                                                                                  Credit: 3,902,400
                                                                                                                                                  RAC: 3,174
                                                                                                                                                  Message 45005 - Posted 3 Oct 2012 12:38:21 UTC

                                                                                                                                                    Just a few minutes ago, while I was reading this thread, my 24 EU zip files started to upload. Yay!

                                                                                                                                                    Profile Byron Leigh Hatch @ team Carl Sagan
                                                                                                                                                    Avatar
                                                                                                                                                    Send message
                                                                                                                                                    Joined: Aug 17 04
                                                                                                                                                    Posts: 169
                                                                                                                                                    Credit: 3,902,400
                                                                                                                                                    RAC: 3,174
                                                                                                                                                    Message 45007 - Posted 3 Oct 2012 12:58:45 UTC

                                                                                                                                                      Just reporting some good news. all my 24 eu zip files have now uploaded at 175 kbps. Well done to the team @ Oxford! and thank Jonathan Miller CPDN SysAdmin

                                                                                                                                                      Les Bayliss
                                                                                                                                                      Forum moderator
                                                                                                                                                      Send message
                                                                                                                                                      Joined: Sep 5 04
                                                                                                                                                      Posts: 5290
                                                                                                                                                      Credit: 8,867,520
                                                                                                                                                      RAC: 1,315
                                                                                                                                                      Message 45011 - Posted 3 Oct 2012 22:33:26 UTC

                                                                                                                                                        Last modified: 4 Oct 2012 1:45:13 UTC

                                                                                                                                                        I managed to get my remaining 16 EUs to upload 'overnight'.
                                                                                                                                                        So it's not fixed yet, just "getting there".

                                                                                                                                                        Data is still being moved off a couple of servers to storage, but more is coming in just as fast.
                                                                                                                                                        I've been watching this in the messages on one of my computers, as 16 files slowly uploaded.

                                                                                                                                                        According to the Status page yesterday, there were over 135,000 tasks running, and now it says 127,477, so it's coming down.

                                                                                                                                                        Just thinking out loud, if only a quarter of those "running" were due to pending uploads, and each one only had a quarter of their files waiting, that's about 90,000 zips fighting each other for disk space.
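Editor's note: the post does not say how many zip files each task produces, so the estimate cannot be checked directly. Working backward from the numbers that are given (127,477 running tasks, a quarter of them stalled, a quarter of each task's files waiting, about 90,000 zips) shows what per-task file count the estimate implies. The function name is hypothetical.

```python
def implied_files_per_task(tasks_running, stalled_fraction, waiting_fraction, waiting_files):
    """Solve waiting_files = tasks * stalled_fraction * files * waiting_fraction for files."""
    stalled_tasks = tasks_running * stalled_fraction
    return waiting_files / (stalled_tasks * waiting_fraction)


if __name__ == "__main__":
    files = implied_files_per_task(127_477, 0.25, 0.25, 90_000)
    print(round(files, 1))  # roughly 11 zip files per task
```

So the figure of about 90,000 is consistent with each task producing on the order of 11 zip files; that count is an inference from the post's own numbers, not something the thread states.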

                                                                                                                                                        It must be somewhat like a person running an ultra-marathon through vast swarms of stampeding elephants, rhinos and wildebeests, while juggling a dozen sharp knives.

                                                                                                                                                        There have been more hardware and software failures since the weekend, but Jonathan and Andy have their eyes on things.
                                                                                                                                                        ____________
                                                                                                                                                        Backups: Here

                                                                                                                                                        Profile Byron Leigh Hatch @ team Carl Sagan
                                                                                                                                                        Avatar
                                                                                                                                                        Send message
                                                                                                                                                        Joined: Aug 17 04
                                                                                                                                                        Posts: 169
                                                                                                                                                        Credit: 3,902,400
                                                                                                                                                        RAC: 3,174
                                                                                                                                                        Message 45013 - Posted 3 Oct 2012 23:25:33 UTC - in response to Message 45011.

                                                                                                                                                          Les, thank you for your post. As you say, we're not out of the woods yet;
                                                                                                                                                          there could be more bumps in the road ahead.
                                                                                                                                                          Best wishes to the team @ Oxford!
                                                                                                                                                          And thank you again to Andy and Jonathan, CPDN SysAdmins, for a job well done!

                                                                                                                                                          Byron

                                                                                                                                                          JimMcCarthy_StellarSolns
                                                                                                                                                          Joined: Sep 3 08
                                                                                                                                                          Posts: 23
                                                                                                                                                          Credit: 10,986,812
                                                                                                                                                          RAC: 12,766
                                                                                                                                                          Message 45014 - Posted 4 Oct 2012 0:51:08 UTC

                                                                                                                                                            Last modified: 4 Oct 2012 0:51:35 UTC

                                                                                                                                                            Yes, thank you Les and others who work tirelessly to keep everything up and running.

                                                                                                                                                            I just wanted to add an 'FYI' that the upload server issues again seem to be interfering with attempts to download work from the 'reference site' onto a Windows machine that I've just set up for CPDN number crunching. Searching the forum archives, I found that my symptoms are the same as those experienced and explained in message 44708 (http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=7442&nowrap=true#44708). Per the advice given there, I'll just remain patient while everything is returned to normal.

                                                                                                                                                            Thanks again,

                                                                                                                                                            -- Jim

                                                                                                                                                            Dave Jackson
                                                                                                                                                            Joined: May 15 09
                                                                                                                                                            Posts: 747
                                                                                                                                                            Credit: 623,678
                                                                                                                                                            RAC: 240
                                                                                                                                                            Message 45018 - Posted 4 Oct 2012 6:32:14 UTC - in response to Message 45011.

                                                                                                                                                              According to the Status page yesterday, there were over 135,000 tasks running,


                                                                                                                                                              My main machine has two tasks running but four more in the queue. As these are listed as being in progress when I look at the computer's page, I presume that those 135,000 include tasks queued on machines but not yet started? A trivial point, I know, given the problems with hardware and software etc., but it piqued my curiosity and I wondered how many tasks are actually "in progress". My other Linux machine doesn't have any in the queue at the moment, so my own average would be half of those listed. I will leave it to someone else with more machines to work out something with more statistical validity!

                                                                                                                                                              Eirik Redd
                                                                                                                                                              Joined: Aug 31 04
                                                                                                                                                              Posts: 242
                                                                                                                                                              Credit: 26,472,483
                                                                                                                                                              RAC: 20,534
                                                                                                                                                              Message 45019 - Posted 4 Oct 2012 7:40:28 UTC - in response to Message 45018.

                                                                                                                                                                According to the Status page yesterday, there were over 135,000 tasks running,


                                                                                                                                                                My main machine has two tasks running but four more in the queue. As these are listed as being in progress when I look at the computer's page, I presume that those 135,000 include tasks queued on machines but not yet started? A trivial point, I know, given the problems with hardware and software etc., but it piqued my curiosity and I wondered how many tasks are actually "in progress". My other Linux machine doesn't have any in the queue at the moment, so my own average would be half of those listed. I will leave it to someone else with more machines to work out something with more statistical validity!



                                                                                                                                                                The 135,000 number is inaccurate for at least two reasons. First, as you noted, some indeterminate number of those are waiting "Ready to start" on somebody's host(s). Another indeterminate number have downloaded to hosts that will never finish the task(s).

                                                                                                                                                                I have 6 machines running 24/7; 3 of them are somewhat fast. There's very little in the queue "Ready to start" but a whole lot pending upload. I am only letting each machine go online once a day to update stats, trickle up, and download my preferred WUs from other projects. I let only one of them at a time stay network-enabled for the whole day, until its upload queue clears; after that I leave it online. It will take a few days to finish the 80+ uploads (per fast host) that are still pending. My slower hosts are all caught up and online for new work. I'm running on a slowish DSL line and expect uploads to catch up within a couple of days. I'm getting downloads from time to time (probably old WUs that timed out and got resubmitted automatically).

                                                                                                                                                                Figuring an overall reduction factor to adjust the supposed 130,000 tasks "out there" would be really difficult. I have prefs set to start downloading when any task is within 28 hours of completing, so the ratio of "Ready to start" to "Running" here is less than 25%. My 3 faster hosts still have more tasks "uploading" than "running". It will be a while, and I'm not going to push my uploads, because there are lots of other people with a worse network than mine and I'm not going to do anything to overload the fragile servers.
                                                                                                                                                                ____________

                                                                                                                                                                Les Bayliss
                                                                                                                                                                Forum moderator
                                                                                                                                                                Joined: Sep 5 04
                                                                                                                                                                Posts: 5290
                                                                                                                                                                Credit: 8,867,520
                                                                                                                                                                RAC: 1,315
                                                                                                                                                                Message 45021 - Posted 4 Oct 2012 8:08:22 UTC

                                                                                                                                                                  Last modified: 4 Oct 2012 8:09:17 UTC

                                                                                                                                                                  OK, my mistake. The label is actually "Tasks in progress".

                                                                                                                                                                  This is an abbreviation for: I've sent this number of work units to client computers, and they aren't yet on my work list as being completed or failed. Therefore they're still out there somewhere.
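To illustrate that definition, here is a minimal sketch in Python. The task records and state names are made up for the example and are not taken from the BOINC server code or database. It shows why the server's "Tasks in progress" figure can be larger than the number of tasks actually being crunched: the server only knows a task was sent and has not yet come back as completed or failed.

```python
from collections import Counter

# Hypothetical task records: (task_id, server_state, client_state).
# server_state is what the project server has recorded; client_state is
# something only the volunteer's machine knows. All names are assumptions.
TASKS = [
    (1, "sent", "running"),
    (2, "sent", "ready_to_start"),
    (3, "sent", "upload_pending"),
    (4, "sent", "abandoned"),
    (5, "completed", None),
    (6, "failed", None),
]

FINISHED_STATES = ("completed", "failed")


def tasks_in_progress(tasks):
    """Count tasks sent to clients and not yet recorded as completed or failed."""
    return sum(1 for _, server_state, _ in tasks
               if server_state not in FINISHED_STATES)


def client_breakdown(tasks):
    """Split the in-progress tasks by what the client is doing with them."""
    return Counter(client_state for _, server_state, client_state in tasks
                   if server_state not in FINISHED_STATES)


print(tasks_in_progress(TASKS))            # what the status page would show
print(client_breakdown(TASKS)["running"])  # what is actually being crunched
```

In this toy data the status-page style count covers every task that has not come back, while only a subset is really running; the rest are queued, waiting to upload, or sitting on hosts that will never finish them.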

                                                                                                                                                                  As for new work that's occasionally being received, that's due to the resubmission script being fired up to slowly produce new data sets in the sequence of that past work that has been returned intact.
                                                                                                                                                                  ____________





                                                                                                                                                                  Copyright © 2002-2014 climateprediction.net