Uploads not working

Nuadormrac · Joined: 14 Oct 05 · Posts: 44 · Credit: 2,868,973 · RAC: 1,726
Message 44773 - Posted: 31 Aug 2012, 4:42:49 UTC

Getting the following:

8/31/2012 12:31:24 AM | climateprediction.net | Started upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip
8/31/2012 12:31:26 AM | climateprediction.net | Temporarily failed upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip: transient HTTP error
8/31/2012 12:31:26 AM | climateprediction.net | Backing off 3 min 9 sec on upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip
8/31/2012 12:33:33 AM | climateprediction.net | Started upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip
8/31/2012 12:33:35 AM | climateprediction.net | Temporarily failed upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip: transient HTTP error
8/31/2012 12:33:35 AM | climateprediction.net | Backing off 6 min 7 sec on upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip
8/31/2012 12:33:40 AM | climateprediction.net | Started upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip
8/31/2012 12:33:41 AM | climateprediction.net | Temporarily failed upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip: transient HTTP error
8/31/2012 12:33:41 AM | climateprediction.net | Backing off 8 min 25 sec on upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip

Checking the server status page, one of the upload servers shows as not running; the other two upload servers are up. Things are trickling, but data can't upload....

Eirik Redd · Joined: 31 Aug 04 · Posts: 391 · Credit: 219,888,554 · RAC: 1,481,373
Message 44778 - Posted: 31 Aug 2012, 9:25:26 UTC - in response to Message 44773

Getting similar errors, but not many waiting uploads so far. The server status page shows "uploader1.atm" as down. Staff are probably aware already; it's already after 9 AM in the project's home time zone.

Richard Haselgrove · Joined: 1 Jan 07 · Posts: 943 · Credit: 34,360,365 · RAC: 9,337
Message 44783 - Posted: 31 Aug 2012, 11:38:44 UTC

Information from staff:

"The hard disk running the operating system on uploader1.atm has failed and needs to be replaced. We have ordered a new disk which will arrive on Monday and be installed on that day. So at the moment this machine is shut down and won't be up-and-running until Monday, I am afraid."

That will affect, at least, the intermediate (_1 to _12) file uploads for EU regional models, possibly others too.

Eirik Redd · Joined: 31 Aug 04 · Posts: 391 · Credit: 219,888,554 · RAC: 1,481,373
Message 44791 - Posted: 1 Sep 2012, 13:18:03 UTC - in response to Message 44783

"The hard disk running the Operating System"? WTF? This is one of the looniest postings I've ever seen here. Any serious server installation has at least a mirror of the OS for backup, or an alternative boot device and OS on one of several physical drives, whether it's an IBM mainframe, my local mini-cluster, the cloud we are all expected to trust, or even a lousy backup boot partition on Linux. "The hard disk running the OS": what could that hard disk possibly be? Are we trusting all this compute power to a "C: drive"? And how would replacing the bare disk fix the loss of the OS?

Sorry for the rant, but the explanation makes no sense whatsoever, and it makes the support team there look like total idiots, which I know they are not. Yes, the complete explanation would cover a lot of techie stuff that would bore most of us to tears, but the nonsensical explanation as posted is so dumb.

Me: sometimes the project has problems, but as far as I can see the problems get fixed within a week, and no data has ever been lost in the last six years or so. I keep on contributing, no regrets. But "need an OS disk to keep running"? Sorry about that, but it is so idiotic it could have been posted by a totally uninformed politician. Please don't BS those of us who contribute. Maybe "the team is waiting for hardware to fix the problem" would be plausible; "need an OS disk" obviously makes fools of us all.

In any case, keep on crunching. The crew have done wonders and keep on doing so. But nonsensical pretend explanations of problems are losers in the long run.

Dave Jackson · Volunteer moderator · Joined: 15 May 09 · Posts: 4353 · Credit: 16,598,247 · RAC: 6,156
Message 44793 - Posted: 1 Sep 2012, 16:03:31 UTC

Assuming a RAID system: if one of the hard disks had failed, the machine might well have been shut down as a precaution, because if the second disk in a two-disk RAID array also went, that could cause data loss. They would then be waiting for a new disk to rebuild the array. I have never used RAID, just been rather paranoid about backing up important stuff, so this is purely based on my reading, not experience, lol.

Dave

old_user671679 · Joined: 30 Jan 12 · Posts: 38 · Credit: 10,197,388 · RAC: 0
Message 44794 - Posted: 2 Sep 2012, 0:34:26 UTC

Now the South African download server is down; why doesn't that surprise me? The techs at Oxford could care less about this project. The whole world's watching them; I hope they never put it on their résumé.

Eirik Redd · Joined: 31 Aug 04 · Posts: 391 · Credit: 219,888,554 · RAC: 1,481,373
Message 44795 - Posted: 2 Sep 2012, 5:04:25 UTC - in response to Message 44794

Actually, I believe that the techs on this project are doing a very good job. The limited funding for the research puts them in a position where they can't have what most of us "techies" just assume is normal. They have to do the best they can with what they've got, and that's not a lot.

Mirrored drives for the OS? We see that's not the case. Spare disk drives just lying around, or already online waiting for a problem? Obviously not. A redundant SAN with no SPOF anywhere and automatic failover to a backup system, with at least a year or two worth of storage already waiting online? Don't think so. A maintenance contract with a big database company that will fix any problem in 24 hours, provided you have enough spare, pre-certified backup hardware? Heh, all that could be fixed with less than 25 million euros, at a rough guess. Maybe 50 (not counting the service contracts with the vendors).

The tech support at the project are supporting not only the hardware but, more importantly and invisibly to us volunteers, the access to the work we have done: the database, for researchers worldwide. Understaffed, overworked, with more job demands than anything I ever did as a techie. (Hardware, software, database and application expertise would be at least 8 FTEs at even the cheapest shop I ever worked in.)

My earlier rant about the ongoing problems with servers should be interpreted as me venting my frustration with the whole situation, NOT as an accusation against the understaffed and underfunded crew.

Dave Jackson · Volunteer moderator · Joined: 15 May 09 · Posts: 4353 · Credit: 16,598,247 · RAC: 6,156
Message 44796 - Posted: 2 Sep 2012, 6:28:38 UTC - in response to Message 44795

Totally agree, Eirik! Two techies there to do the job. If they had your estimate of eight, and they were the same quality as those they have, and those eight had the money to buy the hardware they wanted... I don't think we would see many of the problems we do. Or maybe they would just try to do four times as much, succeed, and still get as many complaints?

Dave

Eirik Redd · Joined: 31 Aug 04 · Posts: 391 · Credit: 219,888,554 · RAC: 1,481,373
Message 44807 - Posted: 4 Sep 2012, 12:04:46 UTC - in response to Message 44796 (last modified: 4 Sep 2012, 12:25:34 UTC)

Uploads are working slowly; I expect they will catch up in the next 3-4 hours.

Thanks, Dave. Yeah, I've volunteered here a few years. The temporary hardware failures are annoying but no big deal: wait a few days, or a week at worst, and all the work gets uploaded and distributed eventually. Nothing is ever lost. Once a misconfiguration and a load of crap WUs got my goat by wasting my limited bandwidth, but that was a while ago. The main point is that most contributors never notice a week's downtime on the upload server. Last time I looked at the "top whatever" computers, they were wasting WUs a mile a minute. So, thanks. Let's keep the osmolality of the effluent minimal when we post here, and keep on crunching; it's worth doing. Apologies for any flaming I've done.

And to all: complain, bitch and worry if there's ever a problem. It might be an old, moldy problem, but it might be a new one, and reporting it might very well save all of us volunteers a lot of wasted effort. So if you read this board, all complaints are welcome!! :) :) The mods welcome the chance to help with all problems!! :) Actually, they do help a lot. Thanks.

PS: I am not a mod and never will be, but thanks to them all.

Bob · Joined: 20 Dec 04 · Posts: 6 · Credit: 4,055,041 · RAC: 0
Message 44811 - Posted: 7 Sep 2012, 5:44:03 UTC

7 Sept 2012, 05:36 UTC: an "upload disk full" error message started to appear on 6 Sept 2012 at 22:23 UTC. The server status page indicates the server is up and running. Just thought you would like to know.

Eirik Redd · Joined: 31 Aug 04 · Posts: 391 · Credit: 219,888,554 · RAC: 1,481,373
Message 44812 - Posted: 7 Sep 2012, 8:04:27 UTC - in response to Message 44811

Thanks. Confirming what you reported. Same here.

Dave Jackson · Volunteer moderator · Joined: 15 May 09 · Posts: 4353 · Credit: 16,598,247 · RAC: 6,156
Message 44813 - Posted: 7 Sep 2012, 8:09:49 UTC - in response to Message 44812

I am getting the same on an EU model. A SAF model, which goes to a different server, is fine. They should be starting work about now in Oxford, so I assume we will see some action this morning.

Dave

ggrinton · Joined: 24 Jan 06 · Posts: 5 · Credit: 435,756 · RAC: 0
Message 44814 - Posted: 7 Sep 2012, 10:10:07 UTC - in response to Message 44813

Confirming that I am also getting upload failures repeatedly. In itself that does not worry me, but it does chew through my upload quota at a great rate. Is there any way to disable the upload for a while? (This is a completed task, and I have other tasks running, so I do not want to just disable network traffic.)

Eirik Redd · Joined: 31 Aug 04 · Posts: 391 · Credit: 219,888,554 · RAC: 1,481,373
Message 44818 - Posted: 7 Sep 2012, 10:31:58 UTC - in response to Message 44814 (last modified: 7 Sep 2012, 10:51:58 UTC)

ggrinton wrote:
"Confirming that I am also getting upload failures repeatedly. In itself that does not worry me, but it does chew through my upload quota at a great rate. Is there any way to disable the upload for a while? (This is a completed task, and I have other tasks running, so I do not want to just disable network traffic.)"

You could "disable network activity" on one of the tabs in the manager, BUT it seems that uploads are working again, so try that option later.

Oh gorgonzola and other cheeses, the server is so overwhelmed with backlogged uploads now. Just wait a few hours.

Dave Jackson · Volunteer moderator · Joined: 15 May 09 · Posts: 4353 · Credit: 16,598,247 · RAC: 6,156
Message 44819 - Posted: 7 Sep 2012, 13:28:31 UTC - in response to Message 44818

Just to confirm that an EU zip file went through at 10:54 on one machine, and two more have gone through since, so the issue seems resolved, apart from my curiosity: in the past, when the disk has filled up, it has taken several hours to transfer the data before the disk came back online. It seems suspiciously quick for it to have really filled up.

Dave

transient · Joined: 3 Oct 06 · Posts: 43 · Credit: 8,017,057 · RAC: 0
Message 44820 - Posted: 7 Sep 2012, 15:50:13 UTC

Could redirecting the upload handler's URL in the hosts file to, say, 127.0.0.0 be an option?

Eirik Redd · Joined: 31 Aug 04 · Posts: 391 · Credit: 219,888,554 · RAC: 1,481,373
Message 44832 - Posted: 14 Sep 2012, 14:39:10 UTC

Problems with uploader1, both up and down. Friday, of course.

geophi · Volunteer moderator · Joined: 7 Aug 04 · Posts: 2169 · Credit: 64,555,907 · RAC: 5,858
Message 44834 - Posted: 14 Sep 2012, 21:24:08 UTC - in response to Message 44832

Eirik Redd wrote:
"Problems with uploader1, both up and down. Friday, of course."

I let the project people know, but like you say, it's Friday. Hopefully it'll get fixed early next week.

Dave Jackson · Volunteer moderator · Joined: 15 May 09 · Posts: 4353 · Credit: 16,598,247 · RAC: 6,156
Message 44837 - Posted: 17 Sep 2012, 13:47:08 UTC - in response to Message 44834

My three waiting uploads have all gone; however, the server keeps going back to red every so often on the server status page.

Dave

Eirik Redd · Joined: 31 Aug 04 · Posts: 391 · Credit: 219,888,554 · RAC: 1,481,373
Message 44838 - Posted: 17 Sep 2012, 14:51:16 UTC - in response to Message 44837

Yup, the server goes on and off. It has uploaded a few dozen files from here. All I worry about is whether the uploads get lost; however many days it takes to get the job done is not a problem. Losing data is the possible problem, but that has never happened as far as I know. Long delays happen while the server is catching up.

I run 6 machines. Right now 3 have network disabled, and the other 3 are uploading slowly from time to time. I won't enable network for the other 3 until the online ones clear their queues. Might be a while. The important thing is not to lose the uploads. Patience is a virtue.