Uploads not working

Nuadormrac
Joined: 14 Oct 05
Posts: 44
Credit: 2,767,024
RAC: 7,001
Message 44773 - Posted: 31 Aug 2012, 4:42:49 UTC

Getting the following:

8/31/2012 12:31:24 AM | climateprediction.net | Started upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip
8/31/2012 12:31:26 AM | climateprediction.net | Temporarily failed upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip: transient HTTP error
8/31/2012 12:31:26 AM | climateprediction.net | Backing off 3 min 9 sec on upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip
8/31/2012 12:33:33 AM | climateprediction.net | Started upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip
8/31/2012 12:33:35 AM | climateprediction.net | Temporarily failed upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip: transient HTTP error
8/31/2012 12:33:35 AM | climateprediction.net | Backing off 6 min 7 sec on upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip
8/31/2012 12:33:40 AM | climateprediction.net | Started upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip
8/31/2012 12:33:41 AM | climateprediction.net | Temporarily failed upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip: transient HTTP error
8/31/2012 12:33:41 AM | climateprediction.net | Backing off 8 min 25 sec on upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip


Checking the server status page, one of the upload servers shows as not running; the other 2 upload servers are up. Things are trickling, but data can't upload....
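For anyone who wants to tally these back-offs from a saved copy of the event log, here is a rough Python sketch. It only assumes the "Backing off N min M sec on upload of FILE" wording seen in the lines above; the function and variable names are made up for illustration, and other duration formats (hours, for instance) are not handled.

```python
import re

# Matches the wording seen in the log above, e.g.
# "Backing off 3 min 9 sec on upload of some_file.zip".
# The "N min " part is treated as optional; longer formats are not handled.
BACKOFF_RE = re.compile(
    r"Backing off (?:(?P<min>\d+) min )?(?P<sec>\d+) sec on upload of (?P<file>\S+)"
)

# Three lines copied from the log in this post.
SAMPLE = [
    "8/31/2012 12:31:26 AM | climateprediction.net | Backing off 3 min 9 sec on upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip",
    "8/31/2012 12:33:35 AM | climateprediction.net | Backing off 6 min 7 sec on upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip",
    "8/31/2012 12:33:41 AM | climateprediction.net | Backing off 8 min 25 sec on upload of hadam3p_eu_97xa_1969_1_008158384_0_1.zip",
]


def backoff_seconds(log_lines):
    """Return a list of (file_name, delay_in_seconds) for each back-off line."""
    delays = []
    for line in log_lines:
        match = BACKOFF_RE.search(line)
        if match:
            minutes = int(match.group("min") or 0)
            seconds = int(match.group("sec"))
            delays.append((match.group("file"), minutes * 60 + seconds))
    return delays


if __name__ == "__main__":
    for file_name, delay in backoff_seconds(SAMPLE):
        print(f"{file_name}: retry delayed by {delay} s")
```

Lines that are not back-off messages are simply skipped, so the whole event log can be passed in unfiltered.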

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 44778 - Posted: 31 Aug 2012, 9:25:26 UTC - in response to Message 44773.  

Getting similar errors but not many waiting uploads so far.
Server status page shows "uploader1.atm" as down.
Staff probably aware already - it's already after 9AM in the prime time zone.

Richard Haselgrove
Joined: 1 Jan 07
Posts: 921
Credit: 34,100,818
RAC: 11,270
Message 44783 - Posted: 31 Aug 2012, 11:38:44 UTC

Information from staff:
The hard disk running the operating system on uploader1.atm has failed and needs to be replaced. We have ordered a new disk which will arrive on Monday and be installed on that day. So at the moment this machine is shut down and won't be up-and-running until Monday, I am afraid.

That will affect, at least, the intermediate (_1 to _12) file uploads for EU regional models, possibly others too.

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 44791 - Posted: 1 Sep 2012, 13:18:03 UTC - in response to Message 44783.  

"The hard disk running the Operating System" WTF?

This is one of the looniest postings I've ever seen here.

Any serious server installation has at least a mirror of the OS for backup or alternative boot and OS on whatever of several physical drives -- whether IBM mainframe or my local mini-cluster or the cloud we are all expected to trust, or a lousy backup boot partition on Linux.
"The hard disk running the OS" what could that hard disk possibly be?
Are we trusting all this compute power to the power of the "C: drive"

And how would replacing the bare disk fix the loss of the OS --

Sorry for the rant, but the explanation makes no sense whatsoever at all - and makes the support team there look like total idiots - which I know they are not.

Yes - the complete explanation would cover a lot of techie stuff that would bore most of us to tears -- but the nonsensical explanation posted is so dumb.

Me -- sometimes the project has problems - as far as I can see the problems get fixed within a week -- no data ever lost. Last 6 years or so. I keep on contributing -- no regrets.

But "need an OS disk to keep running" - Sorry about that but is so idiotic -- could have been a totally uninformed politician posting that.

Please don't BS us who contribute.

Maybe - "the team waits for hardware to fix the problem"

might be plausible --

"Need an OS disk" obviously makes fools of us all.

In any case- keep on crunching - the crew have done wonders - and keep on doing so --
But - nonsensical pretend explanations of problems are losers in the long run.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4309
Credit: 16,355,267
RAC: 6,278
Message 44793 - Posted: 1 Sep 2012, 16:03:31 UTC

Assuming a RAID system, if one of the hard disks had failed, it might well shut down as a precaution. If the second disk in a 2-disk RAID system also went, that might cause data loss, so they would be awaiting a new disk to rebuild the array.

I have never used RAID, just been rather paranoid about backing up important stuff, so this is purely based on my reading, not experience lol.

Dave

old_user671679
Joined: 30 Jan 12
Posts: 38
Credit: 10,197,388
RAC: 0
Message 44794 - Posted: 2 Sep 2012, 0:34:26 UTC

Now the South African download server is down; why doesn't that surprise me? The techs at Oxford couldn't care less about this project. The whole world's watching them; I hope they never put it on their résumé.

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 44795 - Posted: 2 Sep 2012, 5:04:25 UTC - in response to Message 44794.  

Actually, I believe that the techs on this project are doing a very good job.

The limited funding for the research puts them in a position where they can't have what most of us "techies" just assume is normal. They have to do the best they can with what they've got, and that's not a lot.
Mirrored drives for the OS - we see that's not true. Spare disc drives just lying around or online already waiting for a problem - obviously not so. Redundant SAN with no SPOF anywhere and automatic failover to a backup system - at least a year or two worth of storage waiting on-line already? Don't think so.
Maintenance contract with (big database company that will fix any problems in 24 hours provided that you have enough spare backup hardware pre-certified?)

Heh- all that could be fixed with less than 25 million euros - rough guess. Maybe 50. (not counting the service contracts with the vendors)

The tech support at the project are supporting - not only the hardware - but more important and invisible to us volunteers - they are supporting the access to the work we have done - the database - for researchers worldwide.

Understaffed, overworked, with more job demands than anything I ever did as a techie. (Hardware, software, database, application expertise - that would be at least 8 FTEs at even the cheapest shop I ever worked in)

My earlier rant about the ongoing problems with servers should be interpreted as me venting my frustration with the whole situation -

NOT as an accusation of the understaffed and underfunded crew.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4309
Credit: 16,355,267
RAC: 6,278
Message 44796 - Posted: 2 Sep 2012, 6:28:38 UTC - in response to Message 44795.  

Totally agree Eirik! Two techies there to do the job. If they had your estimate of eight, and they were the same quality as those they have, and those eight had the money to buy the hardware they wanted... I don't think we would see many of the problems we do... Or maybe they would just try to do 4 times as much, succeed, and still get as many complaints?

Dave

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 44807 - Posted: 4 Sep 2012, 12:04:46 UTC - in response to Message 44796.  
Last modified: 4 Sep 2012, 12:25:34 UTC

Uploads are working slowly - expect will catch up next 3-4 hours.
Thx Dave - yeah, I've volunteered here a few years; the temporary hardware failures are annoying but no big deal - wait a few days, or a week at worst, and all the work gets uploaded and distributed eventually. Nothing ever lost.
It once happened that a misconfig and a load of crap WUs got my goat by wasting my limited bandwidth, but that was a while ago.

Main point is - most contributors never notice a week's downtime on the upload server. Last time I looked the "top -- whatever" - computers - they were wasting wu's a mile a minute -

So - thanks - let's keep the osmolality of the effluent minimal when we post here, and keep on crunching -- it's worth doing. Apologize for any flaming I've done.

And - to all - complain, bitch and worry -- if there's ever a problem -- it might be an old moldy problem - but it might be a new problem - and reporting such a problem might very well save all of us volunteers a lot of wasted effort -

So - if you read this board - all complaints are welcome!! :):) - the Mods welcome the chance to help with all problems!! :):)

Actually, they do help a lot -- thanks

PS - I am not MOD, never will be, but thanks to them all

Bob
Joined: 20 Dec 04
Posts: 6
Credit: 4,055,041
RAC: 0
Message 44811 - Posted: 7 Sep 2012, 5:44:03 UTC

7 Sept 2012, 05:36 UTC;

upload disk full error message started to appear 6 Sept 2012 at 22:23 UTC

Server status page indicates server is up and running

Just thought you would like to know.

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 44812 - Posted: 7 Sep 2012, 8:04:27 UTC - in response to Message 44811.  

Thanks. Confirming what you reported. Same here.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4309
Credit: 16,355,267
RAC: 6,278
Message 44813 - Posted: 7 Sep 2012, 8:09:49 UTC - in response to Message 44812.  

I am getting the same on an eu model. A saf model, which goes to a different server, is fine. They should be starting work about now in Oxford, so I assume we will see some action this morning.

Dave

ggrinton
Joined: 24 Jan 06
Posts: 5
Credit: 435,756
RAC: 0
Message 44814 - Posted: 7 Sep 2012, 10:10:07 UTC - in response to Message 44813.  

Confirming that I am also getting upload failures repeatedly. In itself that does not worry me, but it does chew through my upload quota at a great rate. Is there any way to disable the upload for a while? (This is a completed task, and I have other tasks running, so I do not want to just disable network traffic.)

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 44818 - Posted: 7 Sep 2012, 10:31:58 UTC - in response to Message 44814.  
Last modified: 7 Sep 2012, 10:51:58 UTC

Confirming that I am also getting upload failures repeatedly. In itself that does not worry me, but it does chew through my upload quota at a great rate. Is there any way to disable the upload for a while? (This is a completed task, and I have other tasks running, so I do not want to just disable network traffic.)


You could "disable network activity" on one of the tabs in the manager --
BUT -- seems that uploads are working again, so try that option later.

OH gorgonzola and other cheeses -- so overwhelmed with backlog uploads now -- just wait a few hours.
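For anyone who would rather script the "disable network activity" option than click through the manager, here is a rough Python sketch. It assumes the `boinccmd` command-line tool that ships with BOINC is installed and on the PATH, and relies on its `--set_network_mode` option (always, auto or never, with an optional duration in seconds). The function names are made up for illustration. Note that this suspends all network traffic for the client, not just the stuck upload, so it does not give the per-file control asked about above.

```python
import shutil
import subprocess

VALID_MODES = ("always", "auto", "never")


def network_mode_command(mode, duration_seconds=0):
    """Build the boinccmd argument list for changing the network mode.

    duration_seconds=0 means "until changed again"; a positive value asks
    the client to revert to its previous mode after that many seconds.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    if duration_seconds < 0:
        raise ValueError("duration_seconds must not be negative")
    command = ["boinccmd", "--set_network_mode", mode]
    if duration_seconds:
        command.append(str(int(duration_seconds)))
    return command


def set_network_mode(mode, duration_seconds=0):
    """Run boinccmd against the local client (assumes it is on the PATH)."""
    if shutil.which("boinccmd") is None:
        raise RuntimeError("boinccmd was not found on the PATH")
    subprocess.run(network_mode_command(mode, duration_seconds), check=True)


if __name__ == "__main__":
    # Example: hold all transfers for three hours, then let the client
    # fall back to whatever mode it was in before.
    print(" ".join(network_mode_command("never", 3 * 3600)))
```

Calling `set_network_mode("auto")` later puts the client back under its normal preferences.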

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4309
Credit: 16,355,267
RAC: 6,278
Message 44819 - Posted: 7 Sep 2012, 13:28:31 UTC - in response to Message 44818.  

Just to confirm that an eu zip file went through at 10:54 on one machine and two more have gone through since so issue seems resolved apart from my curiosity - in the past when the disk has filled up it has taken several hours to transfer the data before the disk has come back on line again. Seems suspiciously quick for it to have really filled up.

Dave

transient
Joined: 3 Oct 06
Posts: 43
Credit: 8,017,057
RAC: 0
Message 44820 - Posted: 7 Sep 2012, 15:50:13 UTC

Could redirecting the upload handler's hostname in the hosts file to, say, 127.0.0.1 be an option?

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 44832 - Posted: 14 Sep 2012, 14:39:10 UTC

Problems with uploader1, both up and down. Friday, of course.

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2167
Credit: 64,403,322
RAC: 5,085
Message 44834 - Posted: 14 Sep 2012, 21:24:08 UTC - in response to Message 44832.  

Problems with uploader1, both up and down. Friday, of course.


I let the project people know, but like you say it's Friday. Hopefully it'll get fixed early next week.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4309
Credit: 16,355,267
RAC: 6,278
Message 44837 - Posted: 17 Sep 2012, 13:47:08 UTC - in response to Message 44834.  

My three waiting uploads have all gone, however the server keeps going back to red every so often on the server status page.

Dave.

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 44838 - Posted: 17 Sep 2012, 14:51:16 UTC - in response to Message 44837.  

Yup - the server goes on and off. Has uploaded a few dozen files from here.
All I worry about is whether the uploads get lost - however many days it takes to get the job done is not a problem. Losing data is the possible problem - but that has never happened as far as I know - long delays happen when the server is catching up.
I run 6 machines - right now 3 have network disabled - the other 3 are uploading slowly from time to time. Won't enable network for the other 3 until the online ones clear their queues. Might be a while.

The important thing is not to lose the uploads. Patience is a virtue.

