Upload problem

Milo Thurston
Volunteer moderator
Volunteer developer
Joined: 2 Mar 06
Posts: 253
Credit: 363,646
RAC: 0
Message 37322 - Posted: 23 Jun 2009, 7:57:40 UTC

We have got some data on phkup and a few uploads still go there.
I just logged in and there seems to be a problem with the server not having mounted our home directory; either that or they've somehow deleted all our data. I have contacted the admins for that server and hopefully they will resolve the issue.
ID: 37322

old_user22652
Joined: 3 Oct 04
Posts: 39
Credit: 13,172,838
RAC: 0
Message 37326 - Posted: 23 Jun 2009, 14:57:37 UTC

"The project no longer uses any of the servers in Berne. This outsourcing ended after the 2005 incident."

Les, this is intriguing. I think maybe I was away doing other projects in 2005 and not reading the CPDN boards, or else I'm just too old to remember.

If you have the time/inclination to tell us, whatever happened in 2005?

John.
ID: 37326

Thyme Lawn
Volunteer moderator
Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 37330 - Posted: 23 Jun 2009, 20:24:58 UTC - in response to Message 37326.  

If you have the time/inclination to tell us, whatever happened in 2005?

There was a serious failure of the server's RAID array closely followed by a major flooding incident in Bern. The server was inaccessible for about 3 weeks and the full story is here.
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 37330

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 37333 - Posted: 24 Jun 2009, 5:39:28 UTC

Les Bayliss wrote:
The project no longer uses any of the servers in Berne. This outsourcing ended after the 2005 incident.


Back in 2005, there was talk about replacing the server phkup in Switzerland with one in England to make physical access to it easier.
Apparently this didn't happen after all, and that server is still physically in Switzerland.

ID: 37333

Milo Thurston
Volunteer moderator
Volunteer developer
Joined: 2 Mar 06
Posts: 253
Credit: 363,646
RAC: 0
Message 37334 - Posted: 24 Jun 2009, 8:20:19 UTC - in response to Message 37333.  


Apparently this didn't happen after all, and that server is still physically in Switzerland.


It is indeed.
I've just heard that it's up again and I'm reconfiguring it.
Interestingly, the link to the 2005 problems refers to both power supplies on uploader1.atm failing. This is the same machine that failed recently, only more severely. I think it's about time to retire that hardware and any future machine appearing on that url will be new.
ID: 37334

Skip Da Shu
Joined: 31 Aug 04
Posts: 42
Credit: 15,303,437
RAC: 130,044
Message 37342 - Posted: 25 Jun 2009, 6:17:56 UTC - in response to Message 37334.  


Apparently this didn't happen after all, and that server is still physically in Switzerland.


It is indeed.
I've just heard that it's up again and I'm reconfiguring it.
Interestingly, the link to the 2005 problems refers to both power supplies on uploader1.atm failing. This is the same machine that failed recently, only more severely. I think it's about time to retire that hardware and any future machine appearing on that url will be new.


Reading that historical account of the 'Bern Floods' I thought I picked up that y'all are using Dell stuff. Living about 10 miles from Dell you'd be surprised how many Dell server parts are lying around within a 50m radius of here. Next time you need a PSU post up the model... we might be able to locate one in a matter of hours. I'd call Steve first... about 3 miles up the hwy from me... unless he's 'cleaned up' he has an entire room in his house full of Dell stuff... mostly server stuff... I don't ask. LOL.

- da shu @ HeliOS,
"Free software is a matter of liberty, not price. To understand the concept, you should think of free as in free speech, not as in free beer"
ID: 37342

old_user553658
Joined: 17 Jan 09
Posts: 2
Credit: 43,535
RAC: 0
Message 37461 - Posted: 13 Jul 2009, 7:04:03 UTC
Last modified: 13 Jul 2009, 7:05:50 UTC

Got a failure-to-upload problem, which seems to be de rigueur in order to crunch on this project... :p

Here's the feedback:

13-Jul-09 2:31:48 AM climateprediction.net Started upload of hadam3p_n0q3_1995_2_006079789_2_2.zip
13-Jul-09 2:31:50 AM climateprediction.net [error] Error reported by file upload server: Server is out of disk space
13-Jul-09 2:31:50 AM climateprediction.net Temporarily failed upload of hadam3p_n0q3_1995_2_006079789_2_2.zip: transient upload error
13-Jul-09 2:31:50 AM climateprediction.net Backing off 27 min 9 sec on upload of hadam3p_n0q3_1995_2_006079789_2_2.zip
(and later) 13-Jul-09 2:59:00 AM climateprediction.net Backing off 3 hr 10 min 33 sec on upload of hadam3p_n0q3_1995_2_006079789_2_2.zip

This has happened several times... there appeared to be 2 accompanying files, hadam3p_n0q3_1995_2_006079789_2_1.zip and hadam3p_n0q3_1995_2_006079789_2_3.zip, and they uploaded after a couple of aborts. However, it looks like the big file doesn't want to or can't be uploaded.
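
For anyone who wants to track how long BOINC is waiting between attempts, here is a minimal sketch that pulls the back-off delay out of log lines like the ones above. It assumes the message wording is exactly as shown in this thread ("Backing off ... hr ... min ... sec on upload of ..."); the function name `parse_backoff` is made up for illustration and is not part of BOINC.

```python
import re

# Matches the "Backing off ..." wording seen in the log excerpt above.
# The hr/min/sec parts are each optional, since short delays omit the hours.
BACKOFF_RE = re.compile(
    r"Backing off"
    r"(?: (?P<hr>\d+) hr)?"
    r"(?: (?P<min>\d+) min)?"
    r"(?: (?P<sec>\d+) sec)?"
    r" on upload of (?P<file>\S+)"
)


def parse_backoff(line):
    """Return (filename, delay_in_seconds), or None if the line is not a back-off message."""
    m = BACKOFF_RE.search(line)
    if m is None:
        return None
    seconds = (
        int(m.group("hr") or 0) * 3600
        + int(m.group("min") or 0) * 60
        + int(m.group("sec") or 0)
    )
    return m.group("file"), seconds
```

Feeding it the two back-off lines above gives delays of 1629 and 11433 seconds, which shows the wait growing between retries.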

What is a transient upload error, btw? Transient (<L transiens, -ntis transiting, temporarily visiting, going across, pres. part. of transire, to move across) implies temporary in nature. I do hope so :) Also, I read that there is a logjam of WU's waiting to be processed and such; how long should I expect to wait? N.B. the message about your server being out of space... what's up? Did someone go on leave and let everything kind of pile up?

Last, am I in any danger of losing this WU, of it just giving up like another guy's WU did? If so, please let me know any steps I might take to avoid this. And my CPU put in 228h 2m 41s on this baby... I do hope my credit isn't in jeopardy, as I am competitive and would like to see the points :) Thanks... to a better knowledge of our climate... :)
The pretty lady you see around my profile is Hayley Westenra, an angelic singer from New Zealand

ID: 37461

PinkPenguin
Joined: 26 Apr 09
Posts: 6
Credit: 514,253
RAC: 0
Message 37462 - Posted: 13 Jul 2009, 7:20:33 UTC

Same error here since last night:

13/07/2009 09:00:08 Backing off 1 hr 5 min 5 sec on upload of hadam3p_mal2_1987_2_006115552_1_2.zip
13/07/2009 09:00:08 Temporarily failed upload of hadam3p_mal2_1987_2_006115552_1_2.zip: transient upload error
13/07/2009 09:00:08 [error] Error reported by file upload server: Server is out of disk space

According to Server Status all lights are green.

Is there a disk space problem with the upload server? According to client_state.xml it should be this one: cpdn-upload1.comlab.ox.ac.uk.
ID: 37462

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 37463 - Posted: 13 Jul 2009, 8:19:36 UTC

As was posted in the News on the 18th of June:
I've now got a temporary server running as uploader1.atm.ox.ac.uk. It's not very good hardware and has limited space, so there may be delays in uploading


With around 3 terabytes of science data being stored, things do slow down occasionally.
The project people will be back at work in a couple of hours.

*******************

Cesium_133
The zips can go to different servers. You need to check in client_state.xml to see the relevant upload server.
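
If hunting through client_state.xml by hand is a chore, something like the following rough sketch can list the upload server for each file. It assumes that each file entry in client_state.xml has a `<name>` child and one or more `<upload_url>` children, and that the file parses as well-formed XML; the function name `upload_urls` is purely illustrative. Check your own file before relying on it.

```python
import xml.etree.ElementTree as ET


def upload_urls(client_state_xml):
    """Map each file name to the upload URLs listed for it.

    Assumption: a file entry is any element that has both a <name>
    child and at least one <upload_url> child.
    """
    root = ET.fromstring(client_state_xml)
    result = {}
    for elem in root.iter():
        urls = [u.text.strip() for u in elem.findall("upload_url") if u.text]
        name = elem.findtext("name")
        if urls and name:
            result[name.strip()] = urls
    return result
```

Pass it the text of client_state.xml from your BOINC data directory, then look up the zip that is stuck to see which server it is bound for.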

Transient: passing / momentary.

logjam of WU's

Check the date of those posts; that problem was 3 weeks ago.

Did someone go on leave and let everything kind of pile up?

It's just a quaint old custom that they have in England, called The Week End.


Credits are allocated per trickle, all the way through the creation of a model. And you don't lose them, even for models that fail.

Last, am I in any danger of losing this WU, of it just giving up like another guy's WU did?

That, too, was 3 weeks ago. And BOINC has 14 days to keep retrying.
So, as long as you don't do anything silly, such as aborting transfers or models, it will all get there in the end.
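
To put a date on that 14-day window, here is a small sketch. It assumes the clock starts at the first failed upload and uses the 14-day figure quoted above; the names `RETRY_WINDOW`, `retry_deadline` and `time_left` are invented for this example and are not BOINC settings.

```python
from datetime import datetime, timedelta

RETRY_WINDOW = timedelta(days=14)  # figure quoted in this thread, not read from BOINC


def retry_deadline(first_failure):
    """Date and time at which the assumed retry window runs out."""
    return first_failure + RETRY_WINDOW


def time_left(first_failure, now):
    """Time remaining in the retry window, never less than zero."""
    return max(retry_deadline(first_failure) - now, timedelta(0))
```

For a first failure at 2:31:50 on 13 July 2009, `retry_deadline` returns 2:31:50 on 27 July 2009.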

The README files linked from my sig are full of useful hints and tips.

ID: 37463

Milo Thurston
Volunteer moderator
Volunteer developer
Joined: 2 Mar 06
Posts: 253
Credit: 363,646
RAC: 0
Message 37465 - Posted: 13 Jul 2009, 9:17:56 UTC

Sorry about this, it's more HadAM3P data coming in rapidly whilst every server I try to get is either delayed (the new one is 60% built), off-line or otherwise unavailable. I will do what I can to move data somewhere as soon as possible.
ID: 37465

PinkPenguin
Joined: 26 Apr 09
Posts: 6
Credit: 514,253
RAC: 0
Message 37467 - Posted: 13 Jul 2009, 10:36:55 UTC - in response to Message 37463.  
Last modified: 13 Jul 2009, 10:39:37 UTC

As was posted in the News on the 18th of June:
I've now got a temporary server running as uploader1.atm.ox.ac.uk. It's not very good hardware and has limited space, so there may be delays in uploading

My apologies - I thought the problem had been resolved in the end.

It's just a quaint old custom that they have in England, called The Week End.


... Hey, Moriarty, they still have the "Week End" in good ol' Blighty!
All the best....
ID: 37467

old_user553658
Joined: 17 Jan 09
Posts: 2
Credit: 43,535
RAC: 0
Message 37475 - Posted: 14 Jul 2009, 6:58:36 UTC - in response to Message 37463.  

With around 3 terabytes of science data being stored, things do slow down occasionally.


So it appears. Things are still in limbo... the last message complement reads:

14-Jul-09 2:32:14 AM climateprediction.net Started upload of hadam3p_n0q3_1995_2_006079789_2_2.zip
14-Jul-09 2:32:16 AM Project communication failed: attempting access to reference site
14-Jul-09 2:32:16 AM climateprediction.net Temporarily failed upload of hadam3p_n0q3_1995_2_006079789_2_2.zip: connect() failed
14-Jul-09 2:32:16 AM climateprediction.net Backing off 3 hr 7 min 25 sec on upload of hadam3p_n0q3_1995_2_006079789_2_2.zip
14-Jul-09 2:32:17 AM Internet access OK - project servers may be temporarily down.

Also, you say you have space for 3 TB of data. I am going to pretend I know what I'm talking about :p ... but that equates to 14 of my computers' worth of memory and space (I only have 1, lol). For a project that's the largest climatological research job in the world using DC technology, that sounds like a real paucity of space. Maybe I should contribute £ to the effort so you all can get a spare PC or something for events such as this :) Seriously... is that the answer, more space?

Credits are allocated per trickle, all the way through the creation of a model. And you don't lose them, even for models that fail.


I looked up trickles, and I get how they work... kudos to the BOINC Wiki. Does that mean I already have the credit, that it's reflected in the rankings I look up for myself? Or is that credit latent, credited tentatively but somehow undisplayed pending a confirmation? If I have it already, do I need to worry if the data -ever- gets uploaded?...

...(from above rhetorical question) FYI, yes, I care very much about it being uploaded. I'm crunching for credit, sure, but I do so for our common knowledge as we try to save this planet we've already f----- up enough. I've read the business about the 14-day deadline thing, and how to extend it if need be. I don't care if I see the points already posted to my name... I'm here to help you all get that WU of mine, along with the 2 others I'm now doing, to you in good order. I'm not going to let that data go down the loo-throne.

If I need help finding or working with those files to extend time, I will come a-calling. Our Earth and my personal contributions toward understanding it and predicting future climatic change mean enough to me to make a serious pest of myself getting help with what I can't figure out by RTFM'ing.

As for weekends here, they do exist. I actually don't obligate myself to work much on Saturdays or at all on Sundays, though I generally do. And my computer knoweth not a Sabbath...
ID: 37475

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 37479 - Posted: 14 Jul 2009, 9:39:13 UTC

The 3 terabytes is the approximate amount of returned model data stored. Actual server space is more.
The new server that Milo mentioned a few posts back is the one that will replace the temporary server cpdn-upload1.comlab. The new one will be a 20 terabyte machine.
Apparently this temporary machine has had 1.4 terabytes of data uploaded to it since it replaced the original (which suffered a power supply failure, followed by a RAID HD failure).

The City of Oxford has many buildings associated with various Colleges of the University of Oxford, as well as other buildings housing offices of people such as this project's people. Scattered all through this are many computer server rooms, and some of these house servers used by this project. Most of these rooms are not accessible to the project people all of the time, so when something goes wrong there, it's necessary to wait until an IT person from that area becomes available. This is currently the case with one of the machines used for temporary storage.

*****************

As often posted on these boards, credit is re-created by a program that runs once per day, just before midnight UK time.
If you upload a trickle just after this program runs, it will be 24 hours before you see that credit.
And Pending credit is a BOINC mechanism that isn't used here.
Nor is Validation.
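
As a rough illustration of the timing described above, this sketch works out when a trickle's credit would next be picked up. The 23:55 run time is a guess standing in for "just before midnight UK time", and all times are treated as naive UK local times; `CREDIT_RUN_AT` and `next_credit_run` are hypothetical names, not part of the project's scripts.

```python
from datetime import datetime, time, timedelta

CREDIT_RUN_AT = time(23, 55)  # hypothetical stand-in for "just before midnight UK time"


def next_credit_run(trickle_time, run_at=CREDIT_RUN_AT):
    """Return the first credit run at or after trickle_time (naive UK local time)."""
    run = datetime.combine(trickle_time.date(), run_at)
    if run < trickle_time:
        # The trickle arrived after today's run, so it waits for tomorrow's.
        run += timedelta(days=1)
    return run
```

Under these assumptions, a trickle arriving at 23:56 waits 23 hours 59 minutes, while one arriving at 09:39 the same day is picked up that evening.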

*****************

Also note that cpdn-upload1.comlab is currently disabled due to ongoing problems, as per the News thread.

ID: 37479

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 37480 - Posted: 14 Jul 2009, 22:55:18 UTC
Last modified: 14 Jul 2009, 23:06:00 UTC

Hi Cesium 133

The majority or, in the case of some model types, all the data required by the scientists is contained in the zip file uploads, not in the trickles. The trickles tell the server that the model's still crunching and needs more credit for the extra progress made. But of course completed models that upload all their zip files are far more valuable for the research than unfinished ones.

You will already have the credit for all or most of the trickles you've uploaded. Your most recent credits may not appear in your account yet because the various credit scripts don't run continuously. Each day our credit total is 'exported' to several stats sites like BoincStats where you can search for yourself and other members of CPDN and other BOINC projects. The stats sites are invaluable for providing all sorts of comparative data not directly available on project web pages.

At the moment nobody needs to be even thinking about extending the two-week period BOINC allows files to remain in the Transfers tab after the first failed upload. The current situation is a nuisance but not nearly as serious as the server crisis in June.
Cpdn news
ID: 37480

DJStarfox
Joined: 27 Jan 07
Posts: 300
Credit: 3,288,263
RAC: 26,370
Message 37481 - Posted: 15 Jul 2009, 2:05:44 UTC - in response to Message 37475.  

I feel your pain, Cesium. I'll have 11 or 12 zip files to upload by tomorrow plus dozens of trickles. Thankfully, I suspended all BOINC activity before the models have finished. I got 4 models scheduled to finish by tomorrow. Bad timing, I guess.
ID: 37481

Ananas
Volunteer moderator
Joined: 31 Oct 04
Posts: 336
Credit: 3,316,482
RAC: 0
Message 37483 - Posted: 15 Jul 2009, 20:21:36 UTC - in response to Message 37481.  
Last modified: 15 Jul 2009, 20:27:16 UTC

I feel your pain, Cesium. I'll have 11 or 12 zip files to upload by tomorrow plus dozens of trickles. Thankfully, I suspended all BOINC activity before the models have finished. I got 4 models scheduled to finish by tomorrow. Bad timing, I guess.



The trickles should upload as soon as you re-enable network activity.

Those .zip files that go to different servers will upload too.

From what I can see, only the *_2.zip files go to the stuffed (and disabled) server. BOINC will retry those uploads for several days, so hopefully they will find their way either to the new server or to the freed-up space on the current temporary server.
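
To pick out which pending transfers are probably the affected ones, a quick sketch like this may help. It assumes only that the number just before ".zip" identifies which zip of the model it is, as the file names quoted in this thread suggest; `zip_index` and `likely_stuck` are illustrative names, and the default of 2 simply reflects the observation above.

```python
import re

# Assumption: the trailing "_<n>.zip" part of the name is the zip's index.
ZIP_INDEX_RE = re.compile(r"_(\d+)\.zip$")


def zip_index(filename):
    """Return the trailing zip number, or None if the name does not fit the pattern."""
    m = ZIP_INDEX_RE.search(filename)
    return int(m.group(1)) if m else None


def likely_stuck(filenames, stuck_index=2):
    """Keep only the files whose trailing zip number equals stuck_index."""
    return [f for f in filenames if zip_index(f) == stuck_index]
```

Given the three files mentioned earlier in the thread, only hadam3p_n0q3_1995_2_006079789_2_2.zip is returned.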

So hopefully nothing will get lost (having 20 models waiting or about to be finished myself, I sure hope that this will go well), just do not abort anything and leave the upload "retry" button alone unless the server status page shows green, as (to my knowledge) at least some BOINC versions have a limited number of upload attempts.


p.s.: The simple trickle reports during a phase are included in a scheduler contact, they are not really uploads, they are just progress reports in XML format
ID: 37483

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,060,840
RAC: 733
Message 37484 - Posted: 16 Jul 2009, 7:37:08 UTC

Hi, Ananas

You are right about the trickles returned by most of the model types being nothing but simple progress reports, but the trickles in the CM models contain real data. Each trickle contains the results for the model year just finished. There is also a sort of super-trickle every 10 years that contains all the trickles for the previous 10 years and a similar mega-trickle every 40 model years.

ID: 37484

DJStarfox
Joined: 27 Jan 07
Posts: 300
Credit: 3,288,263
RAC: 26,370
Message 37488 - Posted: 17 Jul 2009, 4:10:21 UTC - in response to Message 37483.  

So hopefully nothing will get lost (having 20 models waiting or about to be finished myself, I sure hope that this will go well), just do not abort anything and leave the upload "retry" button alone unless the server status page shows green, as (to my knowledge) at least some BOINC versions have a limited number of upload attempts.


All 12 uploads were successful tonight. :)
ID: 37488

old_user172201
Joined: 7 Mar 06
Posts: 5
Credit: 4,085,123
RAC: 0
Message 37648 - Posted: 6 Aug 2009, 20:41:37 UTC

Upload server uploader.oerc down again! Planned or not? Hopefully the first but I fear the worst.
ID: 37648

Richard Haselgrove
Joined: 1 Jan 07
Posts: 943
Credit: 34,182,167
RAC: 6,569
Message 37649 - Posted: 6 Aug 2009, 20:56:15 UTC - in response to Message 37648.  
Last modified: 6 Aug 2009, 20:57:40 UTC

Upload server uploader.oerc down again! Planned or not? Hopefully the first but I fear the worst.
Congratulations!

Only 29 hours after the matching news thread announcement was made. That must be some sort of record.
ID: 37649