climateprediction.net home page
Frozen WU ???

John Hunt
Joined: 5 Mar 05
Posts: 64
Credit: 790,577
RAC: 0
Message 25080 - Posted: 11 Nov 2006, 21:02:21 UTC


This WU -
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=1921937
reached 100% completion half an hour ago, but my hard disk is still working frantically. BOINC Manager states the WU is still running, with CPU time 1462.51.33, Progress 100% and To Completion = 0.00.
No other messages appear in BOINC manager.

Any ideas anyone?
(To stop the hard disk activity, I've now suspended the WU.)



Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 25081 - Posted: 11 Nov 2006, 21:31:39 UTC

Possibly something graphical crashed it. The disk activity is suspicious, as is the 100% completed figure, which usually means that you have an orphaned process and that BOINC, having lost contact with the program, thinks the model is finished.

While it's suspended, turn off Network access, make a backup to a new location (don't overwrite previous backups!), then reboot.
Let it run with Network access still suspended, and see if it works now.
If not, restore a previous backup.
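
The "backup to a new location" step can be scripted. This is only a minimal sketch, assuming BOINC has already been suspended or shut down and that you know where your BOINC data folder lives. The function name `backup_boinc_data` and both paths in the usage comment are hypothetical, not anything provided by BOINC itself.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_data(data_dir: Path, backup_root: Path) -> Path:
    """Copy data_dir into a fresh, timestamped folder under backup_root."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"boinc-backup-{stamp}"
    # copytree raises FileExistsError if the target already exists,
    # so an earlier backup cannot be overwritten by accident.
    shutil.copytree(data_dir, target)
    return target


# Hypothetical usage; adjust both paths to your own machine:
# backup_boinc_data(Path("C:/Program Files/BOINC"), Path("D:/backups"))
```

Because every backup gets its own timestamped folder, older backups stay available if the newest one turns out to contain an already-broken model.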

John Hunt
Joined: 5 Mar 05
Posts: 64
Credit: 790,577
RAC: 0
Message 25082 - Posted: 11 Nov 2006, 22:06:00 UTC


Thanks for the prompt reply, but the WU is still not finalising and uploading.
The continuous hard disk activity is still there, so I've suspended it again.
BOINC Manager now states CPU time 1462.51.33, Progress 0% and To Completion = 1618.48.57.



Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 25083 - Posted: 11 Nov 2006, 22:53:30 UTC

Perhaps I was wrong about the state of the model.
If it was almost complete, then the hard disk activity could just be the half hour or so of zipping the data and preparing the final uploads.
In which case, just leave it for a "while". (An hour or two?)

The figures that you now quote are in line with a model that is restarting from a checkpoint but has not yet finished reloading its data, which usually takes only a few seconds (5-6?).

Also note that sulphur models aren't of much use to the researchers now, but they ARE (quietly) desperate for another 2-3 thousand TCMs ASAP.

PS
Perhaps your hard disk is starting to fail, and is trying to get data off a bad area.

John Hunt
Joined: 5 Mar 05
Posts: 64
Credit: 790,577
RAC: 0
Message 25089 - Posted: 12 Nov 2006, 7:09:16 UTC


Thanks Les - I was probably being a little bit impatient! Another 20 mins of run-time this morning and it finalised perfectly -
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=1221158


Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 25091 - Posted: 12 Nov 2006, 7:46:29 UTC

Congratulations.
It's nice to see all those graphs on the model's page.

old_user116389
Joined: 25 Nov 05
Posts: 11
Credit: 870,090
RAC: 0
Message 25092 - Posted: 12 Nov 2006, 12:13:39 UTC - in response to Message 25089.  


Thanks Les - I was probably being a little bit impatient! Another 20 mins of run-time this morning and it finalised perfectly -
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=1221158



Well done John!


John Hunt
Joined: 5 Mar 05
Posts: 64
Credit: 790,577
RAC: 0
Message 25094 - Posted: 12 Nov 2006, 16:18:16 UTC


Now I know what to expect next time I run a CPDN WU.

I'll be taking a break from CPDN for a few weeks, but I'll return refreshed and ready to go on a new WU.

Is it true the new TCM models take twice as long as a sulphur model? The sulphur model I've just completed took around 7 months of crunching, so I would be in danger of going over the deadline if it is still 1 year...
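
The worry above comes down to simple arithmetic. A rough sketch, using only the figures quoted in the question (7 months for the sulphur model, a factor of two for a TCM, a 1-year deadline) and assuming the same machine and the same share of CPU time; the variable names are made up for illustration:

```python
SULPHUR_MONTHS = 7     # from the post: about 7 months of crunching
TCM_FACTOR = 2         # from the post: "twice as long"
DEADLINE_MONTHS = 12   # from the post: deadline assumed to still be 1 year

# Scale the observed sulphur run time by the quoted factor.
tcm_months = SULPHUR_MONTHS * TCM_FACTOR
overrun = tcm_months - DEADLINE_MONTHS

print(f"Estimated TCM run: {tcm_months} months "
      f"({overrun} months past the deadline)")
```

On those assumptions the estimate lands a couple of months past the nominal deadline, which is why the question is worth asking.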



Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 25095 - Posted: 12 Nov 2006, 16:32:03 UTC

Yes, about that. Single phase, 160 model years.
But the deadline is just there because something has to be.
A TCM takes about 3 months on a P4 3.2 GHz machine running 24/7 with no other projects.

The new models are different to slab / sulphur. They upload data every model year (in early December of that year), with a bigger upload every 10 years, and a very big restart dump every 40 years.
And they don't leave any data on the hard disk after they finish, provided they don't crash.
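
To get a feel for that upload schedule over a whole 160-year run, here is a small sketch that counts the uploads of each kind. It assumes that when a year is a multiple of both 10 and 40 only the largest kind is counted, and that the final model year follows the same pattern; neither detail is confirmed above, and the labels are made up for illustration.

```python
from collections import Counter

MODEL_YEARS = 160  # "Single phase, 160 model years"


def upload_kind(year: int) -> str:
    """Classify the upload at the end of a model year (assumed rule)."""
    if year % 40 == 0:
        return "restart dump"   # very big, every 40 model years
    if year % 10 == 0:
        return "decadal"        # bigger upload, every 10 model years
    return "annual"             # ordinary yearly upload


counts = Counter(upload_kind(y) for y in range(1, MODEL_YEARS + 1))
for kind, n in counts.items():
    print(f"{kind}: {n}")
```

If the bigger uploads are sent in addition to the ordinary yearly one rather than instead of it, the annual count would simply be 160.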

John Hunt
Joined: 5 Mar 05
Posts: 64
Credit: 790,577
RAC: 0
Message 25096 - Posted: 12 Nov 2006, 16:52:45 UTC


Thanks for the info, Les!
(I like this place; somewhere finally where the admins care and respond to the crunchers. A few of the projects, which shall remain nameless, do not show this level of support for crunchers.)

I won't be able to run CPDN exclusively 24/7, but I will certainly give it a good proportion of my CPU output in future.




old_user202664

Joined: 13 Oct 06
Posts: 60
Credit: 7,893
RAC: 0
Message 25102 - Posted: 13 Nov 2006, 0:59:19 UTC - in response to Message 25083.  
Last modified: 13 Nov 2006, 1:00:25 UTC


Also note that sulphur models aren't of much use to the researchers now, but they ARE (quietly) desperate for another 2-3 thousand more TCMs ASAP.


Is there really such a hurry? If there is, I could alter my preferences a bit... I can't offer 24/7, but if I let this box crunch 100% CPDN for a while, 12 hours a day or so should be realistic... which seems to be quite okay, as my box takes only about 1100 CPU secs per timestep, which is reasonably quick compared to what I've seen out there. Shouldn't be slower than your example P4 really.
So, what I want to say is ^^ if it is important for the science, just tell me and you'll at least get one model back a bit quicker ;-)
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 25104 - Posted: 13 Nov 2006, 1:23:54 UTC

It was a slightly tongue-in-cheek hint to the thousands of part-time crunchers who are running a dozen or more projects at once, with CPDN getting maybe one hour a day. (I made this up too.)

But the Transient Coupled Models have been available since March, and some people are still working on slab and sulphur models.
With over 60 thousand computers crunching away, it was hoped that results would be further along by now.

Also, in the Seasonal Attribution Project, which closed for the primary purpose at the end of October, there are still over 10,000 models out there. These will get used by the secondary researchers, but only if they get returned by, perhaps, the end of this year.

So the climate projects aren't getting much serious attention from a lot of the people running them.

I'm just someone interested in the climate projects to the extent that I've been helping on the 3 climate help boards for ages, but I'm not connected in any way with the core team. So I can talk with some experience about problems running the programs, but with no authority whatsoever on what is wanted / hoped for / needed by the project people.

It was recently said by one of the core people that they had been hoping for about 5,000 models by now.

So, just coming from me, yes it would be nice if you could "move it up a gear".
And anyone else.
Thank you.

PS
I'll soon be back here myself, after 4 months on a spinup model, and then 7 months on SAP models.
Another couple of days to let this last SAP get a good start, and then I'm also going to start a BBC TCM and a cpdn TCM, to synchronise my cpids. When the SAP finishes in about 10 days or so, it'll be 2 TCMs at full speed.


MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 25108 - Posted: 13 Nov 2006, 8:44:32 UTC

...Also, in the Seasonal Attribution Project, which closed for the primary purpose at the end of October, there are still over 10,000 models out there. These will get used by the secondary researchers, but only if they get returned by, perhaps, the end of this year...


BoincStats shows that only 509 PCs returned a trickle in the last 24 hours, so I'd guess that a lot of the 'models in progress' are actually 'lost in action'. Of course it's impossible to say exactly how many, since some PCs may have multiple viable models on them.
I'm a volunteer and my views are my own.
News and Announcements and FAQ
old_user202664

Joined: 13 Oct 06
Posts: 60
Credit: 7,893
RAC: 0
Message 25110 - Posted: 13 Nov 2006, 18:21:50 UTC

Sounds fair enough, Les. I'll see what I can do, although I'm not one of the people with "one thousand projects" or so (actually, this PC is shared 50/50 between Einstein and CPDN, with SETI only on my notebook and HashClash inactive for the time being), but I try to help where it is really needed. And whereas in projects like SETI, Einstein or Rosetta even old P3s or so can be used and show fair performance in the long run, CPDN seems to have high CPU/RAM requirements, which probably prevent some interested users from joining. They did for me before I got this box, so now I have the power, why not use it here?
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 25111 - Posted: 13 Nov 2006, 23:10:12 UTC

I've been watching the stats for SAP more closely since the 'main' closure on October 31, and the number for "Results in progress" is dropping quite fast now.
It was only going to be a temporary thing, but I've started writing down the numbers. Unfortunately without dates at the start. (I'm probably bored.)

They started at 11,331 at about the closure, and are now at 10,441.
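
Those two figures give a rough idea of how fast results are coming back. A back-of-envelope sketch, assuming the first figure really does date from the closure on 31 October (the early readings were not dated) and taking the date of this post, 13 November, for the second:

```python
from datetime import date

start_count, end_count = 11_331, 10_441   # "Results in progress" figures
start_day = date(2006, 10, 31)            # assumed date of the first figure
end_day = date(2006, 11, 13)              # date of this post

days = (end_day - start_day).days
returned = start_count - end_count
per_day = returned / days                 # average, treating the drop as linear

print(f"{returned} fewer results in {days} days, about {per_day:.0f} per day")
```

This is only an average over the period; it cannot separate models that were completed from ones that were aborted or timed out.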

And quite a few crunchers around 'my area' of the credits are starting to 'drop out'. But a few have started a new model recently, so several people have an extra model or two 'up their sleeve'.
Another 24 hours, and I think that I'll see 4 or 5 more drop out.

***************

Yes, the climate projects do have rather a 'hi-tech' requirement, but one can't expect much success with running a supercomputer program on a low-end desktop. It's always a surprise when people do manage it with low RAM, for instance. But all of the simple climate models have been run, and now the researchers are interested in looking 'deeper' into weather and climate.
So it's just as well that people are starting to upgrade to the more modern, more powerful computers, such as the 'newish' Core Duo. It looks like these will be needed before long. (They're fast.)

old_user202664

Joined: 13 Oct 06
Posts: 60
Credit: 7,893
RAC: 0
Message 25113 - Posted: 14 Nov 2006, 0:08:05 UTC
Last modified: 14 Nov 2006, 0:15:35 UTC

Hey, I wasn't complaining ;-) rather the opposite... it was meant like "you need it, I've got it now"... I know you have good reasons for making the app so "hightech". What I wanted to say is that at other projects, it's easier for other people to contribute, so it's not so bad if I do a bit less there for a while. Whereas here, it's really limited to those with good computers.
And yes, trying to run climate projects with slower PCs SUCKS. I tried it once on my old laptop... 496 MB of memory (at best -.- shared RAM graphics card) and a Celeron M processor at 1.3 GHz of clock speed... an okay machine for most of the other projects (I've been doing SETI, Einstein and HashClash on it... okay, WUs take longer, but apart from that they all ran fine) so I thought I'd try BBC climate change ^^ yes, I can be a bit extreme if I find something really interesting, and besides I don't have a very high opinion of people saying "it won't run" because often it will. But this time, it didn't. After the third major crash I gave it up... No idea if it was too little memory, if the memory was just too slow (133 MHz, I know it's pathetic) or if my CPU played a role as well... only good thing is it didn't overheat ^^ no problems there, but after this experiment I really can't advise people under the minimum requirements to run these projects.
I'm glad I'm back now with something faster, though :-D
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 25114 - Posted: 14 Nov 2006, 0:13:04 UTC

Sorry, Annika, just commiserating, and explaining to anyone else out there who might show up here.

Your offer of extra time is appreciated.
Thanks.

MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 25117 - Posted: 14 Nov 2006, 8:49:00 UTC

I've been watching the stats for SAP more closely since the 'main' closure on October 31, and the number for "Results in progress" is dropping quite fast now.
It was only going to be a temporary thing, but I've started writing down the numbers. Unfortunately without dates at the start. (I'm probably bored.)

They started at 11,331 at about the closure, and are now at 10,441.

And quite a few crunchers around 'my area' of the credits are starting to 'drop out'. But a few have started a new model recently, so several people have an extra model or two 'up their sleeve'.
Another 24 hours, and I think that I'll see 4 or 5 more drop out.
...


I've been writing down the 'completed models' figures since around March, and the 'WUs in queue' figures between August and October :-)

Some people have picked up reissued models where the first generation recently crashed, although I don't think there are many of those available (probably a handful per day).

This graph from netsoft online shows the project activity quite well I thought:



Note the very steep drop after work ran out!
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 25119 - Posted: 15 Nov 2006, 2:49:01 UTC

Tis indeed a nice graph.
Number is now at 10,398 and falling.

