Thread 'Q. concerning duration'

ScienceBlog.at

Joined: 13 Feb 14
Posts: 6
Credit: 646,238
RAC: 0
Message 49146 - Posted: 18 May 2014, 7:42:38 UTC

Hello, world!

We're running BOINC 7.0.27 on a Debian 6.0.9 box with 2 Xeon E5450 CPUs (which gives a total of 8 cores at 3 GHz each).

We participate in HadAM3PEU (as this is where we are) and HadAM3PM2 (as it is "Unix-only", and there don't seem to be too many Unix boxes out there).

The time estimate for an EU task is roughly 120 hrs when it starts, and after 22 hrs 16% of the job is done, which works out to some 140 hrs, just like the estimate said. But the Global MOSES II tasks start at 150 hrs, and after 18:30 hrs only 0.300% is done, which yields a total runtime of something like 6000 hrs.
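The arithmetic behind those two figures can be sketched as follows. This is only an illustration that assumes progress is linear in elapsed time; the function name `projected_total_hours` is made up for the example and is not part of BOINC.

```python
def projected_total_hours(elapsed_hours: float, percent_done: float) -> float:
    """Linearly extrapolate the total runtime from elapsed time and progress."""
    if percent_done <= 0:
        raise ValueError("percent_done must be positive")
    return elapsed_hours * 100.0 / percent_done


# Figures quoted in the post: 22 h at 16% and 18 h 30 min at 0.300%.
eu_total = projected_total_hours(22.0, 16.0)
moses_total = projected_total_hours(18.5, 0.300)

print(f"EU task: about {eu_total:.0f} h in total")
print(f"MOSES II task: about {moses_total:.0f} h in total")
```

The EU projection lands close to the quoted "some 140 hrs", and the MOSES II projection is a little above 6000 hrs.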

The computer running the models is a 24x7 server, so this doesn't pose a problem by itself, but my question is whether it makes sense to run such long tasks on a CPU. Or would you rather we aborted those long-runners and left them to be tackled by GPUs?

Thanks for your opinions,

Cheers from Vienna/Austria
Matthew
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 49148 - Posted: 18 May 2014, 7:56:38 UTC - in response to Message 49146.  

Hello Matthew

As posted in a thread in the Number crunching section, there are severe problems with the MOSES II models, to the extent that they've been withdrawn for the time being.

One of the problems is that they don't create output files for a whole year if stopped, although they continue on until the end before saying that they've failed.
Another is the time-to-completion estimate, which was inadvertently set for a 100 year model, instead of 10 years. So the estimate is out by 10 times.

Those models still in circulation will remain there until their re-send limit is reached. :(
If you don't want to end up with one of them, just unselect that model type in your project preferences until they're re-issued.

CPDN doesn't use GPUs because the models are from the UK Met Office, who use supercomputers for their modelling, not GPUs.

ScienceBlog.at

Joined: 13 Feb 14
Posts: 6
Credit: 646,238
RAC: 0
Message 49150 - Posted: 18 May 2014, 10:53:08 UTC - in response to Message 49148.  

Thanks, Les, this explains it, of course! But then I've got another question: will a result be of use to anyone, or are they ignored anyway? In the former case I'd let those PM2 tasks have their chance of getting completed; in the latter I'd, just as you suggest, abort them and deselect them in the preferences.

M.
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 49153 - Posted: 18 May 2014, 15:31:04 UTC

Hi Matthew

They will be useful to the scientists but only if they produce all their files. If a single file is missing the whole model will be discarded.

They will only produce all their files:

* if they are never stopped or interrupted (a stop, for example an exit from BOINC Manager or a computer shutdown, means the file that should be produced next will be missed).

* or if you have selected "Yes" for "Leave tasks in memory while suspended?" in the computing preferences of your account. If this option is selected, the user is free to stop the model.

You could leave network activity off to see whether the files are being produced. They'd stay waiting in the Transfers tab until you let them upload.

Some people have successfully completed these models, generating all the files. But you'd be one of a select minority.
ScienceBlog.at

Joined: 13 Feb 14
Posts: 6
Credit: 646,238
RAC: 0
Message 49154 - Posted: 18 May 2014, 15:42:23 UTC - in response to Message 49153.  

Hi, mo.v,

You just bought them the chance to get a result: as this host has quite some RAM, the "leave in memory" option is actually activated (and has been all the time). So I'll let things run and see what happens.

Thanks for the explanation!

Cheers
Matthew
Andrew Sanchez
Joined: 28 May 14
Posts: 34
Credit: 705,936
RAC: 0
Message 49273 - Posted: 30 May 2014, 18:41:45 UTC - in response to Message 49153.  
Last modified: 30 May 2014, 18:50:19 UTC


They will only produce all their files:

* if they are never stopped or interrupted (a stop, for example an exit from BOINC Manager or a computer shutdown, means the file that should be produced next will be missed).

* or if you have selected "Yes" for "Leave tasks in memory while suspended?" in the computing preferences of your account. If this option is selected, the user is free to stop the model.



Whoa, wait a sec. I'm new to this project, so I need some instructions. After reading this thread I checked my computer preferences and saw that "Leave tasks in memory while suspended?" was set to "no". Apparently this is the default setting, because this was the first time I've even looked at the preferences; I just assumed the default settings were the best settings to complete a task, so I didn't mess with them. So are you saying that I should change this setting to "yes"? And if so, is it going to affect (in a bad way) the task I'm currently running? If these answers depend on my other preferences or other projects/tasks I'm running, just ask and I'll give you any info needed to answer. Also, while I have your attention, are there any other settings I should check or change to make sure I have the best chance of producing usable data?
Thank you,
Andy
P.S.
I'm changing it to "yes" right now; I hope I'm not screwing anything up. * fingers crossed *
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 49274 - Posted: 30 May 2014, 19:02:07 UTC - in response to Message 49273.  

You are doing the right thing, and it won't harm any other projects either. You would have to find someone more expert in BOINC to explain what the advantages of having it set to "no" are. My guess would be more memory free for anything else that needs a lot of it, but I am only guessing. What I can testify is that setting it to "yes" means a massive reduction in crashed tasks, especially on machines with less than 2 GB per core used for crunching, and even with plenty of memory it still greatly reduces the risks. This is especially true of the hadamc3n full-resolution ocean models. Some of the others, e.g. the ANZ models I am currently crunching, have survived a couple of power outages with no adverse effects, which quite surprised me.
Andrew Sanchez
Joined: 28 May 14
Posts: 34
Credit: 705,936
RAC: 0
Message 49275 - Posted: 30 May 2014, 19:19:43 UTC - in response to Message 49274.  

Dave to the rescue, AGAIN! Thanks for another fast reply :)

Andy
Iain Inglis
Volunteer moderator

Joined: 16 Jan 10
Posts: 1084
Credit: 7,853,498
RAC: 4,726
Message 49276 - Posted: 30 May 2014, 21:51:37 UTC

The default BOINC settings are intended to keep BOINC in the background as much as possible, as Dave suggests - i.e. interrupt the normal operation of the computer as little as possible. BOINC is meant to use your spare computing power, not the computing power you actually need.

The effect of the default settings then depends on how the computer is used. If the person using the computer does lots of small things - check an e-mail, browse the Web etc. - then the BOINC task will be constantly interrupted. BOINC tasks regularly save their state to disk so that they can restart from a safe point when necessary. However, for some models "regular" does not mean "frequent", so if the computer is used for its proper purpose more frequently than the state of the model is saved then the model will keep re-starting from the last save point and will make slow progress. If the saved data has an error then the task may crash. For other users, who leave the computer unattended for long periods, the default settings may be entirely compatible with making good progress on models.

Selecting the "keep in memory" option means that the BOINC task does not have to constantly restart from the last point saved to disk - it will just carry on from where it was when it was suspended. That is likely to be more reliable and faster.
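The interaction between save points and interruptions described above can be illustrated with a toy calculation. This is a sketch under simplifying assumptions, not how BOINC or the CPDN models are implemented: the function `minutes_to_finish` and its parameters are invented for the example, every interruption is assumed to arrive after a fixed amount of computing, and a task that is not kept in memory is assumed to fall back to its last save point.

```python
import math


def minutes_to_finish(work, checkpoint_every, interrupt_every, keep_in_memory):
    """Compute minutes needed to finish `work` minutes of model time.

    All arguments except the flag are whole minutes. Returns math.inf when
    the task can never get past its last save point.
    """
    done = 0    # model progress that currently counts
    spent = 0   # compute minutes actually used
    while True:
        start = done
        step = min(interrupt_every, work - done)
        done += step
        spent += step
        if done >= work:
            return spent
        if not keep_in_memory:
            # Suspension drops the task from memory: fall back to the
            # most recent save point.
            done = (done // checkpoint_every) * checkpoint_every
            if done <= start:
                return math.inf  # interrupted before any new save point


# A 600-minute model that saves its state every 60 minutes.
print(minutes_to_finish(600, 60, 90, keep_in_memory=True))
print(minutes_to_finish(600, 60, 90, keep_in_memory=False))
print(minutes_to_finish(600, 60, 45, keep_in_memory=False))
```

With these made-up numbers the kept-in-memory run needs only the 600 minutes of real work, the run that rolls back needs noticeably more, and a task interrupted more often than it saves never finishes at all.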
JIM

Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 49277 - Posted: 31 May 2014, 0:58:46 UTC

You also might want to check the setting for "Suspend if CPU usage is above" and see if it is set to anything other than "0". If it is set to default, which I believe is 30%, it will cause the model to stop and restart continuously as you use the computer. This is supposed to make BOINC a good citizen and stop it from slowing down other tasks.

Resetting to "0" will speed up the crunching by allowing the model to run all the time, but may slow down other tasks on your computer.
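For anyone who prefers a local file to the website preferences discussed in this thread, the two settings can also be written into a preferences override. The sketch below only builds the XML text. It assumes the BOINC client reads a `global_prefs_override.xml` file from its data directory and that the element names are `leave_apps_in_memory` and `suspend_cpu_usage`; check both against the BOINC client documentation for your version before relying on them. The helper name `build_override` is made up for the example.

```python
import xml.etree.ElementTree as ET


def build_override(leave_in_memory: bool = True,
                   suspend_cpu_usage: float = 0.0) -> str:
    """Return override XML for the two settings discussed in the thread."""
    root = ET.Element("global_preferences")
    # Assumed tag name: keep suspended tasks in memory (1) or not (0).
    ET.SubElement(root, "leave_apps_in_memory").text = "1" if leave_in_memory else "0"
    # Assumed tag name: suspend when other CPU usage exceeds this percentage;
    # 0 means never suspend for that reason.
    ET.SubElement(root, "suspend_cpu_usage").text = f"{suspend_cpu_usage:g}"
    return ET.tostring(root, encoding="unicode")


xml_text = build_override()
print(xml_text)
```

The script deliberately stops at printing the text; where the file goes and how the client is told to re-read it depends on the installation.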

Andrew Sanchez
Joined: 28 May 14
Posts: 34
Credit: 705,936
RAC: 0
Message 49279 - Posted: 1 Jun 2014, 2:45:04 UTC
Last modified: 1 Jun 2014, 2:45:34 UTC

Iain, thanks for the explanation :)

Jim, I'll look into the settings you mentioned, see how they're set up, and consider what would be best for my needs. Thank you :)
