climateprediction.net home page
Posts by Ingleside

21) Questions and Answers : Wish list : Enhance scheduling/throttling strategies (Message 48958)
Posted 29 Apr 2014 by Ingleside
Post:
If you edit the BOINC preferences locally on a computer, they're saved in a file called global_prefs_override.xml, located in the BOINC data directory.
In addition, BOINC includes boinccmd, located in the BOINC application directory. This is a command-line tool for giving various commands to a running BOINC client, including telling it to re-read global_prefs_override.xml.

So while BOINC doesn't directly support changing % CPU usage by time of day, one method that should work is to make two small batch files and schedule each of them to run at a particular time in Windows.

To make the batch files, you'll first need to make the BOINC preferences. For example, set up the "full" preferences, copy global_prefs_override.xml and call the copy full.xml, then re-edit the BOINC preferences to the "low" settings and copy global_prefs_override.xml again, calling it low.xml.

The full-batch-file can be something like:
copy "your-boinc-data-dir\full.xml" "your-boinc-data-dir\global_prefs_override.xml"
"your-boinc-app-dir\boinccmd" --read_global_prefs_override

And the low batch file can be something like:
copy "your-boinc-data-dir\low.xml" "your-boinc-data-dir\global_prefs_override.xml"
"your-boinc-app-dir\boinccmd" --read_global_prefs_override

You'll need write access to the BOINC data directory for the copying to work.
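
For anyone who would rather schedule one script than two batch files, the same idea can be sketched in Python. This is only an illustration: the paths are the same placeholders as in the batch files, and the function names, the `dry_run` switch and the 08:00 to 18:00 "low" window are assumptions made up for the example. The only BOINC pieces used are the ones mentioned above, global_prefs_override.xml and `boinccmd --read_global_prefs_override`.

```python
import shutil
import subprocess
from pathlib import Path

# Placeholder paths, as in the batch files; replace with your own directories.
DATA_DIR = Path("your-boinc-data-dir")
BOINCCMD = Path("your-boinc-app-dir") / "boinccmd"


def profile_for_hour(hour, low_start=8, low_end=18):
    """Pick "low" during the (hypothetical) daytime window, "full" otherwise."""
    return "low" if low_start <= hour < low_end else "full"


def apply_profile(profile, data_dir=DATA_DIR, boinccmd=BOINCCMD, dry_run=False):
    """Copy <profile>.xml over global_prefs_override.xml and tell the client to re-read it."""
    data_dir = Path(data_dir)
    shutil.copyfile(data_dir / f"{profile}.xml", data_dir / "global_prefs_override.xml")
    command = [str(boinccmd), "--read_global_prefs_override"]
    if not dry_run:
        # Raises CalledProcessError if boinccmd reports a failure.
        subprocess.run(command, check=True)
    return command


# Typical use from a scheduled task (needs real paths and a running client):
#   from datetime import datetime
#   apply_profile(profile_for_hour(datetime.now().hour))
```

Scheduled to run once an hour, this replaces both batch files, and the same write-access requirement applies.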
22) Message boards : Number crunching : CONVERTING TO LINUX (Message 48944)
Posted 28 Apr 2014 by Ingleside
Post:
That's a thought, but with dual boot, Jim can just switch to Windows when cpdn is out of work and run other projects with that.

Well, at least in my experience, a project going down always happens at the wrong time, for example 15 minutes before you're increasing the cache size from 0.1 days to 5 days, or 5 minutes after you leave in the morning, so the computer can sit idle for many hours during the day if you don't have a backup project.

Even with "lots of work" cached for CPDN, how many bad batches has CPDN released over the years, crashing a few seconds into a run? Not to forget, while all Hadam3p variants were probably the most stable models for years, after PNW released a new version (AFAIK they didn't even bother to beta-test it) all PNW models have, in my experience, had a 100% error rate. (And as a bonus, these crashes have filled up the HD, leading to the other models also crapping out due to no free disk space.)

23) Message boards : Number crunching : CONVERTING TO LINUX (Message 48931)
Posted 28 Apr 2014 by Ingleside
Post:
I agree with Les: Mint is best if your Linux install is going to be used for various things. But if your box is dedicated to CPDN work (or BOINC work in general), I'd recommend 32 bit Lubuntu. Lubuntu is Ubuntu with the 'LXDE' desktop, which is very light on system resources, so it leaves more CPU for CPDN to use.

If you're only going to run CPDN under Linux, using a 32-bit version is possibly the best option. But if you're also expecting other projects to be run, either by your own choice or as a backup for the next time CPDN is out of work or has server problems and can't hand out any new work, a 32-bit Linux doesn't look like a good choice, since for some BOINC projects the 64-bit applications have a significant speed advantage.


24) Questions and Answers : Wish list : Using GPUs for number crunching (Message 48739)
Posted 8 Apr 2014 by Ingleside
Post:
Assuming no major problems with the compiler the next step is professional development of your programming staff with training on GPUs. At this point you might be in a position to know if GPU processing is a reasonable option.

Well, AFAIK all currently active climate models use SSE2 optimizations, and my guess is this means they're using double precision. The Fortran compiler linked a few posts back is for CUDA, and Nvidia cards have abysmally poor double-precision speed, only 1/24 of single-precision performance unless you pay $$$$ for the professional cards, so even a top-end Nvidia GTX 780 Ti only manages 210 GFLOPS at most. A quad-core CPU (8 threads with HT), on the other hand, is around 100 GFLOPS. This means that even in the best case the Nvidia GPU will only be about 2x faster than the CPU. In reality even 50% of peak performance on the GPU can be too much to expect, meaning your "slow" CPU is outperforming your "fast" GPU.

So, unless single precision can be used for most of the calculations, a CUDA version of CPDN is a waste of development time.
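
The arithmetic behind this can be checked with a few lines of Python. It's only a back-of-envelope sketch using the rough peak figures quoted above, and it assumes the CPU actually reaches its ~100 GFLOPS; the variable and function names are made up for the example.

```python
# Rough peak figures quoted in the post; none of them are measurements.
GPU_DP_GFLOPS = 210.0  # GTX 780 Ti, double precision
CPU_DP_GFLOPS = 100.0  # quad-core CPU with HT, double precision
SP_PER_DP = 24         # consumer Nvidia cards: DP runs at 1/24 of SP speed


def gpu_speedup(efficiency):
    """GPU speed relative to the CPU when the GPU reaches `efficiency` (0..1) of its DP peak."""
    return GPU_DP_GFLOPS * efficiency / CPU_DP_GFLOPS


# Single-precision peak implied by the 1/24 ratio.
implied_sp_gflops = GPU_DP_GFLOPS * SP_PER_DP

# GPU efficiency below which the CPU wins.
break_even = CPU_DP_GFLOPS / GPU_DP_GFLOPS

print(f"implied SP peak: {implied_sp_gflops:.0f} GFLOPS")
print(f"best-case speedup: {gpu_speedup(1.0):.1f}x")
print(f"break-even GPU efficiency: {break_even:.0%}")
```

With these numbers the GPU has to sustain a bit under half of its double-precision peak just to match the CPU.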

Instead of CUDA, an OpenCL compiler would be more interesting, since OpenCL also works with the much faster AMD GPUs. But even with this additional speed, it's still unlikely you can get a climate model to run faster on a GPU than on a CPU.
25) Message boards : Number crunching : No Tasks Available (Message 48299)
Posted 5 Mar 2014 by Ingleside
Post:
There was a nasty bug in v7.2.39 which could be causing the download errors.

I haven't been keeping up with client changes recently, so I wasn't aware of this.

v7.2.42 is a bug-fix - maybe the ones you're seeing with download errors and v7.2.42 have upgraded the client in the meantime? Can you check whether the individual failed tasks were attempted under v7.2.39, whatever client they're running now?

I only found one of my WUs where someone reporting as v7.2.42 had a download error. It's host 1289490, and a quick look reveals 7 errors at the same time. Interestingly enough, they're reported as "error" and not "download error". Also, it's only 6 minutes between being assigned and being reported as errors, so clearly someone was manually hitting "update". While it's possible they did swap BOINC client during these 6 minutes before reporting the errors, this info isn't available anywhere so...
26) Message boards : Number crunching : No Tasks Available (Message 48285)
Posted 5 Mar 2014 by Ingleside
Post:
The permanent http error is only happening to a few people, so is most likely a problem with their computer.

Well, taking a look at the WUs I've downloaded, while I've not had any download errors myself, the current results are:
90 wu's downloaded, of these:
38 error-free (atleast for now).
39 wu's with download-errors.
21 wu's with computing-errors.
48 total download-errors.
27 total computing-errors.
3 wu's errored-out due to too many errors.

43% of the WUs having download errors is in my opinion too high, so even if only a "few" users have problems, they're managing to generate lots of errors. Since at least some of these users seem to have no problems crunching other BOINC projects, it's a little strange if there's a problem with their computers.

Now, I've not checked every download error, but at least the ones I checked were from users running BOINC version 7.2.39 or 7.2.42. Whether this indicates a problem with current BOINC clients or with CPDN's server setup I've no idea; it could also just be that all the errors I didn't check are from different BOINC versions.

BTW, apart from all the download errors, 23% of WUs generating at least one computing error seems on the high side to me.
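
The percentages come straight from the counts listed above, as this small Python check shows. The overlap figure at the end is an inference, not something from the task pages: it assumes a WU that isn't "error-free" has at least one download or computing error and no other kind.

```python
# Counts from the list above.
total_wus = 90
wus_error_free = 38
wus_with_download_errors = 39
wus_with_computing_errors = 21


def share(count, total=total_wus):
    """Percentage of all downloaded WUs."""
    return 100.0 * count / total


download_share = share(wus_with_download_errors)
computing_share = share(wus_with_computing_errors)

# 38 + 39 + 21 is more than 90, so some WUs must sit in both error groups.
# Assumption: every WU that is not error-free has a download or computing error.
wus_with_any_error = total_wus - wus_error_free
overlap = wus_with_download_errors + wus_with_computing_errors - wus_with_any_error

print(f"download errors:  {download_share:.0f}% of WUs")
print(f"computing errors: {computing_share:.0f}% of WUs")
print(f"WUs with both kinds of error: {overlap}")
```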
27) Message boards : climateprediction.net Science : New project launch tomorrow: Weather@home 2014: the causes of the UK winter floods (Message 48270)
Posted 4 Mar 2014 by Ingleside
Post:
Welcome to the forums. :)

edit - seems the homepage now has been fixed.
28) Message boards : Number crunching : No Tasks Available (Message 48269)
Posted 4 Mar 2014 by Ingleside
Post:
Not aware of any download-errors, but had 4 models crashing-out with the following message:
<stderr_txt>

Model crashed: INITTIME: Atmosphere basis time mismatch                                                                                                                                                                                                                        tmp/xaakm.pipe_dummy                                                            2048    
Leaving CPDN_Main::Monitor...
Called boinc_finish

</stderr_txt>

The wu's are 8683247, 8683249, 8683250 and 8683251.

On the same computer some of the other models had already been running for a few hours, and another model started successfully a few seconds after the 4 crashing ones. I've no idea if there are any other problems, since there's no way to know how many of the models have started crunching (no access from here).
29) Message boards : Number crunching : Microsoft Visual C++ Runtime Error (Message 48267)
Posted 4 Mar 2014 by Ingleside
Post:
It's a problem that's been cropping up for a couple years, and no definitive answer or cure has been found in all of that time.
All that can be done is to offer sympathies, and hopes that the next one will work OK.

At least in my experience (under Windows), if you install BOINC as a service the popup message can't be shown, so the model will just silently crash out on its own.

The disadvantage is you can't install BOINC as a service if you're also doing GPU-crunching.

So for anyone not doing GPU crunching, installing as a service won't fix some CPDN models crapping out, but it should at least happen without the spamming popup messages.



30) Message boards : Number crunching : VANISHING WU'S (Message 48061)
Posted 27 Jan 2014 by Ingleside
Post:
Every night at midnight your time you'll still have your work quota put back to one model per core per day.

Uhm, in older BOINC server-code it was midnight server-time, not user-time, so for CPDN this would equal midnight GMT in the winter.

Since having all quota-limited computers connecting in the hour after midnight server time gave an extra spike in server load, in more recent server code the "midnight" is instead randomly assigned to individual computers, meaning someone with multiple computers can have one computer getting a new quota at 01:23:45, another at 12:33:44, a third at 05:43:21 and so on. I'm not sure if CPDN has recent enough code to have this functionality or the older midnight-server-time code...
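
To illustrate why spreading the reset time removes the spike, here is a minimal Python sketch. It is not the BOINC server's actual code: deriving the offset from a hash of the host ID, and both function names, are assumptions chosen only so that each host gets a fixed but evenly spread reset time.

```python
import hashlib

SECONDS_PER_DAY = 24 * 60 * 60


def reset_offset_seconds(host_id):
    """Map a host ID to a stable pseudo-random second of the day (illustrative only)."""
    digest = hashlib.sha256(str(host_id).encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big") % SECONDS_PER_DAY


def format_offset(seconds):
    """Format a second of the day as HH:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Three hypothetical host IDs end up with three unrelated reset times.
for host in (1001, 1002, 1003):
    print(host, format_offset(reset_offset_seconds(host)))
```

With the old scheme every host would print 00:00:00 here, which is exactly the load spike described above.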



31) Message boards : Number crunching : Reporting - Errors while computing - (Message 47632)
Posted 22 Nov 2013 by Ingleside
Post:
"Computing allowed"
1] while computer is in use
2] while processor usage is less than 0 percent

I'll change "Only after computer has been idle for" to 0 minutes, it was on 3.00mins.
{not entirely sure what this latter setting actually means or really remember why it was on 3.00}

You're not allowed to set "has been idle for" to zero minutes, even though, as has already been mentioned, this setting isn't used if you don't suspend computing for any reason. While it's possible to manually edit the preference file (either the override or the general one) and set it to zero, if you do this the client default is used instead, and this is probably 3 minutes.

Some other settings, on the other hand, do accept zero minutes, and, a little inconsistently, zero percent for processor usage means 100%.
32) Questions and Answers : Windows : C++ error continually occurs (Message 47581)
Posted 14 Nov 2013 by Ingleside
Post:
So was there any definitive fix for this problem? I've got it too.

In my experience running BOINC as a service will "fix" the problem, at least as far as a dialogue popping up and a CPU core sitting idle until you click on the message goes. Models can still crash with the C++ error, and it's also possible a crash will just leave the model running even after hitting 100%, but at least you'll not get the popup message any longer.

If you're also using your GPU for crunching on other projects, service-installation will unfortunately not be an option.
33) Questions and Answers : Windows : Windows 8.1, a caution... (Message 47409)
Posted 26 Oct 2013 by Ingleside
Post:
I found it impossible to install boinc in its own partition, as I've done since CPDN merged with boinc. Installation was also impossible as a 'service' unless all three options were accepted, including 'screensaver.'

At least part of your problem is that you're insisting on trying to run an ancient BOINC client that never has been and never will be supported on Windows 8.x.

Windows 8.0 and MacOS "Mountain Lion", or later versions of either OS, require BOINC v7.0.xx or later to work correctly.
34) Message boards : Number crunching : failed upload: can't resolve hostname (Message 47300)
Posted 12 Oct 2013 by Ingleside
Post:
It WAS tested and the results posted in that extinct thread.
I think that the person in question unpacked the zip/tar themselves on arrival and before it had a chance to start, and manually checked the url.

Hmm, why someone would zip or tar (and feather) their sched_reply_climateprediction.net.xml escapes me, and if it was done to any of the many CPDN files residing in the project directory it makes even less sense, since the BOINC client doesn't know (and doesn't care) whether any of these files somehow include a URL.

But while my recollection was too fuzzy, it helps that I took part in at least one of the discussions myself, and this one was not on the php board.

This message from 12.04.2011 is the most interesting, clearly showing that client_state.xml was corrupt even before any of the files were downloaded, while sched_reply* was not corrupt.

Whether the problem was CPDN-only, on the other hand, was never answered by the tester in the old thread...
35) Message boards : Number crunching : failed upload: can't resolve hostname (Message 47298)
Posted 12 Oct 2013 by Ingleside
Post:
cwhyl

This was discussed extensively on the old php board 2-3 years back when it first started happening. It was also tested a fair bit.

The files were/are OK on the server.
They're OK when they arrive zipped up on the user's computer.
At some point after unzipping and moving to their various locations, the data in the client_state.xml file shows up corrupted, in a couple of different ways.

So it's most likely a subtle bug in BOINC for a particular variety of Linux.

Uhm, maybe my recollection is too fuzzy, but I don't remember anyone with a corrupt upload URL ever showing that they did get a sched_reply_climateprediction.net.xml with the correct upload URL, and that this was then either wrongly inserted into client_state.xml or that client_state.xml later got corrupted.

Since CPDN doesn't try uploading before having trickled N times, sched_reply* has also been wiped clean at least N times. This is one of the reasons trying to pinpoint why some are getting a corrupt URL is so hard, and also why AFAIK server problems were never eliminated as the source.

36) Message boards : Number crunching : Compute Errors / Bad Work Units? (Message 47218)
Posted 30 Sep 2013 by Ingleside
Post:
Make sure you have "leave applications in memory when suspended" OFF.

For the majority of crunchers it's always better to have "leave applications in memory" ON, and for some BOINC projects there's a good chance you'll have problems if it's not turned on.

For CPDN, especially if you're starting many models at once, there'll be heavy disk thrashing, and chances are this increases the probability of something taking "too long" and erroring out the model. As long as models are kept in memory, you'll not have this problem except after rebooting the computer. So, if you're not really short on memory, and don't run some really memory-hungry applications, it's better to keep applications in memory.
37) Message boards : Number crunching : failed upload: can't resolve hostname (Message 47111)
Posted 18 Sep 2013 by Ingleside
Post:
However... I think you have something like 2 weeks before the upload fails. So you can simply sit back & hopefully the project might make this same change at Rutherford.

It's 90 days, if you're not running an ancient BOINC-client like v6.2.xx or something even older.
38) Message boards : Number crunching : Download Errors: Permanent HTTP -- Euro Region Tasks (Message 46671)
Posted 22 Jul 2013 by Ingleside
Post:
It's all caused by a long period timer somewhere in the BOINC server code.
When the timer reaches the maximum value for that variable type, it overflows to zero, and BOINC thinks that it's a new data set, so it tries to issue it.

It's not an overflow. The BOINC server code includes a safety measure in case the server has somehow overlooked a task. The safety measure kicks in if a task hits 1.5 times its deadline, and this triggers a re-check verifying whether the WU is finished or a new task is necessary.
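
As a rough illustration of when such a re-check would fire, here is a small Python sketch. It reads "1.5 times its deadline" as 1.5 times the interval between the task being sent and its deadline; that reading, the function name and the dates are assumptions, not the server's actual logic.

```python
from datetime import datetime


def recheck_time(sent, deadline, factor=1.5):
    """Time at which an overlooked task would be re-examined (illustrative only)."""
    return sent + (deadline - sent) * factor


# Hypothetical task: sent 1 Jan 2013 with a 10-day deadline.
sent = datetime(2013, 1, 1)
deadline = datetime(2013, 1, 11)
print(recheck_time(sent, deadline))
```

Under that reading a task with a 10-day deadline gets looked at again 15 days after it was sent, and for year-long deadlines the re-check comes correspondingly late.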

Since CPDN isn't archiving "done" WUs and removing them from the database like other BOINC projects normally do, you'll continue having this problem with CPDN re-issuing ancient WUs until they hit one of their max limits (max error/total).

If I remember correctly, with more recent server code it's possible to disable the re-issue when hitting the safety limit.
39) Message boards : Number crunching : Download errors: Permanent HTTP error (PNW) (Message 46441)
Posted 18 Jun 2013 by Ingleside
Post:
Just got another of these PNW tasks, (a re-issue) where some of the files didn't download and a permanent HTTP error ensued. What I can't remember is whether or not I need to delete the folder with the model or whether BOINC will do it given time?

Having extra CPDN folders is only a problem after a model has started. As long as one or more of the input files is missing, the model never starts, and the BOINC client cleans up on its own.

Note, some of the input files can be marked as "sticky" files, in case they're used by multiple models. "Sticky" files are not automatically deleted. Manually deleting such files won't work either; the client will just try to re-download them the next time it restarts. Depending on client version, they won't be removed by a reset either, but if I'm not mistaken they will be removed on reset if you're running a fairly recent v7 client.
40) Message boards : Number crunching : Workunit error - check skipped (Message 46286)
Posted 24 May 2013 by Ingleside
Post:
(BOINC was designed on the assumption that two different computers processing a task would produce results that are identical, bit for bit. If this were true a simple comparison of results would be a useful check for correct data transmission. But climate models break that assumption.)

Not exactly. BOINC was designed on the assumption that projects would write their own validator, but it does include two generic validators: one is the bit-by-bit comparison and the other is "everything validates". For example, the SETI validator allows 1% variation between most signal strengths, but at the same time demands that the signals are at the same frequency.

Since the validator is project-specific, a CPDN validator could for example check that all trickle files were reported and all files were uploaded, and could also do some other checks on the results. A CPDN validator doesn't need to compare against other results for the WU, meaning there's no problem with different results.
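
To make the idea concrete, here is a minimal Python sketch of such a stand-alone check. It isn't written against the real BOINC validator framework; the `ModelResult` record, its field names and the file names are all hypothetical and only show that a result can be judged on its own, without a second result to compare against.

```python
from dataclasses import dataclass, field


@dataclass
class ModelResult:
    """Hypothetical summary of what the server knows about one finished task."""
    trickles_expected: int
    trickles_reported: int
    files_expected: set = field(default_factory=set)
    files_uploaded: set = field(default_factory=set)


def is_valid(result):
    """Accept a result if every trickle was reported and every expected file arrived."""
    if result.trickles_reported < result.trickles_expected:
        return False
    return result.files_expected <= result.files_uploaded


complete = ModelResult(4, 4, {"out_1.zip", "out_2.zip"}, {"out_1.zip", "out_2.zip"})
missing_upload = ModelResult(4, 4, {"out_1.zip", "out_2.zip"}, {"out_1.zip"})
print(is_valid(complete), is_valid(missing_upload))
```

Further project-specific sanity checks on the uploaded data could be added to the same function.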

By running a validator and assimilator, CPDN could also run db_purge, meaning finished WUs would be archived and removed from the database. One obvious advantage here is that finished WUs wouldn't spawn a re-send, doomed to fail with a download error, at 1.5x the deadline. Another advantage is that the database would be kept smaller, and there would be no need for the occasional manual archiving, often leading to problems, that CPDN has been doing...


