Posts by Ingleside

1) Message boards : Number crunching : Compute Errors on Pacific North West v7.22 Tasks (Message 49593)
Posted 46 days ago by Ingleside
Last PNW to come to my machine was on 12th Feb this year. It completed.

OK, I forgot to specify that it's all the Windows PNW-tasks that are crapping-out. Under a different OS like Linux this batch is possibly worse, since this time it's an input-file-error, while I'm not sure about the source of the error for the "no heartbeat"-tasks.
2) Message boards : Number crunching : Compute Errors on Pacific North West v7.22 Tasks (Message 49590)
Posted 46 days ago by Ingleside

That's because of the INITTIME error, as mentioned a few posts down.

All PNW-models now crapping-out after 30 seconds or so with an INITTIME-error is a huge improvement over the previous batches...

... since those ran through 100 re-starts due to "no heartbeat" before crapping-out and as a "bonus" left behind around 300 MB of garbage on the hd.

Frankly, AFAIK PNW hasn't worked since the upgrade to 7.22, a version AFAIK not even beta-tested before release, so I've no idea why CPDN continues releasing new PNW-garbage before they've even tried to get it working as beta.



3) Message boards : Number crunching : Project keeps resetting - any explanations? (Message 49371)
Posted 78 days ago by Ingleside
Anyway, the disk value has been typical for this machine for quite sometime, which is why I thought it was "normal." So -- I have one last question: Is it easiest to simply try a project "reset" to clean out the directory? I would dislike damaging the directory structure the way I go about wiping folders and disks... Thanks!

Reset should work.
4) Questions and Answers : Macintosh : GB added to my Time Machine backup (Message 48959)
Posted 126 days ago by Ingleside
2) Normally exclude the boinc directory tree from time capsule backups. Once a week, remove this exclusion, select "backup now" and after the backup is done, re-exclude the boinc directory tree for another week.

3) Same as 2) except suspend/shutdown boinc while the backup is taking place. (To satisfy my paranoia about doing a backup while boinc is running. )

4) Once a week suspend/shutdown boinc, copy the directory tree to a backup disk, restart boinc. The only problem of this option, is remembering to do it, and waiting around while 14 GB is being copied to the (relatively slow) backup disk so boinc activity can be resumed.

I tend to lean toward 4).

Well for hadam3p_eu-models it's a waste of time to do weekly backups, since chances are any restored backup will be of models you've already finished & reported. For hadam3p_anz the usefulness of a weekly backup is also limited so except for hadcm3n a weekly backup is mostly useless. (No idea with Moses).

A daily backup on the other hand would be much more useful. If the "Time Machine" is up to the task, I would choose option #5:

5: Exclude boinc from hourly backup. Make a separate backup-profile for BOINC, doing a daily backup of only the BOINC data-directory (including sub-directories).

If Time Machine can't handle #5, option #2 but done once a day is probably the best.
5) Questions and Answers : Wish list : Enhance scheduling/throttling strategies (Message 48958)
Posted 126 days ago by Ingleside
If you edit the BOINC-preferences locally on a computer, they're saved in a file called global_prefs_override.xml located in the BOINC data-directory.
In addition, BOINC includes boinccmd, located in the BOINC application-directory. This is a command-line tool for giving various commands to a running BOINC-client, including re-reading global_prefs_override.xml

So while BOINC doesn't directly support changing %cpu-usage by time-of-day, one method that should work is to make two small batch-files and schedule them to run at particular times in Windows.

To make the batch-files, you'll first need to make the BOINC-preferences. For example: set the "full"-preferences and copy global_prefs_override.xml, calling the copy full.xml. Then re-edit the BOINC-preferences, choosing the "low"-preferences, and copy global_prefs_override.xml again, calling it low.xml

The full-batch-file can be something like:
copy "your-boinc-data-dir\full.xml" "your-boinc-data-dir\global_prefs_override.xml"
"your-boinc-app-dir\boinccmd" --read_global_prefs_override

And the low-batch-file can be something like:
copy "your-boinc-data-dir\low.xml" "your-boinc-data-dir\global_prefs_override.xml"
"your-boinc-app-dir\boinccmd" --read_global_prefs_override

You'll need write access to the BOINC data-directory for the copying to work.
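
For anyone who would rather schedule a script than two batch-files, the same two steps can be sketched in Python. This is only a rough sketch under assumptions: the function name apply_profile and its parameters are made up for illustration, the profile files (full.xml, low.xml) are the ones described above, and the only boinccmd option used is --read_global_prefs_override from the batch-files.

```python
import shutil
import subprocess
from pathlib import Path


def apply_profile(profile, data_dir, boinccmd, run=subprocess.run):
    """Activate a saved preference profile (hypothetical helper).

    profile  -- name of the saved copy without extension, e.g. "full" or "low"
    data_dir -- the BOINC data-directory holding the saved copies
    boinccmd -- path to the boinccmd executable
    run      -- command runner; replaceable so the function can be tried
                without a BOINC client installed
    """
    data_dir = Path(data_dir)
    source = data_dir / f"{profile}.xml"
    target = data_dir / "global_prefs_override.xml"

    # Same job as the "copy" line in the batch-files.
    shutil.copyfile(source, target)

    # Same job as the second line: ask the running client to re-read the file.
    command = [str(boinccmd), "--read_global_prefs_override"]
    run(command, check=True)
    return command
```

A scheduled task would then call something like apply_profile("low", "your-boinc-data-dir", "your-boinc-app-dir\boinccmd") in the evening and the same with "full" in the morning. The copy still needs write access to the data-directory.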
6) Message boards : Number crunching : CONVERTING TO LINUX (Message 48944)
Posted 127 days ago by Ingleside
That's a thought, but with dual boot, Jim can just switch to Windows when cpdn is out of work and run other projects with that.

Well, at least in my experience, a project going down always happens at the wrong time, for example 15 minutes before you're increasing the cache-size from 0.1 days to 5 days, or 5 minutes after leaving in the morning, so the computer can sit idle for many hours during the day if you don't have a backup-project.

Even with "lots of work" cached for CPDN, how many bad batches has CPDN released over the years, crashing after a few seconds into a run? Not to forget, while all Hadam3p-variants were for years probably the most stable models, after PNW released a new version (AFAIK they didn't even bother beta-testing it) all PNW-models have in my experience had a 100% error-rate. (And as a bonus these crashes have filled up the hd, leading to the other models also crapping-out due to no free disk space.)

7) Message boards : Number crunching : CONVERTING TO LINUX (Message 48931)
Posted 128 days ago by Ingleside
I agree with Les: Mint is best if your Linux install is going to be used for various things. But if your box is dedicated to CPDN work (or BOINC work in general), I'd recommend 32 bit Lubuntu. Lubuntu is Ubuntu with the 'LXDE' desktop, which is very light on system resources, so it leaves more CPU for CPDN to use.

If you're only going to run CPDN under Linux, using a 32-bit version is possibly the best option. But if you're also expecting to run other projects, either by choice or as a backup for the next time CPDN is out of work or has server-problems and you can't get any new work, a 32-bit Linux doesn't look like a good choice, since for some BOINC-projects the 64-bit-applications have a significant speed-advantage.


8) Questions and Answers : Wish list : Using GPUs for number crunching (Message 48739)
Posted 147 days ago by Ingleside
Assuming no major problems with the compiler the next step is professional development of your programming staff with training on GPUs. At this point you might be in a position to know if GPU processing is a reasonable option.

Well, AFAIK all currently active climate-models use SSE2-optimizations, and my guess is this means they're using double-precision. Since the Fortran-compiler linked a few posts back is CUDA, and Nvidia-cards have abysmally poor double-precision-speed of only 1/24 of single-precision-performance unless you pay $$$$ for the professional cards, even a top-end Nvidia-GTX780Ti only manages 210 GFLOPS at most. A quad-core (8 with HT) CPU on the other hand is around 100 GFLOPS. Meaning even best-case the Nvidia-GPU will only be 2x faster than the CPU. In reality even reaching 50% of that peak performance on the GPU can be too much to hope for, meaning your "slow" CPU is outperforming your "fast" GPU.

So, unless you can use single-precision for most of the calculations, a CUDA-version of CPDN is a waste of development-time.

Instead of CUDA, an OpenCL-compiler would be more interesting, since OpenCL also works with the much faster AMD-GPUs. But even with this additional speed, it's still unlikely you can get a climate-model to run faster on GPU than CPU.
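
The arithmetic behind the "only 2x faster" estimate can be written out as a small Python sketch. The 210 GFLOPS, 100 GFLOPS and 1/24 figures are the rough numbers from the post, not measurements, and the efficiency factor is a hypothetical knob for how much of the GPU's peak a real climate-model might reach. Note the sketch generously assumes the CPU runs at its full 100 GFLOPS.

```python
# Rough figures quoted in the post (not benchmarks).
GPU_DP_GFLOPS = 210.0   # GTX780Ti double-precision peak
CPU_GFLOPS = 100.0      # quad-core (8 threads with HT) CPU
DP_DIVISOR = 24         # double-precision runs at 1/24 of single-precision

# Single-precision peak implied by the 1/24 ratio.
implied_single_gflops = GPU_DP_GFLOPS * DP_DIVISOR

# Best case: the GPU reaches its full double-precision peak.
best_case_speedup = GPU_DP_GFLOPS / CPU_GFLOPS


def effective_speedup(gpu_efficiency):
    """GPU-vs-CPU speedup if the GPU only reaches a fraction of its peak.

    gpu_efficiency is an assumed value between 0 and 1.
    """
    return GPU_DP_GFLOPS * gpu_efficiency / CPU_GFLOPS


# Below this fraction of peak the CPU is the faster of the two.
break_even_efficiency = CPU_GFLOPS / GPU_DP_GFLOPS

print(f"implied single-precision peak: {implied_single_gflops:.0f} GFLOPS")
print(f"best-case speedup: {best_case_speedup:.2f}x")
print(f"speedup at 50% of peak: {effective_speedup(0.5):.2f}x")
print(f"break-even efficiency: {break_even_efficiency:.0%}")
```

With these numbers the GPU has to sustain a bit under half of its double-precision peak just to match the CPU, which is why 50% is already borderline.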
9) Message boards : Number crunching : No Tasks Available (Message 48299)
Posted 181 days ago by Ingleside
There was a nasty bug in v7.2.39 which could be causing the download errors.

Haven't been keeping-up with client-changes recently so wasn't aware of this.

v7.2.42 is a bug-fix - maybe the ones you're seeing with download errors and v7.2.42 have upgraded the client in the meantime? Can you check whether the individual failed tasks were attempted under v7.2.39, whatever client they're running now?

I only found one of my wu's where someone reporting as v7.2.42 had a download-error. It's host 1289490, and a quick look reveals 7 errors at the same time. Interestingly enough they're reported as "error" and not "download error". Also, it's only 6 minutes between being assigned and reported as error, so clearly someone was manually hitting "update". While it's possible they did swap BOINC-client during these 6 minutes before reporting the errors, this info isn't available anywhere so...
10) Message boards : Number crunching : No Tasks Available (Message 48285)
Posted 181 days ago by Ingleside
The permanent http error is only happening to a few people, so is most likely a problem with their computer.

Well, taking a look at the wu's I've downloaded, while I've not had any download-errors myself, the current results are:
90 wu's downloaded, of these:
38 error-free (at least for now).
39 wu's with download-errors.
21 wu's with computing-errors.
48 total download-errors.
27 total computing-errors.
3 wu's errored-out due to too many errors.

43% of the wu's having download-errors is in my opinion too high, so even if only a "few" users have problems they're managing to generate lots of errors. Since at least some of these users seem to have no problems crunching other BOINC-projects, it's a little strange if there's a problem with their computers.

Now I've not checked every download-error, but at least the ones I checked were from users running BOINC-version 7.2.39 or 7.2.42. Whether this indicates a problem with current BOINC-clients or with CPDN's server-setup I've no idea; it could also just be that all the errors I didn't check are from different BOINC-versions.

BTW, apart from all the download-errors, 23% of wu's generating at least one computing-error seems on the high side to me.
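
For anyone wanting to check the percentages, here is a small Python sketch that recomputes them from the counts listed above. The variable names are just for illustration. The overlap figure is an inference from the counts, not something listed in the post: 38 + 39 + 21 is more than 90, so some wu's must have had both kinds of error.

```python
# Counts as listed in the post.
total_wus = 90
error_free = 38
wus_with_download_errors = 39
wus_with_computing_errors = 21
total_download_errors = 48
total_computing_errors = 27


def percent(count, total):
    """Share of `count` in `total`, as a percentage."""
    return 100.0 * count / total


download_pct = percent(wus_with_download_errors, total_wus)
computing_pct = percent(wus_with_computing_errors, total_wus)

# wu's with at least one error of any kind.
wus_with_errors = total_wus - error_free

# Inference: the two error groups add up to more than wus_with_errors,
# so at least this many wu's had both a download- and a computing-error.
min_overlap = wus_with_download_errors + wus_with_computing_errors - wus_with_errors

# Average number of failed attempts per affected wu.
download_errors_per_wu = total_download_errors / wus_with_download_errors
computing_errors_per_wu = total_computing_errors / wus_with_computing_errors

print(f"wu's with download-errors: {download_pct:.0f}%")
print(f"wu's with computing-errors: {computing_pct:.0f}%")
print(f"wu's with both kinds of error: at least {min_overlap}")
```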





Copyright © 2002-2014 climateprediction.net