Posts by Ingleside

11) Message boards : Number crunching : no credit awarded? (Message 68549)
Posted 3 Mar 2023 by Ingleside
Post:
The original system had two scripts - one to copy the trickles to a place where they could be seen on the website

This script, or whatever was supposed to replace it, clearly isn't working, as seen from the "No trickle!" message on the website.

Based on the 11 August 2022 batch of WAH2 work, where trickles did work in August but not in December (when the original issue errored out), it doesn't look like a misconfiguration of the actual wu.
Instead, some possibilities include:
1: The trickle script can't copy to the directory, because the directory was accidentally write-protected, is physically full, has hit a "full quota", or access rights were accidentally lost.
2: The ini-file that tells the trickle script where to copy trickles was changed to point to a new directory, but neither the web pages nor the credit script were updated to use the new directory.
3: The trickle script is stuck on a specific trickle and, even if restarted, gets stuck on the same "bad" trickle.
4: The BOINC server was updated or reconfigured and someone "forgot" to extract trickle information from the scheduler, or it is extracted to the "wrong" directory, not the one the trickle script expects.
5: Since OpenIFS apparently does not rely on trickles for crediting, someone incorrectly assumed the trickles didn't need to be copied any longer.

Note, chances are that when the problem with trickles not showing up on the web page is fixed, the credit will also be fixed on the next credit run (unless I've overlooked an example where "recent" trickles do show on the web page but there's still no credit).
12) Message boards : Number crunching : no credit awarded? (Message 68542)
Posted 2 Mar 2023 by Ingleside
Post:
it's affecting the Hadley models on all hosts

Not just Hadleys, it also affects WAH2 models running on Windows, for example https://www.cpdn.org/result.php?resultid=22250721
from December 2022. As is common, it says "No trickles!".
13) Message boards : Number crunching : Download server (Message 58800)
Posted 23 Sep 2018 by Ingleside
Post:
The count decreases when a task is assigned to you, since obviously the same task can't be issued to someone else. Also, the scheduling server doesn't know, and doesn't care, whether the download server is working or not.
14) Message boards : Number crunching : Africa v7.22 Errors (Message 51317)
Posted 26 Jan 2015 by Ingleside
Post:
Hmm, does the Africa-application have the same fatal bug as the PNW-application, ending up in the "crashes 100 times before giving up" loop, or is something else going on, with just some bad wu's?
15) Message boards : Number crunching : Main climateprediction.net web page down (Message 51131)
Posted 5 Jan 2015 by Ingleside
Post:
"Attaching isn't a problem"
Oops
for users already attached, who have an account_* file somewhere, yes "Attaching isn't a problem"

But, for new clients, who don't have such, yes it is a problem.

Makes it hard to recruit new (and old) users when they need an "account_*" file on their system already to sign up -

Maybe some kind of magic redirection from the public sites -- I have no clue.

New users can't sign up right now -- is that correct?


Well, I can't at the moment test if it's possible to make a new account, since I'm stuck at a Linux-computer with no possibility to run BOINC...

For everyone who already has an account, manages to find these pages and also manages to log in to their own account on these web pages, it's fairly easy to make an account-file if for some reason it's not possible to just attach to http://climateapps2.oerc.ox.ac.uk/cpdnboinc/ by manually typing this into BOINC Manager...

Go to http://climateapps2.oerc.ox.ac.uk/cpdnboinc/weak_auth.php and use notepad (or similar) to make, as marked in bold, a file called account_climateprediction.net.xml in your BOINC data-directory.

For the contents, at the moment you'll need to use http://climateapps2.oerc.ox.ac.uk/cpdnboinc/ for PROJECT_URL and for WEAK_ACCOUNT_KEY you can use either the weak account-key listed on the web-page or the "normal" account-key listed on the "Your account"-page.


After successfully editing and saving account_climateprediction.net.xml, just re-start BOINC.
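
For anyone who would rather script this than use notepad, here is a minimal sketch in Python. It assumes the account file has the usual BOINC layout of an account element holding master_url and authenticator (which matches the master-url sitting on the 2nd line of the file); the function name, the data-directory path and the key in the usage comment are placeholders, not values from this thread.

```python
from pathlib import Path

# Assumed layout of a BOINC account file; check it against what weak_auth.php shows.
ACCOUNT_TEMPLATE = """<account>
    <master_url>{project_url}</master_url>
    <authenticator>{account_key}</authenticator>
</account>
"""


def write_account_file(data_dir, project_url, account_key,
                       filename="account_climateprediction.net.xml"):
    """Write the account file into the BOINC data directory and return its path."""
    path = Path(data_dir) / filename
    text = ACCOUNT_TEMPLATE.format(project_url=project_url, account_key=account_key)
    path.write_text(text, encoding="utf-8")
    return path


# Example call (hypothetical directory and key):
# write_account_file(r"C:\ProgramData\BOINC",
#                    "http://climateapps2.oerc.ox.ac.uk/cpdnboinc/",
#                    "YOUR_WEAK_ACCOUNT_KEY")
```

As with the manual route, BOINC has to be re-started afterwards before it picks up the file.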
16) Message boards : Number crunching : Main climateprediction.net web page down (Message 51118)
Posted 4 Jan 2015 by Ingleside
Post:
Has anyone else been able to attach since the front page went offline? If so how?

Attaching isn't a problem, you can use http://climateapps2.oerc.ox.ac.uk/cpdnboinc/ to attach since they've also included the link to the scheduler on this page. While the BOINC-client will complain about using this address, it will still work without any problems. I've for many years been attached to the "wrong" url for a project without having any problems apart from getting the "wrong address" message on scheduler-requests.

For anyone who is already attached but stuck needing to re-download the master-page, a work-around is to edit the account_climateprediction.net.xml-file in the BOINC data-directory to use http://climateapps2.oerc.ox.ac.uk/cpdnboinc/ as the master-url (the 2nd line in the file) and afterwards re-start the BOINC client.

When the homepage is finally back, just re-edit the account-file to use http://climateprediction.net/ as master_url. This can be done even if you attached to the "wrong" url originally.


BTW, while attaching isn't difficult to do, I immediately hit the Linux-bug of "missing libstdc++.so.6", a library so ancient it's not part of the repositories any longer meaning I've not got the foggiest idea how to install it.
17) Message boards : Number crunching : Compute Errors on Pacific North West v7.22 Tasks (Message 49593)
Posted 18 Jul 2014 by Ingleside
Post:
Last PNW to come to my machine was on 12th Feb this year. It completed.

Ok, I forgot to specify that it's all the Windows-PNW-tasks crapping-out. Under a different OS like Linux this batch is possibly worse, since this time it's an input-file-error, while I'm not sure about the source of the error for the "no heartbeat"-tasks.
18) Message boards : Number crunching : Compute Errors on Pacific North West v7.22 Tasks (Message 49590)
Posted 18 Jul 2014 by Ingleside
Post:

That's because of the INITTIME error, as mentioned a few posts down.

All PNW-models now crapping-out after 30 seconds or so with an INITTIME-error is a huge improvement over the previous batches...

... since these ran through 100 re-starts due to "no heartbeat" before crapping-out and as a "bonus" left behind around 300 MB of garbage on the hd.

Frankly, AFAIK PNW hasn't worked since the upgrade to 7.22, a version AFAIK not even beta-tested before release, so I've no idea why CPDN continues releasing new PNW-garbage before they've even tried to get it working as beta.



19) Message boards : Number crunching : Project keeps resetting - any explanations? (Message 49371)
Posted 16 Jun 2014 by Ingleside
Post:
Anyway, the disk value has been typical for this machine for quite sometime, which is why I thought it was "normal." So -- I have one last question: Is it easiest to simply try a project "reset" to clean out the directory? I would dislike damaging the directory structure the way I go about wiping folders and disks... Thanks!

Reset should work.
20) Questions and Answers : Macintosh : GB added to my Time Machine backup (Message 48959)
Posted 29 Apr 2014 by Ingleside
Post:
2) Normally exclude the boinc directory tree from time capsule backups. Once a week, remove this exclusion, select "backup now" and after the backup is done, re-exclude the boinc directory tree for another week.

3) Same as 2) except suspend/shutdown boinc while the backup is taking place. (To satisfy my paranoia about doing a backup while boinc is running. )

4) Once a week suspend/shutdown boinc, copy the directory tree to a backup disk, restart boinc. The only problem of this option, is remembering to do it, and waiting around while 14 GB is being copied to the (relatively slow) backup disk so boinc activity can be resumed.

I tend to lean toward 4).

Well for hadam3p_eu-models it's a waste of time to do weekly backups, since chances are any restored backup will be of models you've already finished & reported. For hadam3p_anz the usefulness of a weekly backup is also limited so except for hadcm3n a weekly backup is mostly useless. (No idea with Moses).

A daily backup on the other hand would be much more useful. If the "Time Machine" is up to the task, I would choose option #5:

5: Exclude boinc from hourly backup. Make a separate backup-profile for BOINC, doing a daily backup of only the BOINC data-directory (including sub-directories).

If time machine can't handle #5, option #2 but done once-a-day is probably the best.
21) Questions and Answers : Wish list : Enhance scheduling/throttling strategies (Message 48958)
Posted 29 Apr 2014 by Ingleside
Post:
If you edit the BOINC-preferences locally on a computer, they're saved in a file called global_prefs_override.xml located in the BOINC data-directory.
In addition, BOINC includes boinccmd, located in the BOINC application-directory. This is a command-line tool for giving various commands to a running BOINC-client, including re-reading global_prefs_override.xml.

So while BOINC doesn't directly support changing %cpu-usage based on time-of-day, one method that should work is to make two small batch-files, which you can schedule to run at particular times in Windows.

To make the batch-files, you'll first need to make the BOINC-preferences. For example, set up the "full"-preferences and copy global_prefs_override.xml to full.xml, then re-edit the BOINC-preferences to the "low"-preferences and copy global_prefs_override.xml to low.xml.

The full-batch-file can be something like:
copy "your-boinc-data-dir\full.xml" "your-boinc-data-dir\global_prefs_override.xml"
"your-boinc-app-dir\boinccmd" --read_global_prefs_override

And the low-batch-file can be something like:
copy "your-boinc-data-dir\low.xml" "your-boinc-data-dir\global_prefs_override.xml"
"your-boinc-app-dir\boinccmd" --read_global_prefs_override

You'll need access-rights for the copying to work.
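
The same idea as the two batch-files can be sketched as one Python function, for anyone who prefers a single script with the profile name as a parameter. This is only a sketch under the same assumptions as above: full.xml and low.xml already exist in the data-directory, and the function name apply_profile and the paths in the usage comment are made up for illustration. The only boinccmd option used is --read_global_prefs_override, the same one as in the batch-files.

```python
import shutil
import subprocess
from pathlib import Path


def apply_profile(data_dir, boinccmd, profile, run=subprocess.run):
    """Copy <profile>.xml over global_prefs_override.xml, then ask the client to re-read it."""
    data_dir = Path(data_dir)
    source = data_dir / f"{profile}.xml"
    if not source.is_file():
        raise FileNotFoundError(source)
    shutil.copyfile(source, data_dir / "global_prefs_override.xml")
    # Run boinccmd from the data directory (an assumption: that is where it
    # normally finds the password file needed to talk to the local client).
    return run([str(boinccmd), "--read_global_prefs_override"],
               cwd=data_dir, check=True)


# Example calls (hypothetical paths), one per scheduled task:
# apply_profile(r"C:\ProgramData\BOINC", r"C:\Program Files\BOINC\boinccmd.exe", "full")
# apply_profile(r"C:\ProgramData\BOINC", r"C:\Program Files\BOINC\boinccmd.exe", "low")
```

The run parameter is only there so the function can be tried without a BOINC client installed; the same access-rights caveat applies to the copy.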
22) Message boards : Number crunching : CONVERTING TO LINUX (Message 48944)
Posted 28 Apr 2014 by Ingleside
Post:
That's a thought, but with dual boot, Jim can just switch to Windows when cpdn is out of work and run other projects with that.

Well, at least in my experience, a project going down always happens at the wrong time, for example 15 minutes before you're increasing the cache-size from 0.1 days to 5 days, or 5 minutes after leaving in the morning, so the computer can sit idle for many hours during the day if you don't have a backup-project.

Even with "lots of work" cached for CPDN, how many bad batches has CPDN released over the years, crashing after a few seconds in a run? Not to forget, while all Hadam3p-variants for years were probably the most stable models, after PNW released a new version (AFAIK they didn't even bother to beta-test it) all PNW-models have in my experience had a 100% error-rate. (And as a bonus these crashes have filled up the hd, leading to the other models also crapping-out due to no free disk space.)

23) Message boards : Number crunching : CONVERTING TO LINUX (Message 48931)
Posted 28 Apr 2014 by Ingleside
Post:
I agree with Les: Mint is best if your Linux install is going to be used for various things. But if your box is dedicated to CPDN work (or BOINC work in general), I'd recommend 32 bit Lubuntu. Lubuntu is Ubuntu with the 'LXDE' desktop, which is very light on system resources, so it leaves more CPU for CPDN to use.

If you're only going to run CPDN under Linux, using a 32-bit version is possibly the best option. But if you're also expecting other projects to be run, either by your own choice or as a backup for the next time CPDN is out of work or has server-problems and you can't get any new work, a 32-bit Linux doesn't look like a good choice, since for some BOINC-projects the 64-bit-applications have a significant speed-advantage.


24) Questions and Answers : Wish list : Using GPUs for number crunching (Message 48739)
Posted 8 Apr 2014 by Ingleside
Post:
Assuming no major problems with the compiler the next step is professional development of your programming staff with training on GPUs. At this point you might be in a position to know if GPU processing is a reasonable option.

Well, AFAIK all currently active climate-models use SSE2-optimizations, and my guess is this means they're using double-precision. Since the Fortran-compiler linked a few posts back is CUDA, and Nvidia-cards have abysmally poor double-precision-speed of only 1/24 of single-precision-performance unless you pay $$$$ for the professional cards, even a top-end Nvidia GTX 780 Ti only manages 210 GFLOPS at most. A quad-core (8 threads with HT) cpu on the other hand is around 100 GFLOPS, meaning even in the best case the Nvidia-GPU will only be 2x faster than the CPU. In reality even 50% of peak performance on the GPU can be too optimistic, meaning your "slow" CPU is outperforming your "fast" GPU.

So, unless you can use single-precision for most of the calculations, a CUDA-version of CPDN is a waste of development-time.

Instead of CUDA, an OpenCL-compiler would be more interesting, since OpenCL also works with the much faster AMD-GPUs. But even with this additional speed, it's still unlikely you can get a climate-model to run faster on GPU than CPU.
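
The GPU-versus-CPU arithmetic above can be written out as a short Python sketch. All figures are the rough peak numbers from the post (210 GFLOPS double-precision for the GPU, around 100 GFLOPS for the CPU, a 1/24 double-to-single ratio); the efficiency parameter is an illustrative assumption for how much of peak a real model would reach on the GPU.

```python
SP_TO_DP_RATIO = 24      # consumer Nvidia cards: double precision at 1/24 of single
GPU_DP_GFLOPS = 210.0    # GTX 780 Ti peak double precision, as quoted in the post
CPU_DP_GFLOPS = 100.0    # rough quad-core figure, as quoted in the post


def gpu_speedup(gpu_gflops, cpu_gflops, efficiency=1.0):
    """Speed of the GPU relative to the CPU when the GPU reaches `efficiency` of its peak."""
    return gpu_gflops * efficiency / cpu_gflops


implied_sp_gflops = GPU_DP_GFLOPS * SP_TO_DP_RATIO        # single-precision peak implied by the ratio
best_case = gpu_speedup(GPU_DP_GFLOPS, CPU_DP_GFLOPS)     # GPU running at 100% of peak
half_peak = gpu_speedup(GPU_DP_GFLOPS, CPU_DP_GFLOPS, 0.5)
break_even = CPU_DP_GFLOPS / GPU_DP_GFLOPS                # GPU efficiency at which CPU and GPU tie

print(f"implied single-precision peak: {implied_sp_gflops:.0f} GFLOPS")
print(f"best-case speedup: {best_case:.2f}x")
print(f"speedup at 50% of peak: {half_peak:.2f}x")
print(f"break-even GPU efficiency: {break_even:.0%}")
```

With these numbers the GPU has to sustain a bit under half of its double-precision peak just to match the CPU, which is why the "2x faster" best case is so fragile.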
25) Message boards : Number crunching : No Tasks Available (Message 48299)
Posted 5 Mar 2014 by Ingleside
Post:
There was a nasty bug in v7.2.39 which could be causing the download errors.

Haven't been keeping-up with client-changes recently so wasn't aware of this.

v7.2.42 is a bug-fix - maybe the ones you're seeing with download errors and v7.2.42 have upgraded the client in the meantime? Can you check whether the individual failed tasks were attempted under v7.2.39, whatever client they're running now?

I only found one of my wu's where someone reporting as v7.2.42 had a download-error; it's host 1289490, and a quick look reveals 7 errors at the same time. Interestingly enough they're reported as "error" and not "download error". Also, it's only 6 minutes between being assigned and being reported as error, so clearly someone manually hit "update". While it's possible they did swap BOINC-client during these 6 minutes before reporting the errors, this info isn't available anywhere so...
26) Message boards : Number crunching : No Tasks Available (Message 48285)
Posted 5 Mar 2014 by Ingleside
Post:
The permanent http error is only happening to a few people, so is most likely a problem with their computer.

Well, taking a look at the wu's I've downloaded, while I've not had any download-errors myself the current results are:
90 wu's downloaded, of these:
38 error-free (atleast for now).
39 wu's with download-errors.
21 wu's with computing-errors.
48 total download-errors.
27 total computing-errors.
3 wu's errored-out due to too many errors.

43% of the wu's having download-errors is in my opinion too high, so even if only a "few" users have problems they're managing to generate lots of errors. Since at least some of these users seem to have no problems crunching other BOINC-projects, it's a little strange if there's a problem with their computers.

Now I've not checked every download-error, but at least the ones I checked were from users running BOINC-version 7.2.39 or 7.2.42. Whether this indicates a problem with current BOINC-clients or with CPDN's server-setup I've no idea; it can also just be that all the errors I didn't check are from different BOINC-versions.

BTW, apart from all the download-errors, 23% of wu's generating at least one computing-error seems on the high side to me.
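
The percentages can be re-derived from the counts listed above with a few lines of Python. Nothing here is new data; the variable names are just labels for the numbers in the list, and the overlap figure follows from the fact that every wu is either error-free or has at least one error.

```python
wus_downloaded = 90
wus_error_free = 38
wus_with_download_errors = 39
wus_with_computing_errors = 21
total_download_errors = 48
total_computing_errors = 27


def share(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


download_pct = share(wus_with_download_errors, wus_downloaded)
computing_pct = share(wus_with_computing_errors, wus_downloaded)

# wu's with any error = all wu's minus the error-free ones, so the
# wu's counted in both error categories follow by inclusion-exclusion.
wus_with_any_error = wus_downloaded - wus_error_free
wus_with_both = (wus_with_download_errors + wus_with_computing_errors
                 - wus_with_any_error)

print(f"wu's with download-errors: {download_pct:.1f}%")
print(f"wu's with computing-errors: {computing_pct:.1f}%")
print(f"wu's with both kinds of error: {wus_with_both}")
print(f"download-errors per affected wu: {total_download_errors / wus_with_download_errors:.2f}")
```

The three wu categories add up to 98 rather than 90 because some wu's appear under both download-errors and computing-errors.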
27) Message boards : climateprediction.net Science : New project launch tomorrow: Weather@home 2014: the causes of the UK winter floods (Message 48270)
Posted 4 Mar 2014 by Ingleside
Post:
Welcome to the forums. :)

edit - seems the homepage now has been fixed.
28) Message boards : Number crunching : No Tasks Available (Message 48269)
Posted 4 Mar 2014 by Ingleside
Post:
Not aware of any download-errors, but had 4 models crashing-out with the following message:
<stderr_txt>

Model crashed: INITTIME: Atmosphere basis time mismatch                                                                                                                                                                                                                        tmp/xaakm.pipe_dummy                                                            2048    
Leaving CPDN_Main::Monitor...
Called boinc_finish

</stderr_txt>

The wu's are 8683247, 8683249, 8683250 and 8683251.

On the same computer some of the other models had already been running for a few hours, and another model started successfully a few seconds after the 4 crashing ones. No idea if there are any other problems, since there's no way to know how many of the models have started crunching (no access from here).
29) Message boards : Number crunching : Microsoft Visual C++ Runtime Error (Message 48267)
Posted 4 Mar 2014 by Ingleside
Post:
It's a problem that's been cropping up for a couple years, and no definitive answer or cure has been found in all of that time.
All that can be done is to offer sympathies, and hopes that the next one will work OK.

At least my experience (under Windows) is that if you install BOINC as a service, the popup-message can't be shown, so the model will just silently crash-out on its own.

The disadvantage is you can't install BOINC as a service if you're also doing GPU-crunching.

So for anyone not doing GPU-crunching, installing as a service won't fix some CPDN-models crapping-out, but it should at least happen without the spamming popup-messages.



30) Message boards : Number crunching : VANISHING WU'S (Message 48061)
Posted 27 Jan 2014 by Ingleside
Post:
Every night at midnight your time you'll still have your work quota put back to one model per core per day.

Uhm, in older BOINC server-code it was midnight server-time, not user-time, so for CPDN this would equal midnight GMT in the winter.

Since having all quota-limited computers connecting in the hour after midnight server-time gave an extra spike in server-load, in more recent server-code the "midnight" is instead randomly assigned to individual computers, meaning someone with multiple computers can have one computer getting a new quota at 01:23:45, another at 12:33:44, a third at 05:43:21 and so on. I'm not sure if CPDN has recent-enough code to have this functionality or the older midnight-server-time-code...
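
To illustrate the idea of spreading the daily quota-reset across hosts, here is a small Python sketch. It is not the BOINC server-code, and how the real server picks each host's time is not something this sketch claims to know; it only assumes each host has a numeric id and derives a fixed time-of-day from a hash of it, so the resets no longer pile up right after midnight.

```python
import hashlib

SECONDS_PER_DAY = 24 * 60 * 60


def reset_offset(host_id):
    """Map a host id to a fixed second-of-day at which its daily quota resets."""
    digest = hashlib.sha256(str(host_id).encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big") % SECONDS_PER_DAY


def format_offset(seconds):
    """Format a second-of-day as HH:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Hypothetical host ids; each one always gets the same reset time.
for host_id in (1001, 1002, 1003):
    print(host_id, format_offset(reset_offset(host_id)))
```

Because the offset depends only on the host id, a host keeps the same reset time from day to day while different hosts land at different times.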






©2024 climateprediction.net