Posts by old_user60427

1) Message boards : Number crunching : How can I find out whether my model is progressing (Message 40654)
Posted 10 Sep 2010 by old_user60427
Post:
Thanks for the responses. Upon checking my logs I am pleasantly surprised to see that a trickle has uploaded, so the networking issues seem to have been overcome (by switching to HTTP 1.0). Progress, on the other hand, suggests that I restarted from zero. Given the 'speed' of this host and the fact that I was at the end of phase 3, I have chosen to abort -- computing that last timestep will require at least another 4 months...

Now requesting a 'Famous' (downloading is a bit sluggish; we'll get there eventually).
2) Message boards : Number crunching : How can I find out whether my model is progressing (Message 40647)
Posted 9 Sep 2010 by old_user60427
Post:
One of my current models seems to have developed a problem: no credit for a few weeks, although the app is getting its fair share of CPU time. I also see progress, but BOINC's own estimate is around 25%, whereas I am nearly done with phase 3.

The log showed many upload errors (which I initially blamed on server issues). I did some digging yesterday and resolved the 417 errors by instructing BOINC to use HTTP 1.0. The downside is that CPDN is no longer trying to upload trickles. A forced update succeeds, but no trickles are sent.

I cannot see graphics on this host, so don't know whether I am really progressing. The logs do not appear to show anything out of the ordinary.
3) Questions and Answers : Unix/Linux : platform 'i586-mandriva-linux-gnu' not found (Message 36941)
Posted 17 May 2009 by old_user60427
Post:
The error message reminds me of the sort of init script used to start BOINC (/etc/init.d/boinc): it is looking for the BOINC manager under a certain name.

The issue is (if my assumption that the error comes from the script is right) that your BOINC manager has a different name from the one the /etc/init.d/boinc script is looking for.

Go to your BOINC directory and try to start BOINC manually -- don't forget that you may need to log on as the user all the BOINC software is installed under.
4) Message boards : Number crunching : Boinc version 6 released - comments thread (Message 34740)
Posted 25 Aug 2008 by old_user60427
Post:
Did BOINC 6 also change what climate is logging in boinc.log?

I was used to seeing output for each individual time step ("hadsm3fub_00y8_005917643 - PH 2 TS 0221905...") listing the model type, phase/time step, simulated date, total CPU and average DT so far. This was the basis for some scripts which kept me posted on where CPDN was. I notice this is no longer in boinc.log since updating to BOINC 6.2.15 (from 5.10.28).

Not an issue, just curious...
5) Message boards : Number crunching : Whoa, whats happening man...? (Message 29296)
Posted 26 Jun 2007 by old_user60427
Post:
There's now a choice of models - the same 160-year models as before or these shorter slab models that some of us had been testing on beta.

Is there any way to get the small models? One of my PCs needs about a year to complete a model, meaning that it will consume all BOINC time for the last half or so. A shorter model would be desirable.
6) Message boards : Number crunching : Visualising results (Message 28011)
Posted 19 Apr 2007 by old_user60427
Post:
I run BOINC on my Linux PC as a daemon, i.e. it starts when I boot and it stops during shutdown. For the rest it is out of the way. A small script keeps me up-to-date while logged on. By now I have clocked up 42 years and see that I have a cold WU (see here).

One drawback of running CPDN (and others) as a daemon is that I never see how my earth looks by now. With older models I could visualise this with a standalone program; I think I read somewhere that standalone visualisation is no longer possible with the coupled models, but I have not been able to find an absolute statement on that.

Can anybody either confirm my suspicion or point me in the right direction? Running the model from the BOINC client is not (really) an option; BOINC runs under a separate user which has a locked-down account.
7) Message boards : Number crunching : BOINC Version 5.8.8 Released (Message 27162)
Posted 3 Mar 2007 by old_user60427
Post:
I have a problem with BOINC 5.8.15. Climate prediction continues running at the same time as other projects.

The same happened to me with Win 5.8.13 - although being preempted by other apps and back and forth, suspended and resumed again, it continued to run. Even after stopping Boinc... Probably only missing heartbeat convinced it to terminate some 1/2 minute later.

Peter

I have had the same with the later 4.xx series. The only workaround (though expensive) was to disable 'keep application in memory' (a BOINC setting on your account page). The downside for me was that HADCM doesn't write to disk all that often; this means that on average I had to redo half of the disk write interval (which I think worked out to ~10 minutes for me).
8) Message boards : Number crunching : Very large task cannot be handled within given time (Message 25889)
Posted 8 Jan 2007 by old_user60427
Post:

... The deadline is ignored, and the project will accept the result uploads whenever they occur.
...

The only possible issue that may occur is if you crunch for other BOINC projects as well. BOINC will figure out at some point that you will miss the deadline; it will then allocate more time to CPDN in an attempt to make the deadline. This may mean that BOINC will not download work from other projects for a long time.
9) Message boards : Number crunching : stash/field code 30320 is unknown to LATS lookup table ... (Message 23891)
Posted 10 Aug 2006 by old_user60427
Post:
Thanks -- thought so.
10) Message boards : Number crunching : stash/field code 30320 is unknown to LATS lookup table ... (Message 23882)
Posted 9 Aug 2006 by old_user60427
Post:
I found the following message in boinc.log when hadcm3 was collecting all data for the 10th trickle. The full text from the log is:
hadcm3lbm_01m6_05058233 - PH 1 TS 0259201 A - 01/12/1930 00:30 - H:M:S=0228:31:15 AVG= 3.17 DLT= 1.91
file dataout/01m6fo.pjd0c10 is a 32 bit ieee um file 
file dataout/01m6fo.pid0c10 is a 32 bit ieee um file 
file dataout/01m6fo.pfd0c10 is a 32 bit ieee um file 
file dataout/01m6fo.pcd0c10 is a 32 bit ieee um file 
stash/field code 30320 is unknown to LATS lookup table and is not written to output netcdf file. You need to define the field via an external LATS parameter file and PP codes -> LATS conversion table using the -l and -p options.
file dataout/01m6fo.pbd0c10 is a 32 bit ieee um file 
file dataout/01m6fa.phd0c10 is a 32 bit ieee um file 
file dataout/01m6fa.pgd0c10 is a 32 bit ieee um file 
file dataout/01m6fa.ped0c10 is a 32 bit ieee um file 
file dataout/01m6fa.pdd0c10 is a 32 bit ieee um file 
Trickling yearly means for 1930

More lines follow; available on request.

The trickle went out OK and credit got awarded so I assume all is well. Anybody got a clue what the message means?
11) Message boards : Number crunching : Signature (Message 23280)
Posted 21 Jun 2006 by old_user60427
Post:
Use the img tags:

12) Message boards : Number crunching : Trickle model dates for coupled model (Message 22992)
Posted 1 Jun 2006 by old_user60427
Post:
It's once per model year, which in the coupled model is 26,920 timesteps (72 timesteps per day)

I am confused. I either find 365 * 72 = 26280 timesteps in a year or maybe 366 * 72 = 26352 (or possibly 365.25 * 72 = 26298). Going through my logs (3 trickles arrived, the 4th one just left my computer) I find a trickle once every 25920 time-steps, which works out to 360 days (the old Roman year).
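
The arithmetic can be checked in a few lines. This is only a sketch: it assumes 72 timesteps per model day, as quoted, and compares the candidate year lengths with the 25920-timestep interval seen in the logs. The function name is made up for illustration.

```python
# Timesteps per model day, as quoted in the post above.
TIMESTEPS_PER_DAY = 72

def timesteps_per_year(days_in_year):
    """Return the number of timesteps in a model year of the given length."""
    return days_in_year * TIMESTEPS_PER_DAY

# Candidate year lengths considered in the post.
for days in (360, 365, 365.25, 366):
    print(f"{days:>6} days -> {timesteps_per_year(days):g} timesteps")

# Interval between trickles observed in the logs.
observed_interval = 25920
print(observed_interval / TIMESTEPS_PER_DAY, "days between trickles")
```

Only the 360-day year reproduces the observed interval, which would be consistent with the model running on a 360-day calendar of twelve 30-day months.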
13) Message boards : Number crunching : You are using the wrong URL for this project (Message 22933)
Posted 27 May 2006 by old_user60427
Post:
Thanks for the quick responses.
14) Message boards : Number crunching : You are using the wrong URL for this project (Message 22906)
Posted 25 May 2006 by old_user60427
Post:
At my last trickle I suddenly got this unexpected set of messages:

2006-05-25 16:25:57 [climateprediction.net] Sending scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
2006-05-25 16:25:57 [climateprediction.net] Reason: To send trickle-up message
2006-05-25 16:25:57 [climateprediction.net] (not requesting new work or reporting completed tasks)
2006-05-25 16:26:08 [climateprediction.net] Scheduler request succeeded
2006-05-25 16:26:08 [climateprediction.net] You are using the wrong URL for this project
2006-05-25 16:26:08 [climateprediction.net] The correct URL is http://climateprediction.net/
2006-05-25 16:26:08 [climateprediction.net] Using the wrong URL can cause problems in some cases.
2006-05-25 16:26:08 [climateprediction.net] When convenient, detach this project, then reattach to http://climateprediction.net/
Creating trickle file trickle_hadcm3lbm_01m6_05058233_1_1923.zip for upload...
hadcm3lbm_01m6_05058233 - PH 1 TS 0078193 A - 07/12/1923 00:30 - H:M:S=0068:13:13 AVG= 3.14 DLT= 2.53
2006-05-25 16:44:28 [climateprediction.net] Pausing task hadcm3lbm_01m6_05058233_1 (left in memory)

I have been crunching for a year or so and this is the first time I have seen this message. There is a very small chance that I made a typo the last time I made major changes to my BOINC setup, but I doubt that somewhat. Does anybody out there know the background to this error? I am not very keen on detaching and re-attaching, so any advice on the risk of (foolhardy) persevering with the wrong URL would be greatly appreciated as well.
15) Message boards : Number crunching : Checkpoints for hadcm3 models (Message 22150)
Posted 17 Apr 2006 by old_user60427
Post:
Thanks for the info. I will see how to work around this. Ideally I would get the "Leave applications in memory while preempted?" setting to work (it worked like a charm with BOINC 4.19.4). I have just switched to a new BOINC version (trux calibrating client based off 5.3.12) and will see whether that swaps the science apps better.

What happens is that both apps stay in RAM (OK; I have got enough), but -- and here's the catch -- both continue to run, so each gets ~45% of the CPU, and the context switching between CPDN and SETI screws up any caching that is done by the CPU.

The logs provide no real clue. Here is a chunk showing what I described:

2006-04-10 15:17:11 [climateprediction.net] Resuming result hadcm3lb_59mn_05033739_1 using hadcm3lb version 508
2006-04-10 15:17:11 [SETI@home] Pausing result 05oc02aa.7805.29506.878390.1.78_2 (left in memory)
Resuming CPDN!
hadcm3lb_59mn_05033739 - PH 1 TS 0006913 A - 07/03/1921 00:30 - H:M:S=0005:58:46 AVG= 3.11 DLT= 1.32
2006-04-10 16:17:11 [climateprediction.net] Pausing result hadcm3lb_59mn_05033739_1 (left in memory)
2006-04-10 16:17:11 [SETI@home] Resuming result 05oc02aa.7805.29506.878390.1.78_2 using setiathome version 470

Note that I have one line from CPDN in one hour. When the application is not kept in RAM a switch looks like:

2006-04-12 13:01:41 [SETI@home] Computation for result 25se02aa.7060.496.228398.1.105_3 finished
2006-04-12 13:01:41 [climateprediction.net] Resuming result hadcm3lb_59mn_05033739_1 using hadcm3lb version 508
Resuming CPDN!
hadcm3lb_59mn_05033739 - PH 1 TS 0008641 A - 01/04/1921 00:30 - H:M:S=0007:27:34 AVG= 3.11 DLT= 1.43
hadcm3lb_59mn_05033739 - PH 1 TS 0009073 A - 07/04/1921 00:30 - H:M:S=0007:49:35 AVG= 3.11 DLT= 1.59
hadcm3lb_59mn_05033739 - PH 1 TS 0009505 A - 13/04/1921 00:30 - H:M:S=0008:11:33 AVG= 3.10 DLT= 1.37
2006-04-12 14:01:42 [climateprediction.net] Pausing result hadcm3lb_59mn_05033739_1 (removed from memory)
2006-04-12 14:01:42 [SETI@home] Starting result 25se02aa.7060.1265.129826.1.176_0 using setiathome version 470
Cleaning up graphics data...
Detaching shared memory...
2006-04-12 14:01:44 [---] request_reschedule_cpus: process exited

I complete three times as much ... Will try once the new BOINC version is running stably.
16) Message boards : Number crunching : Checkpoints for hadcm3 models (Message 22095)
Posted 16 Apr 2006 by old_user60427
Post:
I know slab models checkpoint every 144 timesteps, so if your model gets halted (system shutdown, or BOINC suspends it and swaps it out of memory), you lose at most 143 timesteps or ~400 seconds on my systems.

I recently started running a coupled model. I have noted that this, unlike sulphur and slab, only writes to the logfile after 432 timesteps (an improvement). Looking at progress in the logs I get the impression that this model also only checkpoints every 432 TS -- I always see the same pattern, i.e. the model restarts after the last multiple of 432 TS.

hadcm3lb_59mn_05033739 - PH 1 TS 0019009 A - 25/08/1921 00:30 - H:M:S=0016:25:01 AVG= 3.11 DLT= 1.78
2006-04-16 11:39:03 [climateprediction.net] Pausing result hadcm3lb_59mn_05033739_1 (removed from memory)
2006-04-16 11:39:03 [SETI@home] Restarting result 01se99aa.24541.23808.665908.1.255_2 using setiathome version 470

... many lines removed

2006-04-16 15:39:47 [climateprediction.net] Restarting result hadcm3lb_59mn_05033739_1 using hadcm3lb version 508
2006-04-16 15:39:47 [SETI@home] Pausing result 05mr99aa.17103.30529.79836.1.22_2 (removed from memory)
2006-04-16 15:39:48 [---] request_reschedule_cpus: process exited
Beginning work on result hadcm3lb_59mn_05033739_1...
Starting model in /home/boinc/projects/www.climateprediction.net...
Created shared memory region key = 77310 of size 655036 bytes
.so shmem return code = 0
Starting model ID hadcm3lb_59mn_05033739 Phase 1
Climate model starting - use graphics to monitor progress.
Or visit the website to see the graphs for this run.
hadcm3lb_59mn_05033739 - PH 1 TS 0019009 A - 25/08/1921 00:30 - H:M:S=0016:25:01 AVG= 3.11 DLT= 0.00

Does anybody know the details of this?

I cannot enable "Leave applications in memory while preempted?" because I then end up with CPDN and SETI both running at the same time; this is an issue with BOINC clients after 4.19 that I have not found a solution for. I will try the new 5.4.x client once it is available.
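
If the checkpoint inference above is right, the work lost on a restart is easy to estimate. The sketch below assumes a checkpoint is written at every timestep where a progress line appears in the logs, i.e. every 432 timesteps plus one for the coupled model (19009 = 44 * 432 + 1) and every 144 for slab. That is a reading of the logs, not confirmed behaviour, and the function name and default values are hypothetical.

```python
def lost_on_restart(current_ts, interval=432, avg_seconds_per_ts=3.11):
    """Estimate work lost if the model is removed from memory at current_ts.

    Assumes a checkpoint exists at every timestep of the form
    k * interval + 1, which is where the progress lines in the logs appear.
    Returns (last_checkpoint, lost_timesteps, lost_seconds).
    """
    last_checkpoint = ((current_ts - 1) // interval) * interval + 1
    lost_ts = current_ts - last_checkpoint
    return last_checkpoint, lost_ts, lost_ts * avg_seconds_per_ts

# Hypothetical interruption 300 timesteps after the logged line at TS 19009.
checkpoint, lost, seconds = lost_on_restart(19309)
print(checkpoint, lost, round(seconds))
```

Under these assumptions the worst case for the coupled model is 431 timesteps, or roughly 22 minutes at about 3.11 seconds per timestep, which is a sizeable share of a one-hour switching interval.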
17) Message boards : Number crunching : sulphur model - Linux - Signal 11 (Message 19098)
Posted 9 Jan 2006 by old_user60427
Post:
Same issue here. Running Mandriva 2006.0 Linux. My first sulphur model errored out with sig 11, just after reaching the first trickle (http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=1578019). To add insult to injury I did not get credit for the first trickle either. Is this because the model crashed or because it errored out?

I have "Leave applications in memory" set to no, because with it set to yes, BOINC would not stop setiathome when switching to CPDN. The BOINC client is 5.2.14 (5.2.13 optimised by crunch3r). I haven't had this problem with slab models. I have now got a new sulphur model and we'll see what that gives.
18) Message boards : Number crunching : Current timestep (Message 18811)
Posted 28 Dec 2005 by old_user60427
Post:
Does anyone know where the current timestep is stored? Or even the ts of the last 'write results to disk'? I thought client_state.xml, but can't find it in there.
TIA.

Or if you have it, in boinc.log, see below from my log:

sulphur_hum4_000832828 - PH 1 TS 0002359 A - 20/01/1811 03:30 - H:M:S=0003:09:33 AVG= 4.82 DLT= 2.97
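
For scripts that track progress from boinc.log, a line like the one above can be split into its fields with a regular expression. This is a minimal sketch: the pattern is inferred only from the sample lines quoted in these posts, the field names are my own labels, and the letter after the timestep and the DLT value are carried through without interpretation.

```python
import re

# Pattern inferred from sample log lines in these posts; not an official format.
LINE_RE = re.compile(
    r"^(?P<model>\S+) - PH\s*(?P<phase>\d+) TS\s*(?P<timestep>\d+)"
    r"\s+(?P<flag>\S+) - (?P<date>\d{2}/\d{2}/\d{4} \d{2}:\d{2})"
    r" - H:M:S=(?P<h>\d+):(?P<m>\d+):(?P<s>\d+)"
    r"\s+AVG=\s*(?P<avg>[\d.]+)\s+DLT=\s*(?P<dlt>[\d.]+)\s*$"
)

def parse_progress_line(line):
    """Return a dict of fields from a model progress line, or None if no match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "model": match["model"],
        "phase": int(match["phase"]),
        "timestep": int(match["timestep"]),
        "model_date": match["date"],
        # Total CPU time so far, converted to seconds.
        "cpu_seconds": int(match["h"]) * 3600 + int(match["m"]) * 60 + int(match["s"]),
        # In the samples, AVG matches CPU seconds divided by the timestep count.
        "avg": float(match["avg"]),
        "dlt": float(match["dlt"]),
    }

sample = ("sulphur_hum4_000832828 - PH 1 TS 0002359 A - 20/01/1811 03:30"
          " - H:M:S=0003:09:33 AVG= 4.82 DLT= 2.97")
print(parse_progress_line(sample))
```

Other BOINC log lines (timestamps, "Resuming CPDN!" and so on) do not match the pattern and return None, so a script can feed every line of the log through the function and keep the last non-None result.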

19) Questions and Answers : Unix/Linux : WU terminated at 65% or not? (Message 16834)
Posted 28 Oct 2005 by old_user60427
Post:
It's possible to restart a WU if you've got a backup of the whole BOINC folder made before your problem occurred.
If you don't have a backup, it's "game over" for your old WU, because of the missing information contained in the client_state.xml file. (Now your client_state.xml only has information about your new WU, as you have perhaps seen.)

Thanks, no backup, so no restart. I have "inserted a new coin" and hope that I will complete a WU with the next one. Probably should set up a cron job to make such a backup...

Thanks for the response anyhow.
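
A backup of the kind mentioned above could be scripted and run from cron. The following is a minimal sketch using Python's standard tarfile module; the function name and the example paths are assumptions for illustration. It is probably safest to stop the BOINC client first so that client_state.xml and the model files are captured in a consistent state.

```python
import tarfile
import time
from pathlib import Path

def backup_boinc_dir(boinc_dir, backup_dir):
    """Write a timestamped .tar.gz of boinc_dir into backup_dir; return its path.

    backup_dir should live outside boinc_dir, otherwise each archive
    would include the earlier ones.
    """
    boinc_dir = Path(boinc_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = backup_dir / f"boinc-backup-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the directory name so a restore is predictable.
        tar.add(boinc_dir, arcname=boinc_dir.name)
    return archive

# Example call with hypothetical locations; adjust to the actual installation:
# backup_boinc_dir("/home/boinc", "/var/backups/boinc")
```

A model directory can be large, so keeping only the most recent few archives would be a sensible addition.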
20) Questions and Answers : Unix/Linux : WU terminated at 65% or not? (Message 16812)
Posted 27 Oct 2005 by old_user60427
Post:
Due to a mistake on my part (never play with permissions on a running computer, especially not recursively), I locked the BOINC client and science apps out of their own directories; in other words, rw-access to state and log files was denied. This happened while crunching a S@H WU. BOINC tried to restart this WU twice by re-downloading it (but failed as it had no write access) and then gave up. In parallel BOINC was trying to start CPDN. This obviously did not happen either, and by the looks of it my WU got aborted. BOINC then tried to download a new one once (which also did not work).
Unfortunately I then compounded the problem by correcting the permissions problem (leave it to humans to really screw up), and only then stopping/restarting BOINC. The net effect of this is that I now have a new CPDN WU, and still have all the data of the other WU (PH 3, i trickle). The program I use to interpret the BOINC state files lists the oldest CPDN WU as 'completed/uploaded' and BOINC does not switch to it, but rather starts the new WU.

Is the oldest WU truly borked, i.e. will BOINC not start crunching it? I scanned the client_state.xml files (unsuccessfully) for a clue.
Is it possible to restart CPDN on the oldest WU by going back to a check-point? How?


