climateprediction.net home page
Posts by Thunder

1) Questions and Answers : Unix/Linux : *** Running 32bit CPDN from 64bit Linux - Discussion *** (Message 54270)
Posted 8 Jun 2016 by Thunder
Post:
Sorry Geophi, but I already detached the project from that machine, so the executables are gone. I may check back once in a while to see if they're ever going to embrace these newfangled 64-bit computers. *sigh*

I do know that libz.so.1 was one of the libraries I specifically tracked down, installing the package that contains it, so the solution is likely more complicated than that.

I'd like to help further, but even though CPDN is my first (and longest) BOINC project, I'm just too frustrated with it to continue. I'll leave it going as a low-priority project on a handful of Windows boxes that seem to be stable enough that they won't trash 3/4 of their models, but beyond that, I'm done.
2) Questions and Answers : Unix/Linux : *** Running 32bit CPDN from 64bit Linux - Discussion *** (Message 54250)
Posted 4 Jun 2016 by Thunder
Post:
Well, I can't say I didn't try, but I'm finally giving up running CPDN on 64-bit Linux (or at least the 3 different versions/flavors of Ubuntu I've tried).

I've spent 3 months trying every suggestion, systematically working down this thread and every page it links to, installing so many additional libraries that I've utterly lost count, but to no avail.

The best I've gotten is that all models make it through until they try to generate the final upload files and then fail. I highly doubt it's a permissions issue because it's a standard package installation and all other projects work fine.

In any case, 15,000 hours of CPU time wasted is enough trying for me. :-P

3) Message boards : Number crunching : CPDN and BOINC Stats (Message 54169)
Posted 22 May 2016 by Thunder
Post:
Looks like there wasn't a run this week either (considering the files are all still dated 5-8). :-P

Are any bookmakers putting odds on how many weeks *this* problem will take to fix?
4) Message boards : Number crunching : Total Credit (Message 53812)
Posted 25 Mar 2016 by Thunder
Post:
Pffft, Les, I've lost 3.3 million, so I'M winning!
5) Message boards : Number crunching : Total Credit (Message 53788)
Posted 24 Mar 2016 by Thunder
Post:
I would say that's exactly what I'm seeing. The bulk of my work would have been from 2004-2012. I kind of got out of actively participating in BOINC projects for a few years and then returned about a month ago.

Boy, my BOINCstats information is going to look very wacky for this month. ;-)
6) Message boards : Number crunching : Total Credit (Message 53778)
Posted 23 Mar 2016 by Thunder
Post:
Well, something changed, because my total credit is now the amount it was, um... probably 8 or 9 years ago. So... confusing, but at least something changed. O.o
7) Questions and Answers : Unix/Linux : *** Running 32bit CPDN from 64bit Linux - Discussion *** (Message 53648)
Posted 16 Mar 2016 by Thunder
Post:
I'm living proof that even a blind dog finds a bone once in a while. :)
8) Questions and Answers : Unix/Linux : *** Running 32bit CPDN from 64bit Linux - Discussion *** (Message 53646)
Posted 15 Mar 2016 by Thunder
Post:
Pardon if this is duplicating previous information, but it's been years since I last ran CPDN on Linux.

This morning, when I noticed there was apparently work for Linux machines available, I went ahead and attached one machine. It's a relatively fresh Ubuntu 14.04 LTS install.

Noticed (at lunch) from stderr that 3 models had errored out instantly with "../../projects/climateprediction.net/wah2_8.12_i686-pc-linux-gnu: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory"
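For anyone sifting through a pile of failed tasks, a small helper can pull the missing library names out of stderr text like the line quoted above. This is a minimal Python sketch; the function name `missing_libraries` is hypothetical, and the regex assumes the loader message keeps exactly the wording shown in that stderr line.

```python
import re

# Pattern modeled on the loader message quoted above (an assumption that the
# wording is always the same).
MISSING_LIB = re.compile(
    r"error while loading shared libraries: (?P<lib>[^:]+): "
    r"cannot open shared object file"
)

def missing_libraries(stderr_text: str) -> list[str]:
    """Return missing shared-library names in order of first appearance, without duplicates."""
    found: list[str] = []
    for match in MISSING_LIB.finditer(stderr_text):
        lib = match.group("lib")
        if lib not in found:
            found.append(lib)
    return found

if __name__ == "__main__":
    sample = ("../../projects/climateprediction.net/wah2_8.12_i686-pc-linux-gnu: "
              "error while loading shared libraries: libstdc++.so.6: "
              "cannot open shared object file: No such file or directory")
    print(missing_libraries(sample))  # ['libstdc++.so.6']
```

Each name it returns still has to be mapped to the distribution package that provides it, as was done by hand here.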

I found that the package containing it is lib32stdc++6 and installed it. (I took a gamble and guessed that the 32-bit library was needed, since the app was identified as x86 rather than 64-bit.)
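The guess about needing 32-bit libraries can be checked directly: byte 4 of an ELF header (EI_CLASS) is 1 for a 32-bit binary and 2 for a 64-bit one. A minimal Python sketch, assuming the app file is a standard ELF executable; the helper names and the script name in the usage comment are hypothetical.

```python
import sys

ELF_MAGIC = b"\x7fELF"

def elf_class_from_header(header: bytes) -> str:
    """Classify an ELF file as 32-bit or 64-bit from its first bytes."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    # e_ident[EI_CLASS] (byte 4): 1 = ELFCLASS32, 2 = ELFCLASS64
    return {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown")

def elf_class(path: str) -> str:
    """Read only the identification bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return elf_class_from_header(f.read(5))

if __name__ == "__main__":
    # Usage: python3 elf_class.py <executable> [...]
    for p in sys.argv[1:]:
        print(p, elf_class(p))
```

If it reports 32-bit for the app in projects/climateprediction.net, the lib32 packages (such as lib32stdc++6) are the relevant ones to install.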

Obviously I'm a long way from proof that all is working, but the most recent task is ~24 minutes in and chugging along fine so far.
9) Questions and Answers : Windows : Trickles not being reported for one model (Message 34210)
Posted 2 Jul 2008 by Thunder
Post:
That seems as conclusive as can be, Thunder!

Under the circumstances the best thing you can do is abort the model, but it would be really helpful if you could backup your projects/climateprediction.net and slots directories first in case the project team want a copy to investigate why the model is behaving this way.

I've sent a PM to Worldwidewog to pass on the bad news.


Thanks for the assistance and advice. I probably would have scratched my head for a while without it. :)
10) Questions and Answers : Windows : Trickles not being reported for one model (Message 34208)
Posted 1 Jul 2008 by Thunder
Post:
And for the final piece of evidence, you'll note that the only other computer running a task from the same workunit is stuck at exactly the same timestep and hasn't trickled for about a month, despite having contacted the server in the last couple of hours.

WU = 6153695

Other poor cruncher schlepping the same data over and over: Worldwidewog

I'm going to hold off on aborting this model until I find out for sure that there's no info stored on the client that the project gurus would find useful. Can someone with more knowledge than me let me know when is the right time to kill it?
11) Questions and Answers : Windows : Trickles not being reported for one model (Message 34207)
Posted 1 Jul 2008 by Thunder
Post:
Of course, I just noticed something that's pretty obviously "not right"...

It's been sending trickles all right... Heh... It's been trying to send a trickle approximately once every hour of computation time for... oh, the last two and a half months or so. :O

Thyme, I think you've hit on the likely scenario. It's in a loop that goes back to before the last (checkpoint? trickle point?) and then crosses it again and again and again. Looks like around 1,200-1,300 times, if I were to do some rough math.

It's showing 1439 hours of computation time, and just by some really rough math, I don't think it should take more than 800-900 hours to complete a slab model, even running hyperthreaded.

Think I should abort this model?
12) Questions and Answers : Windows : Trickles not being reported for one model (Message 34206)
Posted 1 Jul 2008 by Thunder
Post:
I'm fairly sure this is a problem with the CPDN database and not an issue with the client.

Open up the graphics window and type 'Z' to hide the sidebar and '8' to display the timestep. What phase and timestep number are shown? If it's anything less than phase 3 and timestep 75,614, the model has rewound and the server is ignoring your trickles because they've already been received.

If you can't run the graphics, have a look at the file projects/climateprediction.net/hadsm3fub_0169_005941516.xml instead. The phase number and timestep at the last checkpoint are in the <PH> and <TS> tags.


Sure, here\'s a copy straight from it:

<V>520</V> 
  <MD>HADSM3</MD> 
  <N>hadsm3fub_0169_005941516</N> 
  <PH>3</PH> 
  <TS>79311</TS> 
  <DAY>3</DAY> 
  <MTH>7</MTH> 
  <YR>2055</YR> 
  <HR>7</HR> 
  <MIN>30</MIN> 
  <SEC>0</SEC> 
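The check described in the quoted reply can be scripted. This is a minimal Python sketch, assuming the .xml file holds simple tags like the excerpt above; the function names and script name are hypothetical, and treating "less than phase 3 and timestep 75,614" as a phase-then-timestep comparison is an interpretation of that rule. Applied to the excerpt (phase 3, timestep 79,311), `has_rewound` returns False.

```python
import re
import sys
import xml.etree.ElementTree as ET

def checkpoint_position(xml_text: str) -> tuple[int, int]:
    """Return (phase, timestep) read from the <PH> and <TS> tags."""
    # Drop any XML declaration, then wrap the text in a dummy root so that a
    # fragment with several top-level tags (like the excerpt above) parses.
    body = re.sub(r"^\s*<\?xml[^>]*\?>", "", xml_text)
    root = ET.fromstring(f"<wrapper>{body}</wrapper>")
    phase, timestep = root.findtext(".//PH"), root.findtext(".//TS")
    if phase is None or timestep is None:
        raise ValueError("no <PH>/<TS> tags found")
    return int(phase), int(timestep)

def has_rewound(position: tuple[int, int], last_trickle: tuple[int, int] = (3, 75614)) -> bool:
    """True if the checkpoint sits before the last trickle point."""
    # Tuples compare element by element: an earlier phase, or the same phase
    # with a smaller timestep, counts as "behind".
    return position < last_trickle

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 check_rewind.py projects/climateprediction.net/hadsm3fub_0169_005941516.xml
    with open(sys.argv[1], encoding="utf-8") as f:
        pos = checkpoint_position(f.read())
    print("phase/timestep:", pos, "rewound:", has_rewound(pos))
```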

13) Questions and Answers : Windows : Trickles not being reported for one model (Message 34204)
Posted 1 Jul 2008 by Thunder
Post:
Another possibility, which doesn't seem to apply, is that a new computer ID has been issued, which is usually caused by restoring from a backup. In this case, the trickles will be logged on the 'old' (original) ID. But you don't have another appearance of that computer.

The only other thing I can think of is that you created a new account at about that time, and the trickles since then have been going to the new account.
As we haven't a clue about the ID of any such account, it would be up to you to find it.

edit
If there is a possibility of a second account, then a way to look for it would be:
On the computer in question, use Notepad to open client_state.xml
Use Find to look for <project>
Check the next couple of lines to see if they both mention this project name (climateprediction.net)
Otherwise, do Find next

If it's the right project, a few lines below will be: <hostid>
Compare the number with the one in this thread, just below your name, to the left of the posts.
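The Notepad search above can also be automated. A minimal Python sketch, assuming client_state.xml is well-formed XML whose <project> entries carry <project_name>, <master_url> and <hostid> children (the steps above confirm only <project> and <hostid>; the other two tag names are an assumption). `find_hostid` and the script name are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Optional

def find_hostid(root: ET.Element, project: str = "climateprediction.net") -> Optional[int]:
    """Return the <hostid> of the first <project> entry whose name or URL mentions `project`."""
    for entry in root.iter("project"):
        # ASSUMPTION: tag names project_name and master_url.
        name = entry.findtext("project_name", default="")
        url = entry.findtext("master_url", default="")
        if project in name or project in url:
            hostid = entry.findtext("hostid")
            return int(hostid) if hostid else None
    return None

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 find_hostid.py /path/to/client_state.xml
    print(find_hostid(ET.parse(sys.argv[1]).getroot()))
```

The printed number is the one to compare against the host ID shown on the forum.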


Well, I checked, and the <hostid> shown in client_state.xml is still 221046. Another strong indication that this is not a problem with the hostid or userid is that the "Last Contact" column on the site updates each time the computer sends a trickle. For example, it presently reads 1 Jul 2008 16:00:29 UTC, and the client's message log states: climateprediction.net 7/1/2008 11:59:36 AM Scheduler request succeeded: got 0 new tasks

Other than telling me that our clocks are about 1 minute off, that pretty much tells me that the client is communicating with CPDN on the correct hostid.
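The "about 1 minute off" figure comes from comparing the server's UTC "Last Contact" time with the client's local log time. A small Python sketch of that arithmetic, assuming the client machine was on US Eastern Daylight Time (UTC-4), which the post does not state. The difference also absorbs however long the scheduler request took, so it is only an estimate of the clock skew; `clock_offset_seconds` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

def clock_offset_seconds(server_utc: str, client_local: str, client_utc_offset_hours: float) -> float:
    """Seconds by which the server timestamp is ahead of the client timestamp."""
    server = datetime.strptime(server_utc, "%d %b %Y %H:%M:%S").replace(tzinfo=timezone.utc)
    # ASSUMPTION: the caller supplies the client's UTC offset; the log line has none.
    client_tz = timezone(timedelta(hours=client_utc_offset_hours))
    client = datetime.strptime(client_local, "%m/%d/%Y %I:%M:%S %p").replace(tzinfo=client_tz)
    return (server - client).total_seconds()

# Timestamps from the post; UTC-4 (EDT) for the client is an assumption.
print(clock_offset_seconds("1 Jul 2008 16:00:29", "7/1/2008 11:59:36 AM", -4))  # 53.0
```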

I'm fairly sure this is a problem with the CPDN database and not an issue with the client.

14) Questions and Answers : Windows : Trickles not being reported for one model (Message 34199)
Posted 30 Jun 2008 by Thunder
Post:
This model run:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=7384116

has apparently not reported any trickles since 23 Apr 08, yet the client thinks it's sending trickles just fine.

As recently as ~10 minutes ago, it sent another:

climateprediction.net 6/30/2008 5:03:42 PM Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
climateprediction.net 6/30/2008 5:03:47 PM Scheduler request succeeded: got 0 new tasks

(the preceding is from BOINCView, so it doesn't look exactly like the BOINC client's format)

In all other respects the client appears to be running fine. Other projects are humming along, model appears to be crunching, etc.

Any idea what\'s going on?
15) Message boards : Cafe CPDN : Milestones Thread (Message 31927)
Posted 31 Dec 2007 by Thunder
Post:
Looks like I nearly missed this by a day, but I finally crossed the 1 million cobblestone mark for CPDN. :) Of course, that's hundreds of runs with lots of 'boo-boos' thrown in, but I think I have over 50 completed between slab and coupled.
16) Questions and Answers : Windows : Model stops, cpu goes idle (Message 29914)
Posted 9 Aug 2007 by Thunder
Post:
Well, unfortunately, when I checked on the model this morning, the exact same thing had happened.

Using boinccmd --quit did not work. The CPDN process remained loaded, so I had to terminate it again. In fact, the BOINC service kept running, so I don't think the command line is necessarily the right tool for a service install, but I'm not positive about that.
17) Questions and Answers : Windows : Model stops, cpu goes idle (Message 29896)
Posted 8 Aug 2007 by Thunder
Post:
boinccmd --quit

and if that doesn't work, then reboot


Thanks for the tip, Mike; it should have occurred to me to try the command line. I'll use that in the future if any other oddity occurs.

Thunder's model seems to have lost about 7 or 8 hours of forward crunching/progress yesterday (7 Aug) but has since trickled twice at about the expected interval. What the problem really consisted of is still a mystery to me.


Agreed, mo.v... I don't think I'm any closer to understanding what caused this, but at least I've learned a few things. The GOOD news (despite the lack of a 'resolution') is that the model has run for nearly 24 hours and appears quite normal now. It's been suspended/resumed 4 times without so much as a hiccup.

If it has any other issues, I\'ll be sure to post here, but I\'m going to assume this was just due to some randomness (cosmic ray hit flipped a bit somewhere?) and press on. :) Thanks for the good info that everyone has provided so far! :)
18) Questions and Answers : Windows : Model stops, cpu goes idle (Message 29890)
Posted 7 Aug 2007 by Thunder
Post:
1) In your first post, when you said the model had stopped twice but BOINC said it was running although the CPU time and % done didn't change, did the model graphics display? (If the globe did display, it means the model actually was running.) I suspect that, as you said the CPU was idle, the model had stopped running. I presume you mean the CPU % graph display in Task Manager?


I knew I should have explained more... It's a service installation, so I don't use the screensaver. I monitor it through BOINCView and rarely actually visit the machine itself. Hence, no graphics ever display. In fact, I don't *think* the graphics display process even runs on it (I can't remember ever looking for it specifically). In any case, I used Windows Task Manager to see that the model process was still resident, still using some memory, but at 0% CPU.

2) Same first post. When you stopped and restarted BOINC, if you hadn't suspended the model before exiting BOINC, the model would start up again automatically. How exactly did you 'kill its process by hand'?


I shut down the BOINC Manager, then stopped the BOINC service. From countless other service installs, I know that this normally causes all the client processes to exit as well. The CPDN process did not. I had to use 'End Process' from Task Manager to kill the ornery sucker.

3) I see that the model hasn't shown any trickles for 12 hours, but I'm not sure what the delay is for them to be displayed. Is this model running at the moment? If it's running, could you look at its graphics frequently and jot down the model dates on paper? I'm wondering whether this model is a looper. There's an item about loopers in the project READMEs linked to in my sig. Loopers get stuck (sometimes apparently a flaw in the model, sometimes a calculation glitch on the computer) and repeat a day, then a month, then a model year. If they still can't get through the sticking point, they're supposed to abort themselves. I don't think your model has had enough time to get through this whole process yet.

If it does turn out to be a looper, the only method we know of that sometimes rescues them and gets them through the sticking point is to transfer them from an AMD machine to Intel or vice versa. It's only worthwhile if you're certain it's looping, and you'd have to decide whether you want to spend the time on a model that has only crunched 4 years.
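If model dates can be read off the graphics, as suggested above, a short script can flag where the jotted-down sequence jumps backwards. A minimal Python sketch; `find_rewinds` and the sample readings are hypothetical, and dates are kept as (year, month, day) tuples rather than `datetime.date` because climate models may use calendars (such as a 360-day year) that Python's date type cannot represent. One backward jump alone is not proof of a looper; the pattern described above is the same stretch repeating.

```python
from typing import List, Tuple

ModelDate = Tuple[int, int, int]  # (year, month, day) as shown by the model

def find_rewinds(observed: List[ModelDate]) -> List[Tuple[ModelDate, ModelDate]]:
    """Return (earlier_reading, later_reading) pairs where the model date went backwards."""
    rewinds = []
    for prev, cur in zip(observed, observed[1:]):
        # Tuples compare year first, then month, then day.
        if cur < prev:
            rewinds.append((prev, cur))
    return rewinds

# Hypothetical notes jotted down over several checks.
notes = [(2012, 5, 1), (2012, 5, 3), (2012, 4, 20), (2012, 5, 2)]
print(find_rewinds(notes))  # [((2012, 5, 3), (2012, 4, 20))]
```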


The lack of trickles was due to the fact that it was 'stuck' again when I came into the office this morning. Upon restarting, it dedicated 1 hr to E@H, then resumed (apparently correctly) the CPDN model. It has since trickled again. I read the bit on loopers, but without tweaking the service so it can interact with the desktop, I can only see the % done, not dates. I can only verify for sure that when the CPU use shows the model is running, the % done advances and has not apparently decreased at any point.

4) When a few members have had a problem with models not restarting after benchmarks, I don't think BOINC Manager showed these models as running. The problem had to be solved by exiting and restarting BOINC, after which these models ran normally until the next benchmarks.


The first time it stopped, it may have been when the machine came out of a benchmark (the last benchmark was ~2 days ago), but I know with 100% certainty that the second time, it either stopped 'mid-stream' while running or stopped at the beginning of a suspend/resume time slice. Unfortunately, since there's no problem indicated in the message log, I'd pretty much have to stare at the BOINC Manager for a few solid hours to be positive which is the case.
19) Questions and Answers : Windows : Model stops, cpu goes idle (Message 29887)
Posted 7 Aug 2007 by Thunder
Post:
I appreciate the suggestion, but it's definitely not benchmarking at the time this happens. After shutting down BOINC a second time, killing the CPDN process and restarting, I've now watched it process for a few hours, suspend once and successfully restart. I guess I'll see if it keeps behaving correctly... :)

Another thing to watch out for is BOINC Manager deciding to run its periodic benchmark test. This delays the model starting for a few minutes at most, but if I'm in an impatient mood, I've sometimes mistaken that for a freeze of some sort. It's easy to see if that really is the cause, because it's listed in the Messages tab of BOINC Manager, and it will happen only once every week or so.

20) Questions and Answers : Windows : Model stops, cpu goes idle (Message 29882)
Posted 7 Aug 2007 by Thunder
Post:
This model:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6718407

Has twice now stopped (it may be that when it resumes, it doesn't actually resume), and BOINC says it's running, but the CPU time and % done never change.

The first time it happened, I figured it was a fluke and stopped/restarted BOINC. The model process did not stop, and I had to kill its process by hand.

Anyone know what may be causing this or if this is just a bad model?


©2024 climateprediction.net