Posts by Conan

1) Message boards : Number crunching : New work discussion - 2 (Message 69557)
Posted 2 Sep 2023 by Profile Conan
Post:
Until we get more experience with volunteers running these high memory apps I think it makes sense to restrict it to a single task for now. We can change it later in light of experience.

No other projects I know of run tasks with such high memory requirements, so it's not obvious how they will be received. Let's walk first before we run with this.
LHC's ATLAS tasks at 10GB are the biggest I know of. But that's 8 threads, so you don't get people trying to run huge numbers of them. Are yours going to be single threads?


YOYO@home ECM/P2 tasks take at least 11 GB per task, single thread, which is why I stopped running them on my 32 GB machine and limit them to just 3 at a time on my 64 GB machine. They are real memory hogs.
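
A rough sketch of the arithmetic behind a limit like that, in Python. The 8 GB reserve for the operating system and other projects is an assumed figure for illustration only, and `max_concurrent_tasks` is a hypothetical helper, not part of BOINC.

```python
def max_concurrent_tasks(total_ram_gb: float, per_task_gb: float,
                         reserve_gb: float = 8.0) -> int:
    """Return how many tasks fit in RAM after setting aside a reserve.

    reserve_gb is an assumed allowance for the OS and other work.
    """
    usable = total_ram_gb - reserve_gb
    if usable <= 0 or per_task_gb <= 0:
        return 0
    return int(usable // per_task_gb)


for ram in (32, 64):
    print(f"{ram} GB machine: up to {max_concurrent_tasks(ram, 11)} tasks at 11 GB each")
```

The result is only an upper bound under that assumed reserve; a lower limit such as 3 leaves extra headroom for whatever else is running.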

Conan
2) Message boards : Number crunching : New work discussion - 2 (Message 69537)
Posted 28 Aug 2023 by Profile Conan
Post:
Any new work for 64 bit coming along? I noticed a couple of new entries on the server status page

OpenIFS 43r3
OpenIFS 43r3 Baroclinic Lifecycle
OpenIFS 43r3 Perturbed Surface
OpenIFS 43r3 Cubic Octahedral grid tco95 l91
OpenIFS 43r3 Linear grid tl255 l91


Thanks
Conan
3) Message boards : Number crunching : New work discussion - 2 (Message 68914)
Posted 18 Jun 2023 by Profile Conan
Post:
Although not related to new work but following on from the last couple of posts,
CMDock uses a wrapper and it shows under Linux,
I believe that YAFU also uses a wrapper and possibly YOYO, SRBase, TNGrid? and a few others. In some cases it is needed due to the type of programme being used or the code it has been written in.

A few other projects also use a "Trickle up" method to keep the Server updated with progress (Primegrid is one) and some of these projects need a wrapper for this purpose.

Conan
4) Message boards : Number crunching : Server Status page questions (Message 68604)
Posted 19 Mar 2023 by Profile Conan
Post:
I have also wondered about the server page.

UK Met Office Coupled Model Full Resolution Ocean has had 927 tasks "in progress" for many months but I have seen no indication that any have been returned and the number never changes.

Weather At Home 2 (wah2) (region independent) has 4,731 tasks in progress again for many months and again I have not seen any activity with this either (maybe 1 came back 4 months ago but can't be sure).

What is happening with these work units?

Conan
5) Message boards : Number crunching : Upload server is out of disk space (Message 67724)
Posted 14 Jan 2023 by Profile Conan
Post:
Hi Kali,

The server they go to is in Hobart, NZ. I should have spotted the NZ in the task name and thought of that. Most likely when Andy gets my message he will email the data centre in Tasmania. This has happened before on a number of occasions.

Dave


Actually Dave, Hobart is in Tasmania, Australia. Not NZ (New Zealand).

Conan
6) Message boards : Number crunching : The uploads are stuck (Message 67538)
Posted 11 Jan 2023 by Profile Conan
Post:
Yes I am still seeing "connect(): failed" messages on all upload tries.

But I still have 4 work units running and I am nowhere near filling up any disks, so no problem here.

Conan


It has changed to "transient HTTP error" now so still not working here yet (Australia).

Server Status has not changed yet, still showing nothing.

Conan

PS: Some files are now moving, so possibly due to the load, some fail then must retry later, others are going through, some as low as 17 kB/s to as high as 1,700 kB/s.
7) Message boards : Number crunching : The uploads are stuck (Message 67525)
Posted 10 Jan 2023 by Profile Conan
Post:
Yes I am still seeing "connect(): failed" messages on all upload tries.

But I still have 4 work units running and I am nowhere near filling up any disks, so no problem here.

Conan
8) Message boards : Number crunching : Tasks failing on Ubuntu 22 (Message 67347)
Posted 5 Jan 2023 by Profile Conan
Post:
If you changed the option to "leave tasks in memory" but did not read the file to update BOINC with the change it may not work until it is read.
Restarting BOINC would also read the file.

Conan
9) Message boards : Number crunching : Hardware for new models. (Message 67296)
Posted 4 Jan 2023 by Profile Conan
Post:
I saw some test results with the AMD RYZEN 5950X, RYZEN 7950X, INTEL 12900 and INTEL 13900 (I think they were the model names).

When all were under full load for whatever test they were doing:

RYZEN 9 5950X used 130 Watts
RYZEN 9 7950X used 270 Watts (or thereabouts)
INTEL 12900 used 285-290 Watts (or thereabouts)
INTEL 13900 used 315 Watts (or thereabouts)

Can't point you to the tests but they were on Youtube along with others showing similar results.

So the RYZEN 5950X may not be as powerful as the new models but for energy efficiency it is hard to beat.
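
To put those wattages in perspective, here is a small Python sketch that converts each full-load figure into energy used per day. It assumes the chip sits at the quoted wattage around the clock, which is a simplification, and it says nothing about work done per watt since no throughput figures were given.

```python
# Full-load power figures quoted above, in watts (approximate).
FULL_LOAD_WATTS = {
    "Ryzen 9 5950X": 130,
    "Ryzen 9 7950X": 270,
    "Intel 12900": 285,   # lower end of the quoted 285-290 W range
    "Intel 13900": 315,
}


def kwh_per_day(watts: float, hours: float = 24.0) -> float:
    """Energy in kilowatt-hours when drawing `watts` for `hours`."""
    return watts * hours / 1000.0


for cpu, watts in FULL_LOAD_WATTS.items():
    print(f"{cpu}: {kwh_per_day(watts):.2f} kWh per day at full load")
```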

That's of course if you can find them, they are getting harder to find.

I run a RYZEN 9 5900X which has 12 cores / 24 threads and should use even less power as it has fewer cores than the 5950X.
It has 64 GB of RAM and, along with a full complement of other BOINC projects, easily runs 9 CPDN work units at a time. It only gets to about 42 GB max depending on what I am running at the time (everything, not just CPDN); it may get higher than 42 GB but I have the headroom to cover that.

BOINC has not downloaded more than 9 work units at any one time, probably because I am running a lot of other projects at the same time.

Conan
10) Message boards : Number crunching : OpenIFS Discussion (Message 66999)
Posted 22 Dec 2022 by Profile Conan
Post:
All 9 work units that I had running overnight have completed successfully.

Running on an AMD Ryzen 9 5900x, 64GB RAM, all 24 threads used to run BOINC programmes at the same time as the ClimatePrediction models.
All took around 17 hours 10 minutes run time.

Conan
11) Message boards : Number crunching : Late Validation pending (Message 66991)
Posted 21 Dec 2022 by Profile Conan
Post:
Well it seems that these files have finally been validated and I have been awarded credit for them, I think.

I have noticed a clean up/out has taken place and a lot of the old work units that I have done over the years have been removed.
Those 2 pending jobs among them. I was awarded some small amount of credit this week when I have not done any work and now it seems that the database has had a bit of a clean out and fix up. Good to see.

Conan
12) Message boards : Number crunching : OpenIFS Discussion (Message 66990)
Posted 21 Dec 2022 by Profile Conan
Post:
G'Day Glenn,

You may have misread what I wrote, I think.

The 11.3 GB was not a file size but the amount of disk writes made in that first 2 hours (now after 5 hours well over 30 GB).
The 2.7 to 4.6 GB were RAM amounts that each work unit was using.

This was all taken from System Monitor.

I did what you have asked and

% cd slots/26
% du -hs . # note the '.'
1.2G .

This is the same as your example.

% cd projects/climateprediction.net
% du -hs .
1.2G .

This is similar to your example.

du -hs srf*

768 MB srf00370000.0001

So all running fine, so maybe just a bit of a misunderstanding I think with data amounts and RAM usage.
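
For anyone who prefers a script to `du`, a minimal Python sketch of the same check follows. It sums apparent file sizes, so the total can differ slightly from what `du` reports (which counts disk blocks); `dir_size_bytes` and `human` are hypothetical helper names.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Sum the apparent sizes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):  # skip symlinks rather than follow them
                total += os.path.getsize(full)
    return total


def human(n_bytes: float) -> str:
    """Format a byte count with binary units, roughly like `du -h`."""
    for unit in ("B", "K", "M", "G"):
        if n_bytes < 1024:
            return f"{n_bytes:.1f}{unit}"
        n_bytes /= 1024
    return f"{n_bytes:.1f}T"
```

Calling `human(dir_size_bytes("slots/26"))` from the BOINC data directory would give a figure comparable to the `du -hs .` output above.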

Thanks
Conan
13) Message boards : Number crunching : OpenIFS Discussion (Message 66983)
Posted 21 Dec 2022 by Profile Conan
Post:
These Oifs _ps tasks really test your system out.

Running 9 at once, each using from 2.7 to 4.2 GB of RAM, after 2 hours run time they have written 11.3 GB of data to disk each (101.7 GB), which is huge.
Hitting 50 GB of RAM in use out of 64 GB, but I am also running LODA tasks which each use 1 GB of RAM. All 24 threads are running.
12% in and running fine so far.
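
The totals above can be checked with a few lines of Python. The average write rate is derived here from the quoted figures (using decimal megabytes) and was not measured directly.

```python
TASKS = 9
DISK_WRITTEN_PER_TASK_GB = 11.3   # written per task after 2 hours
RAM_PER_TASK_GB = (2.7, 4.2)      # observed low and high per task

total_written = TASKS * DISK_WRITTEN_PER_TASK_GB
ram_low, ram_high = (TASKS * r for r in RAM_PER_TASK_GB)
# Average over the first 2 hours, in decimal MB per second (derived, not measured).
write_rate_mb_s = total_written * 1000 / (2 * 3600)

print(f"Total written: {total_written:.1f} GB")
print(f"RAM for the 9 models: {ram_low:.1f} to {ram_high:.1f} GB")
print(f"Average write rate: {write_rate_mb_s:.1f} MB/s")
```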

Conan
14) Message boards : Number crunching : OpenIFS Discussion (Message 66795)
Posted 6 Dec 2022 by Profile Conan
Post:
My resent task 22249228 has been sent out twice before.

Previous Task 22246540 and Task 22248943

Task 22246540 has no Stderr; it failed with a Run Time of 1 Day 5 Hours and a CPU Time of 31 Minutes. It also had an unusually high Peak Disk Usage of 23,961.87 MB (nearly 24 GB), way above the norm from what I have seen.

Task 22248943 has the error "Process exited with code 9" other than that seemed to have run fine. This one belonged to wateroakley

I was able to run this WU to completion without error.


Another resent task I have running is Task 22249324

Previous Task 22247025 and Task 22249194

Task 22247025 on computer 1524992 had a Run Time of 42 Minutes with a CPU Time of 20 Seconds and a Peak Disk Usage of just 404.06 MB.
This computer still has work on it but has not completed a successful OpenIFS WU. All failed work units have the same long run times and short CPU times, and have different error codes as well: codes 1, 5 and 148 all appear on this computer.

Task 22249194 on computer 1504810 has no Stderr, with a Run Time of 1 Day 1 Hour and a CPU Time of 7 Hours.
This computer has run 9 OpenIFS work units; all have failed with the long Run Time and short CPU Time.
This computer belongs to happywetter.at

So a few different reasons that some work units have failed or thrown an error.
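
The "long Run Time, short CPU Time" pattern can be put into numbers as a ratio of CPU time to run time. This Python sketch uses the three failed tasks quoted above; a low ratio only suggests the task spent most of its time not computing, and does not identify the cause.

```python
from datetime import timedelta

# (task ID, run time, CPU time) as quoted in the post above.
FAILED_TASKS = [
    (22246540, timedelta(days=1, hours=5), timedelta(minutes=31)),
    (22247025, timedelta(minutes=42), timedelta(seconds=20)),
    (22249194, timedelta(days=1, hours=1), timedelta(hours=7)),
]


def cpu_efficiency(run_time: timedelta, cpu_time: timedelta) -> float:
    """Fraction of wall-clock run time actually spent on the CPU."""
    return cpu_time / run_time


for task_id, run_time, cpu_time in FAILED_TASKS:
    print(f"Task {task_id}: {cpu_efficiency(run_time, cpu_time):.1%} CPU efficiency")
```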

Conan

I completed Task 22249324 successfully in just under 17 1/2 hours.
15) Message boards : Number crunching : OpenIFS Discussion (Message 66793)
Posted 5 Dec 2022 by Profile Conan
Post:
My resent task 22249228 has been sent out twice before.

Previous Task 22246540 and Task 22248943

Task 22246540 has no Stderr; it failed with a Run Time of 1 Day 5 Hours and a CPU Time of 31 Minutes. It also had an unusually high Peak Disk Usage of 23,961.87 MB (nearly 24 GB), way above the norm from what I have seen.

Task 22248943 has the error "Process exited with code 9" other than that seemed to have run fine. This one belonged to wateroakley

I was able to run this WU to completion without error.


Another resent task I have running is Task 22249324

Previous Task 22247025 and Task 22249194

Task 22247025 on computer 1524992 had a Run Time of 42 Minutes with a CPU Time of 20 Seconds and a Peak Disk Usage of just 404.06 MB.
This computer still has work on it but has not completed a successful OpenIFS WU. All failed work units have the same long run times and short CPU times, and have different error codes as well: codes 1, 5 and 148 all appear on this computer.

Task 22249194 on computer 1504810 has no Stderr, with a Run Time of 1 Day 1 Hour and a CPU Time of 7 Hours.
This computer has run 9 OpenIFS work units; all have failed with the long Run Time and short CPU Time.
This computer belongs to happywetter.at

So a few different reasons that some work units have failed or thrown an error.

Conan
16) Message boards : Number crunching : OpenIFS Discussion (Message 66737)
Posted 3 Dec 2022 by Profile Conan
Post:
Just downloaded a resend of a Work Unit that failed due to an error.

This Task 22245903

It failed due to running longer than 5 minutes after the work unit had finished.

The WU was run by mikey and other than the longer run time after finishing seemed to have run successfully after over 2 days run time.

The run time seems overly long on a Ryzen but did complete.

It is now running as Task 22249047 on my Ryzen computer.

Will see how it runs for me.

Conan


Completed successfully after 16 1/2 hours.

Conan
17) Message boards : Number crunching : OpenIFS Discussion (Message 66718)
Posted 2 Dec 2022 by Profile Conan
Post:
Just downloaded a resend of a Work Unit that failed due to an error.

This Task 22245903

It failed due to running longer than 5 minutes after the work unit had finished.

The WU was run by mikey and other than the longer run time after finishing seemed to have run successfully after over 2 days run time.

The run time seems overly long on a Ryzen but did complete.

It is now running as Task 22249047 on my Ryzen computer.

Will see how it runs for me.

Conan
18) Message boards : Number crunching : OpenIFS Discussion (Message 66684)
Posted 1 Dec 2022 by Profile Conan
Post:
Experiment successful, work unit completed without error in a shade under 18 hours.

The time may have been due to how loaded up the processor was during this time but still good.

Don't know about the cache hits as the experiment was done on an older Intel i5. My newer Ryzen I believe has a larger cache but without looking things up I don't know what it is either.

The 2 still on the Ryzen are paused at the moment due to some PrimeGrid work I need to do, they both still have 33% left to run.

Conan
19) Message boards : Number crunching : OpenIFS Discussion (Message 66681)
Posted 30 Nov 2022 by Profile Conan
Post:
As an experiment, I have downloaded a work unit to my 4 core 8 GB Linux computer to see how it would run.

The computer is running other BOINC projects and at the moment is running LODA and PRIVATE GFN SEARCH plus iThena.Measurements and WUProp@Home.
iThena.Measurements and WUProp are Non-CPU intensive. PRIVATE GFN SEARCH uses minimal resources and less than 50 kB of RAM to run, however LODA is different and uses 1 GB per work unit of RAM.

When started, the Climate model maxed out my 8 GB and used about half my SWAP (7.6 GB in total, so about 3 to 4 GB); this is along with the other BOINC projects.

So the computer slowed to a crawl but kept running.

Once settled down the Climate model is now using from 2 to 4.5 GB and no SWAP even with 3 LODA work units running as well, but does start to lag a lot. With only 2 LODA, 1 PRIVATE GFN SEARCH and 1 Climate Open IFS running it is quite usable.

The Open IFS Climate model is now at 76.425% after 13 hours with about 4 1/2 hours or so to go.

So it can be done on 8 GB memory but I would not recommend it if you also want to use the computer as well, because you can go to sleep waiting for the screens to change.

As an aside to this I have been having no trouble with all the trickles from 5 work units (now 3 as 2 finished) they go as soon as they are ready.
Using a hybrid Fibre to the Node and copper cable to the house Broadband system with around 15 MB upload and 25+ MB download (both on good days with low usage by others on the ISP network).

I will stick to my RYZEN 5900x with 64 GB RAM. It is much less hassle; even running 4 at a time does not use over 20 GB.

Conan
20) Message boards : Number crunching : Task completed, but not all trickles acknowledged yet. Normal? (Message 66561)
Posted 24 Nov 2022 by Profile Conan
Post:
In a similar vein, I have WU 22236909 that reported all trickles and seems to have been awarded full credit but still says it is on my computer and still running.

It uploaded with the last trickle so does anyone know what has happened to it?

I do not have it on my computer.

(there are 3 failed work units on that same computer reported today but they stem from a power failure which upset them)

Thanks
Conan
21) Message boards : Number crunching : New work discussion - 2 (Message 66295)
Posted 7 Nov 2022 by Profile Conan
Post:
OpenIFS 43r3 Perturbed Surface, has been added to the application list, what does it mainly cover? Thanks
Conan
it's a modified version of the default OpenIFS in which the surface parameters, instead of the atmospheric ones, can be modified. There are two large (~3000) ensembles planned for this month where each member of the ensemble has slightly different parameters. I'll ask the scientists to write something to the forum about it.


Thanks for that Glenn, appreciated.

Conan
22) Message boards : Number crunching : New work discussion - 2 (Message 66293)
Posted 7 Nov 2022 by Profile Conan
Post:
OpenIFS 43r3 Perturbed Surface,

has been added to the application list, what does it mainly cover?

Thanks
Conan
23) Message boards : Number crunching : New work discussion - 2 (Message 66234)
Posted 24 Oct 2022 by Profile Conan
Post:
Thanks, that is what I thought, possibly the reason I could not get access before as well.

Thanks
Conan
24) Message boards : Number crunching : New work discussion - 2 (Message 66230)
Posted 24 Oct 2022 by Profile Conan
Post:
I would like to join the cpdnboinc-dev project to help out but it appears to need an invitation code and it's not mentioned anywhere that I could see.

I think I tried a long while back but could not get a leg in, but memory a bit fuzzy about that.

My Linux computer was upgraded a while back to 64 GB RAM (for other BOINC projects requiring 1 or more GB of RAM per work unit, with 32 GB and 24 threads I was maxing out my memory).

An invitation would be nice but perhaps they might have enough testers? So I might still not get in.

Thanks
Conan
25) Message boards : Number crunching : New work discussion - 2 (Message 66194)
Posted 15 Oct 2022 by Profile Conan
Post:
Thanks Glenn for the update.
Hope you get well soon, plenty of fluids and rest.

Conan
26) Message boards : Number crunching : New work discussion - 2 (Message 66189)
Posted 15 Oct 2022 by Profile Conan
Post:
Any news on how Glenn and Andy are going with the OpenIFS application?

(Hoping they are both clear of Covid now of course.)

I would like to give it a go.

Thanks
Conan
27) Questions and Answers : Unix/Linux : Fedora 36 (Message 66063)
Posted 6 Sep 2022 by Profile Conan
Post:
I am unsure about downloading the 32-bit libraries for Fedora 36. I had Fedora 25 and updated it to Fedora 36 when I got a new computer (I used the old hard disk rather than the NVMe drives in the new machine, as everything was already on it). This has not been a problem and ClimatePrediction works fine.
The 32bit libraries were installed on the older Fedora ages ago and are still accepted with the later spins.

Conan
28) Message boards : Number crunching : Hardware requirements for upcoming models (Message 65912)
Posted 20 Aug 2022 by Profile Conan
Post:
Over at Primegrid you can set however many cores you want to use for multithreading, from none to as many as you think you need or can spare. A Seventeen or Bust (SoB) work unit takes 400 hours or more on a single core but this drops to less than 100 hours using 6 cores.

I have a Ryzen 3900X with 12 cores / 24 threads but limit Primegrid to just 6 cores for multithreading due to the large number of other projects that I run at the same time.

The n-body work units from Milkyway use 16 cores and I can't seem to change that but as they only run for a few minutes to an hour or so it is not a problem, still have 8 other cores.

At Yafu@home you can set from 1 to 32 cores due to their work unit types.
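
As a rough illustration of what the Seventeen or Bust figures imply, this Python sketch works out speedup and parallel efficiency. It takes 400 hours and 100 hours as round inputs, so the results are bounds rather than measurements, and `speedup_and_efficiency` is a hypothetical helper name.

```python
def speedup_and_efficiency(single_core_hours: float, multi_core_hours: float,
                           cores: int) -> tuple[float, float]:
    """Return (speedup, parallel efficiency) for a multithreaded run."""
    speedup = single_core_hours / multi_core_hours
    return speedup, speedup / cores


# 400 h on one core versus 100 h on 6 cores. The multithreaded run is quoted
# as "less than 100 hours", so these are lower bounds.
s, e = speedup_and_efficiency(400, 100, 6)
print(f"Speedup at least {s:.1f}x, efficiency at least {e:.0%}")
```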

Conan
29) Message boards : Number crunching : New work Discussion (Message 65781)
Posted 7 Aug 2022 by Profile Conan
Post:
Just for information. Hoping these do come to main site soon. One of my Open IFS tasks from testing had the following on the task page.

Peak working set size 8.77 GB
Peak swap size 9.38 GB

Oh and some of them have had final uploads in the region of 1GB too!


The application has been placed on the Application Page, just awaiting the actual work units to go with it.

So something is moving.

Conan
30) Message boards : Number crunching : New work Discussion (Message 65740)
Posted 3 Aug 2022 by Profile Conan
Post:
Do you know if they run faster as 64 bit or are they the same?
If the same then what is the benefit?

Is there any reason (that you know of) why they need so much memory?
More expansive models, more parameters or something else?

Still keen to try some OpenIFS work units.

I have 64 GB of RAM on my AMD 5900X (12 cores/24 threads), as only 4 work units seem to be downloaded at any particular time (in the last two attempts to get work) I should be OK, (I run a lot of other projects as well, so this limits how much work can be downloaded).

Conan
31) Message boards : Number crunching : New work Discussion (Message 65737)
Posted 2 Aug 2022 by Profile Conan
Post:
Just spied this change on the server site

OpenIFS 43r3
OpenIFS 43r3 ARM

Perhaps something in the wings?

I have not done any OpenIFS work units before so would like to try some.

OpenIFS did not have the 43r3 after its name yesterday.

Conan
32) Message boards : Number crunching : New work Discussion (Message 65726)
Posted 1 Aug 2022 by Profile Conan
Post:
All batches are being processed, are there any new work developments on the horizon?

Thanks
Conan
33) Message boards : Number crunching : HadSM4 Error when completed and Uploading (Message 65725)
Posted 1 Aug 2022 by Profile Conan
Post:
Just had a work unit complete without error, so that missing libnsl file was the problem.

Conan
34) Message boards : Number crunching : New work Discussion (Message 65724)
Posted 1 Aug 2022 by Profile Conan
Post:
G'Day AndreyOR,

Don't know about unusual version as I got from the BOINC Web site, it is classed as a "Pre-Release" Linux version.

The previous Linux official version on the BOINC web site was 7.4.25 which is running on my other Linux computer and may need an update one day maybe (it prioritizes work a bit better than the new one I reckon).

Conan
35) Message boards : Number crunching : New work Discussion (Message 65714)
Posted 31 Jul 2022 by Profile Conan
Post:
7.16.6 probably came from a package installation. When I installed BOINC from a package on Ubuntu 20.04 that's the version I got. On 22.04 the version is 7.18.1. For Linux machines, according to the site, it's recommended to install BOINC using a package manager.

I know this is getting off topic,
I used a package manager version only the once. Found it stuck pieces of BOINC everywhere, had trouble finding and maintaining it that way, plus all the permission problems (why do they insist on a separate user to control the BOINC install?), also installed as a service which I then had to learn how to control from the command line.

So any other Linux install I have done or updates have all been from the BOINC website and it all works fine, all in one place (still a small permission problem but that is easy to fix).

So if it is recommended for Linux users to use a package manager I don't agree with it.

Conan
36) Message boards : Number crunching : Missing Trickle (Message 65704)
Posted 29 Jul 2022 by Profile Conan
Post:
Thanks Les, all my trickles have come through now, including a couple I didn't realize I had sent.

Conan
37) Message boards : Number crunching : Missing Trickle (Message 65701)
Posted 29 Jul 2022 by Profile Conan
Post:
I was wondering if there is a problem with the Trickle Server.

I uploaded a trickle at 4.43am Australian time and it is now almost 10 Hours later and it still has not shown up on my account.

The Zip file uploaded without an issue.

I will have some more due soon and don't want to lose them as well.

So just wondering where it is.

Conan
38) Message boards : Number crunching : Computation Errors (Message 65680)
Posted 25 Jul 2022 by Profile Conan
Post:
I expect the event log would make interesting reading.


It sure would. Too bad the user never looks at it. Is he not even curious that he has obtained no credit for years of work?

Maybe he died and no one has found out, or even turned off his machine.


Last contact was today 25th July 2022, so not turned off.

He has another computer ID 1517679 with same problem and over 6,100 failures.

But he has this computer ID 1517434 that is connecting, downloading and processing work, it does get errors but he does get credits as well as it sends back trickles.

Conan
39) Message boards : Number crunching : Computation Errors (Message 65675)
Posted 23 Jul 2022 by Profile Conan
Post:
I just downloaded 4 SM4 work units to see if my missing library problem (libnsl.so.1) has been fixed and I can get some successful work done.

I found that they were all re-sends from others who failed with a different missing library libstdc++.so.6

Computer 1460610 - Bartosz Toczek only started having this issue with SM4, no probs with AM4.

Computer 1531595 - Anonymous has 79 failures all SM4

Computer 1532546 - Science United has 30 failures all SM4

And of course I also got Computer 1517479 - Eric Korpela, with over 11,000 failures. I think they could be permission problems, but I am not sure, as he isn't showing that library as missing.

Conan
40) Message boards : Number crunching : HadSM4 Error when completed and Uploading (Message 65658)
Posted 18 Jul 2022 by Profile Conan
Post:
@Jean-David

The install of his Fedora 36 did not include the libnsl file that is apparently needed. This results in upload failures (for some reason). He has installed this file now and the 6th and final file did upload correctly. However, since the other five monthly zip files did not go up (before he installed libnsl), boinc marked the results as errors.


@Conan

Looking at the stderr on your task webpages, the 6th zip must have been uploaded successfully, but the other 5 monthly zips weren't. So boinc marked the result as an error. Now that libnsl is installed, you shouldn't have any more errors of this type.


Thanks geophi,

I decided to check back on a few WUs I ran back in May 2021 and found they had failed for the same reason.
I had Fedora 31 at the time, but apparently I did not check as to why the work units were marked invalid.
If I had checked I could have fixed this issue last year and had 4 successful results now instead of 4 failures.

I will have to check why things fail a bit better it seems. You live and learn.

Conan
41) Message boards : Number crunching : HadSM4 Error when completed and Uploading (Message 65652)
Posted 18 Jul 2022 by Profile Conan
Post:
Yes even though I updated my files all 3 (of 4) SM4 work units have now finished in an error.

So even though all looks good and trickles are reported and I got some credit, something is missing and then it errors out.

Well 1 to go in less than 5 hours and I will be done with a waste of 12 days (each WU ran 4 days) of crunching with no valid results.

I may take another break from the project again unless a few types I have not run yet get work and I will try them.

Thanks all for your help

Conan
42) Message boards : Number crunching : HadSM4 Error when completed and Uploading (Message 65644)
Posted 17 Jul 2022 by Profile Conan
Post:
Just finished a HadSM4 model after 4 and a bit days.
As it finished uploading I received a Computation Error, which I thought was strange as it and 3 others had been running fine.

On checking it seems that I was missing a file that I didn't know about

libnsl.so.1

Without this file you get

Unable to load library hadsm4_se_8.02_i686-pc-linux-gnu.so
dlopen error: libnsl.so.1: cannot open shared object file: No such file or directory


I have now updated this file on my Linux install and hope the other 3 work units will now be OK and I won't have 3 more failures (it would be a waste of over 12 days computations equivalent).
They have between 5 and 13 hours to go.
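
One way to catch this kind of problem before a model has run for days is to look for "not found" lines in `ldd` output for the application's shared object. The Python sketch below only parses such output; the sample text is illustrative, not captured from a real run, and `missing_libraries` is a hypothetical helper name.

```python
import re


def missing_libraries(ldd_output: str) -> list[str]:
    """Return library names that `ldd` reported as 'not found'."""
    return re.findall(r"^\s*(\S+)\s+=>\s+not found", ldd_output,
                      flags=re.MULTILINE)


# Illustrative ldd-style output (made up for this example).
sample = (
    "\tlinux-gate.so.1 (0xf7f1c000)\n"
    "\tlibnsl.so.1 => not found\n"
    "\tlibc.so.6 => /lib/libc.so.6 (0xf7c00000)\n"
)
print(missing_libraries(sample))
```

In practice the text would come from running `ldd` against the `.so` file named in the error message, and any names returned are the libraries still to be installed.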

Conan

(PS --Oh just noticed it is my 100th post since I joined in 2006.)
43) Message boards : Number crunching : Late Validation pending (Message 65642)
Posted 17 Jul 2022 by Profile Conan
Post:
I have just realized that I have 2 Work Units on my computers that seem to be waiting for validation.

Work Unit 9811766 was completed on 2nd May 2015.
It is a HadAM3P-HadRM3P Australia New Zealand type.

Work Unit 9097228 was completed on 20th September 2014.
It is of the same type.

If I look in the Task details for both work units it shows them having been credited but not on the main page as they are awaiting "Validation".

Who are they waiting on this long for the work units to be Validated?

Very Curious
Conan
44) Message boards : Number crunching : WAH2 CREDITS SET TO LOW (Message 52616)
Posted 23 Sep 2015 by Profile Conan
Post:
Yes, it's the hadam3pm2 I was concerned about. My thought is to let it go on for a while to see if its "slowness" persists. It's possible I'm wrong, or getting excited prematurely; in any case it should be clearer after a few more days.

Actually, I'm really curious to know if anyone else running the model has encountered the same issues--or not.


G'Day jrapdx,

I have noticed the same on my AMD 1090 Phenom, first estimate was for 96 Hours to completion. After running for 8 and 3/4 hours it is now up to 113 Hours to run with just 0.257% completed. That's an estimate of 3,284 Hours run time.
That is getting back to the old days when Climate ran for months to get a WU to complete.
I doubt that it will take this long and that the original 96 Hours is closer to the mark, however going on a previous failed WU it will take over 250 hours.

Conan
45) Message boards : Number crunching : Linux/Mac/Windows segmentation (Message 51881)
Posted 23 Apr 2015 by Profile Conan
Post:
I have a couple of Fedora 16 64 bit systems. To start my BOINC sessions I would click on the "Files" icon to launch the files system, navigate to the BOINC folder and then start BOINC.
I would leave the "Files" open so I could easily go back to check things in BOINC or the Download folder or other files searches.

I found after doing this for a long time without a computer reboot (such as long running projects that restart from scratch if you start BOINC again), that my systems start to slow down and I start having memory issues.

The "Files" programme starts to use a lot of resources over time.

By closing it down when I am not using it (after I have done my file searches, started BOINC or installed some download), I found that the whole system came back to life and I got my memory back (the computer's memory, not mine as that is already lost).

So an innocent programme could be causing you an issue.

Conan
46) Message boards : Number crunching : credit anomaly MOSES_eu vs MOSES_global 25% for eu? (Message 51793)
Posted 9 Apr 2015 by Profile Conan
Post:
Thanks to all who helped in getting this done.

Conan
47) Message boards : Number crunching : credit anomaly MOSES_eu vs MOSES_global 25% for eu? (Message 51765)
Posted 5 Apr 2015 by Profile Conan
Post:
Les,

Yes, thanks, I know this, I was responding to a couple of lower down replies as to me the issue did not seem clear.

I am still processing them even with the much lower credits, as I am hoping the science or results, may come to something useful.

Thanks to the moderators for their response in this matter.

Conan
48) Message boards : Number crunching : what is this stupid message (Message 51764)
Posted 5 Apr 2015 by Profile Conan
Post:
I'm set to get new work, but communication keeps getting deferred for one hour. In the meantime, Einstein and Seti continue working just fine.


The only work for Linux at the moment (as of today 5th April) is

hadam3prm3pm2t_eu (hadam3p global model with hadrm3p regional model with MOSES II land scheme and TRIFFID available) (currently no graphics) (Linux only)
Tasks ready to send 13,462

If you have your setting set to receive all model types you should pick some of these ones up.

Conan
49) Message boards : Number crunching : credit anomaly MOSES_eu vs MOSES_global 25% for eu? (Message 51758)
Posted 4 Apr 2015 by Profile Conan
Post:
As I noted in an earlier message

[Quote]I noticed that the credit awarded is exactly the same as the other (non-MOSES) Europe work unit type, so perhaps they have just got it mixed up.

17791628 9545343 17 Jan 2015 5:09:11 UTC 6 Feb 2015 18:15:18 UTC Completed 1,004,026.51 888,650.10 2,389.90 2,389.90 UK Met Office HadAM3P and HadRM3P model with MOSES II and TRIFFID Europe v7.01

17635813 9423239 8 Jan 2015 11:02:12 UTC 14 Jan 2015 4:05:09 UTC Completed 313,293.72 216.84 2,389.90 2,389.90 UK Met Office HadAM3P-HadRM3P Europe v7.23
[Quote]

They seem to have used the same crediting system for each Europe model even though they are completely different models.

Conan
50) Message boards : Number crunching : credit anomaly MOSES_eu vs MOSES_global 25% for eu? (Message 51750)
Posted 3 Apr 2015 by Profile Conan
Post:
This model is still not getting the correct credits, less than a 3rd of what they should be getting.

Conan
51) Message boards : Number crunching : How badly did I get screwed? (Message 51718)
Posted 28 Mar 2015 by Profile Conan
Post:
I don't have any A10 or similar CPUs but I have the older Phenom II type CPUs and to give you an idea of their run times on a couple of different models here they are (they run at 3.2 GHz standard clock, running Linux Fedora 16 64 bit)

This is quoted RUN Time, CPU Time is often quite a bit less,

MOSES II Landsurface Scheme around 1,008,800 seconds (280 Hours)

MOSES II and TRIFFID Europe around 1,004,000 seconds (279 Hours)

Coupled Model Full Resolution Ocean from 1,230,000 to 1,550,000 seconds (341 to 430 Hours)

I currently have 2 MOSES II and TRIFFID models running and they are

192 Hours at 67% with 92 Hours to go
242 Hours at 83% with 47 Hours to go
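As a rough cross-check of the figures above, here is a minimal Python sketch. It assumes progress is linear in run time, which is only an approximation and is not how BOINC itself estimates; the function names are made up for illustration.

```python
def seconds_to_hours(seconds: float) -> float:
    """Convert a run time in seconds to hours."""
    return seconds / 3600


def hours_remaining(elapsed_hours: float, fraction_done: float) -> float:
    """Linear estimate of the hours left, given elapsed hours and progress (0-1]."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be greater than 0 and at most 1")
    estimated_total = elapsed_hours / fraction_done
    return estimated_total - elapsed_hours


if __name__ == "__main__":
    # Run times quoted above, in seconds.
    for label, secs in [("MOSES II Landsurface", 1_008_800),
                        ("MOSES II and TRIFFID Europe", 1_004_000)]:
        print(f"{label}: {seconds_to_hours(secs):.0f} hours")

    # The two models currently running: (elapsed hours, fraction done).
    for elapsed, done in [(192, 0.67), (242, 0.83)]:
        print(f"{elapsed} h at {done:.0%}: about {hours_remaining(elapsed, done):.1f} h to go")
```

The linear estimate lands within a few hours of the "to go" figures shown in BOINC Manager, which uses its own estimate.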

This at least will give you an idea of what to expect. As you have much more recent chips, I am hoping the times will be a lot lower, since I will have to upgrade soon myself.

Conan
52) Questions and Answers : Windows : CPDN gone from Project tab after crash (Message 51646)
Posted 17 Mar 2015 by Profile Conan
Post:
Rather than re-attach to the project, which might cause multiple host instances, just try stopping BOINC, closing it then restarting it. You should not need to reboot but as it may be a driver issue then perhaps a reboot will sort everything out.

Conan
53) Message boards : Number crunching : credit anomaly MOSES_eu vs MOSES_global 25% for eu? (Message 51621)
Posted 13 Mar 2015 by Profile Conan
Post:
Does anyone know if this credit error was corrected?

I am not keen to run this WU I have just downloaded for 1,000,000 seconds only to find I am getting a 5th of the credit I should be getting for this WU type.

Conan
54) Questions and Answers : Unix/Linux : Multiple CP task management (Message 51410)
Posted 13 Feb 2015 by Profile Conan
Post:
Are you sure that your preferences are set to use 100% of CPU and to use both of the available cores?

If it is not on 100% then BOINC sees the 1000 hour work unit as more important and throws most resources at that problem.

Conan
55) Questions and Answers : Windows : why does my anti-virus program think "hadam3p_afr_7.22_windows_intelx86.exe" is malware? (Message 51409)
Posted 13 Feb 2015 by Profile Conan
Post:
Hi,
I found for TrendMicro, that the web-reputation service is causing the problem.
The web-reputation service doesn't allow a download from the download share you are using for the boinc program files like "hadam3p_afr_7.22_windows_intelx86.exe".
The project program files are created, but only contain the info text from web-reputation service.

Will try to post the web-reputation message from file tonight.


I was having TrendMicro block a number of Climate models. I was able to check under threats what was blocked and then set it to be excluded from now on. I also sent off information to TrendMicro that these are false positives.

All now working OK.

Conan
56) Message boards : Number crunching : credit anomaly MOSES_eu vs MOSES_global 25% for eu? (Message 51374)
Posted 7 Feb 2015 by Profile Conan
Post:
I noticed that the credit awarded is exactly the same as the other (non-MOSES) Europe work unit type, so perhaps they have just got it mixed up.

17791628 9545343 17 Jan 2015 5:09:11 UTC 6 Feb 2015 18:15:18 UTC Completed 1,004,026.51 888,650.10 2,389.90 2,389.90 UK Met Office HadAM3P and HadRM3P model with MOSES II and TRIFFID Europe v7.01

17635813 9423239 8 Jan 2015 11:02:12 UTC 14 Jan 2015 4:05:09 UTC Completed 313,293.72 216.84 2,389.90 2,389.90 UK Met Office HadAM3P-HadRM3P Europe v7.23

Conan
57) Message boards : Number crunching : credit anomaly MOSES_eu vs MOSES_global 25% for eu? (Message 51371)
Posted 6 Feb 2015 by Profile Conan
Post:
I have now reported one of these work unit types, and am seeing the same issue.

The MOSES (Global Only) type I reported last year gave 10,064.32 credits, the Europe model gives just 2,191.17 credits for a similar run time.

I won't be doing any more of these as the return is not really worth it (7 cr/hr).

Conan
58) Message boards : Number crunching : Is "Invalid Theta Detected" always due to bad work units? (Message 51358)
Posted 4 Feb 2015 by Profile Conan
Post:
Well they are still happening as I just had about 6 fail over the last day with this error, they run for about 21 minutes then fail.
All are new work units not ones from last October.

This is on a Windows XP 32 Bit machine.

Conan
59) Message boards : Number crunching : HadCM3 short errors (Message 51327)
Posted 27 Jan 2015 by Profile Conan
Post:
Your failures are all for work units that were made back in Sept '14 and a few from Oct '14. No one has been able to run these work units; they are faulty. The successful ones you have run come from Jan '15.

We seem to have to wait till they all cycle through the system to get rid of them.

I have been getting similar errors.

Conan
60) Message boards : climateprediction.net Science : Climate change in the News (Message 51326)
Posted 27 Jan 2015 by Profile Conan
Post:
Australia temperatures rising faster than rest of the world: official report

Australia faced a rise in temperature of potentially more than 5 degrees celsius (41 degrees fahrenheit) by the end of the century, an increase that would outpace global warming worldwide, the country's national science agency said on Tuesday.

NB: A rise of 5 degrees C is 9 degrees F. It is not an absolute temperature. Some journalist needs to go back to school.


Scary stuff if proven true. Things do seem to be a bit warmer than when I was younger, but then here on the east coast of Australia that is subjective, we had 35 degrees C on Sunday (2 days ago) and yesterday (Monday) we had 19 degrees C. This is in the middle of an Aussie summer, very weird.

The journalist was sort of close, as going from a starting point of Zero degrees C you get 32 degrees F and 9 degrees F plus the 32 gives 41 degrees F. That person has just twisted around how the 9 degrees F was applied, probably to sensationalise the article, truth has never gotten in the way of a good story, it is usually just pushed to the side.
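To make the mix-up concrete, here is a small Python sketch of the two conversions. The function names are just for illustration: a temperature difference only scales by 9/5, while an absolute temperature also gets the 32 degree offset.

```python
def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert an absolute temperature from Celsius to Fahrenheit."""
    return temp_c * 9 / 5 + 32


def celsius_delta_to_fahrenheit(delta_c: float) -> float:
    """Convert a temperature difference (a rise or fall) from Celsius to Fahrenheit degrees."""
    return delta_c * 9 / 5


if __name__ == "__main__":
    rise_c = 5
    # Correct: a rise of 5 degrees C is a rise of 9 degrees F.
    print("Rise in F:", celsius_delta_to_fahrenheit(rise_c))
    # The article's figure: 5 degrees C treated as a thermometer reading.
    print("Reading in F:", celsius_to_fahrenheit(rise_c))
```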

Conan
61) Questions and Answers : Windows : Multiple download errors (Message 51325)
Posted 27 Jan 2015 by Profile Conan
Post:
I downloaded the links and sent a link to TrendMicro to re-evaluate the download site.
I was then able to get a Short work unit to download and start running.
Unfortunately it was one of the ones from October and failed just like it has for all who have processed this WU.

Trying for another download to see if I can get one to run full term.

Conan
62) Questions and Answers : Windows : Multiple download errors (Message 51318)
Posted 27 Jan 2015 by Profile Conan
Post:
Thanks Thyme Lawn, those links you have posted are they safe?
My antivirus (TrendMicro) claims that they are Dangerous Web Pages involved in scams and malicious activity, and advises me not to go to that web page.

Conan
63) Questions and Answers : Windows : Multiple download errors (Message 51313)
Posted 26 Jan 2015 by Profile Conan
Post:
Thanks for the reply Dave.

Unfortunately doing a detach and reattach (which I had done a few days ago) did not make any difference, still can't download a Short model.

BOINC shows that I have over 23 GB available and I can't see any problems with my preferences that would cause a download issue.

It is strange, as all the support files that get downloaded with each work unit, download without an issue, but the main file dies.

Conan
64) Questions and Answers : Windows : Multiple download errors (Message 51311)
Posted 26 Jan 2015 by Profile Conan
Post:
Rather than start a new thread I am re-using this one as it is related to what I am getting.
I am getting the following error on nearly all downloads to a Windows 32 bit XP machine, machine Host number = 1352573.


app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>hadam3p_afr_7.22_windows_intelx86.exe</file_name>
<error_code>-200</error_code>
</file_xfer_error>

It is not just on the African models but also Pacific North West and Short models.

I did see a message saying I did not have any memory available? I had over 13 GB but this seems to be not enough.
I have detached a number of other projects from this computer, deleted a number of other files and this has now given me around 16 GB of disk space for BOINC to play with.
I have placed Climateprediction.net in my anti-virus (TrendMicro) exclusion area plus each executable.
Of all the executable files that have tried to download, not counting the graphic files, 3 have consistently failed to download, always just 23.6 kB in size and therefore generating checksum errors due to the wrong size.
It is only this one file that won't download, as all the other files download even if the main one fails.

I have the following files that appear to be correct size

hadam3p_anz_6.10_windows_intelx86.exe
hadam3p_anz_um_6.10_windows_intelx86.exe
hadcm3n_6.07_windows_intelx86.exe
hadcm3n_um_6.07_windows_intelx86.exe
hadrm3p_anz_um_6.10_windows_intelx86.exe

The following files all have just 23.6 kB in size and fail every download

hadam3p_afr_7.22_windows_intelx86.exe
hadam3p_pnw_7.22_windows_intelx86.exe
hadcm3s_7.24_windows_intelx86.exe
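
For anyone wanting to check their own machine for the same symptom, here is a minimal Python sketch that lists executables that look like truncated downloads. The 100 KiB cut-off is my own assumption (the bad files here were about 23.6 kB), the function names are made up for illustration, and you would need to supply the path to your own BOINC project folder.

```python
from pathlib import Path

# Assumed cut-off: anything this small is treated as a stub, not a real model executable.
MAX_STUB_BYTES = 100 * 1024


def executable_sizes(directory: str) -> dict[str, int]:
    """Map each .exe file name in the given folder to its size in bytes."""
    return {p.name: p.stat().st_size
            for p in Path(directory).glob("*.exe") if p.is_file()}


def flag_truncated(sizes: dict[str, int], max_bytes: int = MAX_STUB_BYTES) -> list[str]:
    """Return the names of files at or below the cut-off size, sorted by name."""
    return sorted(name for name, size in sizes.items() if size <= max_bytes)


if __name__ == "__main__":
    import sys
    # Usage: python check_sizes.py <path to the project folder>
    if len(sys.argv) == 2:
        for name in flag_truncated(executable_sizes(sys.argv[1])):
            print("Possibly truncated:", name)
```

A file flagged this way is only a candidate for a blocked or incomplete download; opening it in a text editor should show whether it holds an anti-virus message instead of program code.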

I have managed to actually start running a few of the Full models and even got to the first trickle on one of them but they all then failed with errors.

My other Windows machine had similar errors for a few work units, then downloaded and started running one without further problems. Both machines have AMD Phenom CPUs and 4 GB of RAM. The 6 core machine is running OK; the 4 core machine is the one with the problem, and it also has only an 80 GB hard drive.

Any help would be appreciated.

Thanks
Conan
65) Message boards : Number crunching : HadCM3 short - errors galore (Message 50665)
Posted 29 Oct 2014 by Profile Conan
Post:

Also every time I visit the climate site and visit my account I have to log in again despite checking the "keep logged in" box... is that a cookie issue?


This also happens to me when I click the links from Free-DC site to get to Climate.
If I go to the Berkeley list of projects and click on the Climate link from there I don't have to re-log in, as it shows I already am.

I can't see the difference, but it always happens; no idea why.

Conan
66) Message boards : Number crunching : Extremely high work units done. (Message 50100)
Posted 10 Sep 2014 by Profile Conan
Post:
The stats have been exported. You should see some of the totals, they have gone past FreeDC credit counter limits.

Conan
67) Message boards : Number crunching : UK Met Office HADAM3P (global only) with MOSES II landsurface scheme v7.03 (Message 49108)
Posted 14 May 2014 by Profile Conan
Post:
I must be one of the lucky ones as I have managed to finish a MOSES without error (as far as I can tell).
See WU 8804573

Just waiting for the validation and credits to catch up.

Took just over 311 hours run time.

Conan
68) Message boards : Number crunching : HTTP gateway timeout (Message 48877)
Posted 23 Apr 2014 by Profile Conan
Post:
Couldn't find another thread relating to this issue so started another.

I noticed that my account showed I had a WU that was not on my computer.
So I investigated the BOINC Manager log for answers.

Shortly before midnight (Australian time) on the 23/4/14 my computer tried to download a new work unit.

This apparently failed with the message Scheduler Request Failed : HTTP gateway timeout

As I never got this WU it will stay on my list for a year before getting resent.

Why does the system think it downloaded files when nothing connected and just timed out?

Conan
69) Questions and Answers : Preferences : Cannot Update Computer Preferences (Message 48867)
Posted 23 Apr 2014 by Profile Conan
Post:
If I click on the link for each model type under my preferences I get an error that the Page can't be found, so you can't get any information on the different models to see what is required.

Conan
70) Message boards : Number crunching : Missing Credits from Host Total (Message 43629)
Posted 4 Jan 2012 by Profile Conan
Post:
Well I had to fiddle didn't I?
I updated the OS of the Host originally having the credit issues after it had dropped from 491,954 credits to 290,613, I went from 32 bit to 64 bit.
My total has now dropped again to just 184,022 credits.

So HOST 1186021 should have 597,150 Total Points (shows 214,415 points which is the displayed WU total in my account which has yet to be archived)

HOST 1188489 should have 491,954 Total Points (did show 290,613 NOW showing 184,022 points which is the viewable WU total not yet archived)

So for some reason the system is only giving a Host the value of the viewable Work Units accessible in your account, not the overall total of all work processed by the Host (which is Viewable Work Units PLUS Archived Work Units).

Hope is high that Climate Prediction Team will resolve this issue in the near future.

After all it is a new year and stress has taken a back seat over the holiday break.

Thanks
Conan
71) Message boards : Number crunching : Missing Credits from Host Total (Message 43591)
Posted 22 Dec 2011 by Profile Conan
Post:
I've certainly had credit totals for a host reduced when that host was merged, which anomalous behaviour was passed onto the project team (failure to carry across archived results?). However, my overall total didn't reduce.


G'Day Iain,
My host total stayed the same after I merged the host. It dropped after the first trickle was received.
My overall total appears OK.
However I would like my Hosts to be reporting the actual amount of work they have performed, at the moment it looks like they have done very little work for the total that I have, also to get the correct placing in host stat rankings, plus I like things to add up. At the moment very little adds up unless you do a lot of maths like I have in this thread.

If something was mucked up in the first place then that same something can be repaired.

If it has been reported then at least project staff know about it and I hope can get round to fixing it up.
At the moment I know they have lots of other more important issues, particularly with data storage problems, so I will continue to wait.

As I said in my previous post, I have waited a year now so I can wait a little longer.

Thanks again, and have a good Christmas.

Conan
72) Message boards : Number crunching : Missing Credits from Host Total (Message 43582)
Posted 21 Dec 2011 by Profile Conan
Post:
Thanks Les,
I am waiting, and have been since I first started this thread on the 1st of January 2011 (1/1/11).
So 12 months waiting so far.
I can wait a little longer.

Have a good Christmas and speak to you next year (same for all Climate Prediction people, both Admin and Volunteers)

Conan
73) Message boards : Number crunching : Missing Credits from Host Total (Message 43580)
Posted 21 Dec 2011 by Profile Conan
Post:
Well it looks like I spoke too soon.
Someone has looked at my problem.
Unfortunately they have made it worse.

The Host I was having the problem of totals with still has that problem, it has not been fixed.
That is Host 1124050, it still shows 290,613 but should have 491,954 credits as its total.

Now Host 852550 which I was having NO trouble with and it had the correct totals, I upgraded the operating system from 32 bit to 64 bit.
This gave me a new Host ID of 1186021.

And with my first Trickle in quite a while my total for this Host has DROPPED from 594,649 to 211,662 (includes a Trickle of 252 credits).

The displayed totals for my two hosts are the totals of my viewable work units for each host, not my overall totals for these hosts (which should include the archived amounts as well).

Can my two hosts please be corrected to the following amounts

HOST 1186021 should have a total of 594,649 cobblestones

HOST 1124050 should have a total of 491,954 cobblestones

If this can be done I will be very happy, you will be happy as I won't be nagging you, and we can all enjoy the Christmas/New Year break.

Thanks for your understanding in bringing this ongoing issue to an end

Conan
74) Message boards : Number crunching : Missing Credits from Host Total (Message 43577)
Posted 20 Dec 2011 by Profile Conan
Post:
Thanks Les,

It would appear then that they decided to not do anything.
In the larger scheme of things my issue has not been considered important and compared to a database crashing I guess it would appear to be not that big a deal.

However it is an issue for me.

I suppose I just like things to be correct and adding up nicely, annoys me when they don't.

Never mind, I will just have to live with it unfortunately.

Thanks Les for helping as much as you have.

Conan
75) Message boards : Number crunching : More FPU or Integer Power needed? (Message 43553)
Posted 12 Dec 2011 by Profile Conan
Post:
I'm pleased so far with my new i7. I was expecting it to have four cores but was surprised to find eight.

I'd started listening to YouTube recordings of Teresa Berganza while doing various jobs on the computer but it's crunching such a lot and the eight models are using so much of my 20GB monthly bandwidth allowance that I've had to limit my listening.

http://www.youtube.com/watch?v=dWGc9IoxhAw Las tres hojas
http://www.youtube.com/watch?v=ZBEoTZvpHPg Sevillanas

It also has a usable GPU but there's not enough spare bandwidth to let the GPU have more than three or four hours' practice on Einstein each day.



I think you will find that you have 4 physical cores (Quad core), but with hyperthreading turned on Windows reports that you have 8 cores, 4 physical and 4 virtual (Intel does not have a Core i7 with 8 physical cores just yet but are working on it).

But as far as you are concerned you can have 8 things running at once just like an 8 core computer.

Enjoy
Conan
76) Message boards : Number crunching : Missing Credits from Host Total (Message 43552)
Posted 12 Dec 2011 by Profile Conan
Post:
I've had a reply, and I've been thinking about it for a couple of days. But I don't understand the figures, and I can't get my credits to add up either.

This is your data:

The archive.stats table says there are 36 entries for a total of 690920.619526058. The entries in the archive for the host you mention are:
    +----------+------------------+
    | resultid | total_credit     |
    +----------+------------------+
    |  8400870 | 52254.7200679779 |
    |  8401264 | 52254.7200679779 |
    | 10812572 | 2081.77205586433 |
    +----------+------------------+


The BBCode isn't working. :(
My credits are spread over several defunct parts of cpdn, and I've never kept track of what I've had, so I'm going to have to leave this for someone else to explain.




Thanks for the reply Les,

There should be a lot more results in the archive.

Working backwards with a bit of maths that may help

I have 2 hosts on this project Host 852550 and Host 1124050 .

Host 852550 has a total of 594,397 which appears to be correct.
Total viewable WUs on account total 211,409

Host 1124050 shows total viewable WUs of 184,021.

So Total archived for Host 852550 equals 594,397 - 211,409 = 382,988.

Total amount of archived points = 690,921 So 690,921 - 382,988 = 307,933.

This means that 382,988 archived points belong to Host 852550 and 307,933 archived points belong to Host 1124050.

My Total points 1,086,351 minus Archived points of 690,921 = 395,430 points.

My Total viewable WUs for both hosts is 211,409 (host 852550)+ 184,021 (host 1124050) = 395,430 points

So this bit adds up.


Now if you take my Total of 1,086,351 and subtract the 594,397 points of Host 852550 then you are left with 491,954 points.

491,954 -
307,933 = (archived points total for Host 1124050)
184,021 (which equals the viewable amount of WUs for Host 1124050).

So I Claim that Host 1124050 should have 491,954 points (not 290,613 as displayed).

Les you show 3 WU that total 106,591.21 points from the archive

I have shown that there is 307,933 archived points for Host 1124050.

Subtract 106,591 from 307,933 and you get 201,342 (rounded), which is the original amount that I asked to be added back onto Host 1124050.

If you subtract displayed total of 290,613 from Actual Total of 491,954 then again you get 201,341.

So HOST 1124050 should have a TOTAL of 491,954 points.

It is Missing 201,342 (rounded) points from its total.
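
For anyone who wants to re-run the sums above, here is a minimal Python sketch. It only uses the rounded figures quoted in this post; the variable names are my own and nothing here comes from the project database.

```python
# Rounded figures quoted in the post.
overall_total = 1_086_351        # account total
archived_total = 690_921         # archive.stats total (690920.62 rounded)
host_852550_total = 594_397      # displayed total, believed correct
host_852550_viewable = 211_409   # viewable WUs on the account for this host
host_1124050_viewable = 184_021  # viewable WUs on the account for this host
host_1124050_displayed = 290_613 # total currently shown for this host
archive_found = 106_591          # the 3 archived results Les found (rounded)

# Split the archive between the two hosts.
host_852550_archived = host_852550_total - host_852550_viewable
host_1124050_archived = archived_total - host_852550_archived

# Consistency check: non-archived credit must equal the viewable WUs.
viewable_total = host_852550_viewable + host_1124050_viewable
assert overall_total - archived_total == viewable_total

# What Host 1124050 should show, worked out two ways.
expected_a = overall_total - host_852550_total
expected_b = host_1124050_archived + host_1124050_viewable
assert expected_a == expected_b

print("Archived for 852550:   ", host_852550_archived)
print("Archived for 1124050:  ", host_1124050_archived)
print("Expected for 1124050:  ", expected_a)
print("Missing from displayed:", expected_a - host_1124050_displayed)
print("Missing from archive:  ", host_1124050_archived - archive_found)
```

The last two figures differ by one point only because the inputs were rounded before subtracting.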

I don't know why you can't see the rest of the archived work units but I know the maths and I know the amount I had before any results were originally archived.

Thanks for your effort in this Les, it is much appreciated.

Hope the maths makes sense

Conan


G'Day Les,

How are things going now with Climate Prediction?
It has been about 120 days since we last spoke, as I wanted to give the project time to sort out some of its problems.

Are they sorted out now?
If so has anyone had time to look at my issue?

Thanks
Conan
77) Message boards : Number crunching : Missing Credits from Host Total (Message 42776)
Posted 14 Aug 2011 by Profile Conan
Post:
I've had a reply, and I've been thinking about it for a couple of days. But I don't understand the figures, and I can't get my credits to add up either.

This is your data:

The archive.stats table says there are 36 entries for a total of 690920.619526058. The entries in the archive for the host you mention are:
    +----------+------------------+
    | resultid | total_credit     |
    +----------+------------------+
    |  8400870 | 52254.7200679779 |
    |  8401264 | 52254.7200679779 |
    | 10812572 | 2081.77205586433 |
    +----------+------------------+


The BBCode isn't working. :(
My credits are spread over several defunct parts of cpdn, and I've never kept track of what I've had, so I'm going to have to leave this for someone else to explain.




Thanks for the reply Les,

There should be a lot more results in the archive.

Working backwards with a bit of maths that may help

I have 2 hosts on this project Host 852550 and Host 1124050 .

Host 852550 has a total of 594,397 which appears to be correct.
Total viewable WUs on account total 211,409

Host 1124050 shows total viewable WUs of 184,021.

So Total archived for Host 852550 equals 594,397 - 211,409 = 382,988.

Total amount of archived points = 690,921 So 690,921 - 382,988 = 307,933.

This means that 382,988 archived points belong to Host 852550 and 307,933 archived points belong to Host 1124050.

My Total points 1,086,351 minus Archived points of 690,921 = 395,430 points.

My Total viewable WUs for both hosts is 211,409 (host 852550)+ 184,021 (host 1124050) = 395,430 points

So this bit adds up.


Now if you take my Total of 1,086,351 and subtract the 594,397 points of Host 852550 then you are left with 491,954 points.

491,954 -
307,933 = (archived points total for Host 1124050)
184,021 (which equals the viewable amount of WUs for Host 1124050).

So I Claim that Host 1124050 should have 491,954 points (not 290,613 as displayed).

Les you show 3 WU that total 106,591.21 points from the archive

I have shown that there is 307,933 archived points for Host 1124050.

Subtract 106,591 from 307,933 and you get 201,342 (rounded), which is the original amount that I asked to be added back onto Host 1124050.

If you subtract displayed total of 290,613 from Actual Total of 491,954 then again you get 201,341.

So HOST 1124050 should have a TOTAL of 491,954 points.

It is Missing 201,342 (rounded) points from its total.

I don't know why you can't see the rest of the archived work units but I know the maths and I know the amount I had before any results were originally archived.

Thanks for your effort in this Les, it is much appreciated.

Hope the maths makes sense

Conan
78) Message boards : Number crunching : Missing Credits from Host Total (Message 42761)
Posted 7 Aug 2011 by Profile Conan
Post:
I did pass on a request to look at your problem, but didn't get a reply.

And with all of the problems for the last month or so, with the project virtually shut down, it's unlikely that any has been done or will be any time soon.



G'Day Les,

I have given it a couple of months so that things can get sorted out (though it seems that is still ongoing).
So how goes the work on Climate's computer server issues?

I was wondering if this small issue now has time to be looked into.

I would like my totals sorted so I know what my computers have contributed. I still plan to process some more models in the near future.

Thanks
Conan
79) Message boards : Number crunching : Missing Credits from Host Total (Message 42298)
Posted 31 May 2011 by Profile Conan
Post:
I noticed that about a month ago the Host Total of This Host had dropped by a large amount.

I have waited to see if the totals sort themselves out but alas they have not.

The Total should be about 464,804.47 cobblestones, however it is showing 290,612.84 cobblestones (290,613 after rounding).
Now this lower total happens to be the total of all displayed model results on my account page for this host.
All archived results after being removed from my account page took their credits with them.

My overall total appears OK and my other computer host also appears OK.

If this discrepancy can be corrected it would be much appreciated, thanks.

Conan

Happy New Year to the CP team.



Have done a recalculation and of the total of 1,086,351 one host has 594,397 credits and the one with the problem has 290,613 credits.

Added together this is 885,010 credits.

So 594,397 plus 290,613 = 885,010

1,086,351 minus 885,010 = 201,341 credits missing from host total.

290,613 + 201,341 = 491,954 credits. This should be my host's total.

So Host 1 has 594,397 credits
and Host 2 should have 491,954 credits.
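
The same check can be written as a small Python function, shown here as a sketch only. It assumes the account total is correct and that every host attached to the account is in the list; the function name is made up for illustration.

```python
def missing_host_credit(account_total: int, displayed_host_totals: list[int]) -> int:
    """Credit on the account that is not reflected in any host's displayed total."""
    return account_total - sum(displayed_host_totals)


if __name__ == "__main__":
    account_total = 1_086_351
    host_1, host_2 = 594_397, 290_613   # displayed totals for the two hosts

    gap = missing_host_credit(account_total, [host_1, host_2])
    print("Missing from host totals:", gap)
    print("Host 2 should show:", host_2 + gap)
```

With more than two hosts the function still gives the overall gap, but it cannot say which host the gap belongs to.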

I only have the two hosts connected to this project so it is easy to see the missing credits on my hosts totals.

Unsure if others are having a missing credit issue.


With all the work that hired programmers did a few months ago, I was wondering if they have had time to check my problem from the original post in this thread?

I had not heard if all is now working or not and as it has been a couple of months I thought I would see how things were going.

Thanks

Conan
80) Message boards : Number crunching : Missing Credits from Host Total (Message 41511)
Posted 19 Jan 2011 by Profile Conan
Post:
Well, as has been posted elsewhere, there's no programmers here at the moment, and their replacements won't arrive for another few weeks.
And when they do, they have a lot of work to do that will probably be considered far more important than fiddling with credits.
So it could be six months or more before it's fixed.



OK Les, understood that we currently have no programmers (although I had not seen a post about this and had not associated my missing credits with missing programmers, perhaps they took them?)

Also understood that the new programmers will have a lot of work to do and whilst both you and probably them will consider fiddling with my missing credits as paltry and minor, to me it is a different matter.

I was not insisting that it be done here and now, I was just curious as to where my credit had gone as it coincided with the archiving of old work units.
And as such would like them back.

I can wait, suppose I have to.

Conan
81) Message boards : Number crunching : Missing Credits from Host Total (Message 41485)
Posted 15 Jan 2011 by Profile Conan
Post:
I noticed that about a month ago the Host Total of This Host had dropped by a large amount.

I have waited to see if the totals sort themselves out but alas they have not.

The Total should be about 464,804.47 cobblestones, however it is showing 290,612.84 cobblestones (290,613 after rounding).
Now this lower total happens to be the total of all displayed model results on my account page for this host.
All archived results after being removed from my account page took their credits with them.

My overall total appears OK and my other computer host also appears OK.

If this discrepancy can be corrected it would be much appreciated, thanks.

Conan

Happy New Year to the CP team.



Have done a recalculation and of the total of 1,086,351 one host has 594,397 credits and the one with the problem has 290,613 credits.

Added together this is 885,010 credits.

So 594,397 plus 290,613 = 885,010

1,086,351 minus 885,010 = 201,341 credits missing from host total.

290,613 + 201,341 = 491,954 credits. This should be my host's total.

So Host 1 has 594,397 credits
and Host 2 should have 491,954 credits.

I only have the two hosts connected to this project so it is easy to see the missing credits on my hosts totals.

Unsure if others are having a missing credit issue.
82) Message boards : Number crunching : Missing Credits from Host Total (Message 41412)
Posted 1 Jan 2011 by Profile Conan
Post:
I noticed that about a month ago the Host Total of This Host had dropped by a large amount.

I have waited to see if the totals sort themselves out but alas they have not.

The Total should be about 464,804.47 cobblestones, however it is showing 290,612.84 cobblestones (290,613 after rounding).
Now this lower total happens to be the total of all displayed model results on my account page for this host.
All archived results after being removed from my account page took their credits with them.

My overall total appears OK and my other computer host also appears OK.

If this discrepancy can be corrected it would be much appreciated, thanks.

Conan

Happy New Year to the CP team.
83) Message boards : Cafe CPDN : Milestones Thread (Message 40372)
Posted 15 Aug 2010 by Profile Conan
Post:
Finally passed 1,000,000 cobblestones.

It has taken a while (around 4 years I think), but I have gotten there with just 2 computers doing the project mostly in the background.
84) Message boards : Number crunching : Computer wasting multiple models (Message 39766)
Posted 26 May 2010 by Profile Conan
Post:
Another couple

895507 around a hundred models
1076072 All error out, possible Over Clock problem as the Floating Point =3,995.01 million ops/sec and the Integer = 20,514.34 million ops/sec

Thanks
Conan.
85) Message boards : Number crunching : Computer wasting multiple models (Message 39613)
Posted 21 Apr 2010 by Profile Conan
Post:
Only one to report

no successful results, all fail or crash

Thanks
Conan.
86) Message boards : Number crunching : Computer wasting multiple models (Message 39168)
Posted 7 Mar 2010 by Profile Conan
Post:
I have found a swag of hosts to add to the list of model wasters;

1041824 272 models zero points all compute errors

941509 49 models no successes yet

857475 Hundreds of Compute Errors

1031365 Constant Errors since the 27/12/09 over 100

1040276 259 models Zero total all Errors

1052894 142 Models Zero total all Errors

1028006 695 Models Zero Total All Errors

948530 All Models start then fail after one or two trickles, over 100

961803 Constant errors

947688 Over 700 Errors

Six of these hosts are associated with WU
6654409 that is how I noticed them.

Thanks
Conan
87) Questions and Answers : Windows : Transferring or backup CPDN current project? (Message 38935)
Posted 19 Feb 2010 by Profile Conan
Post:
Just remember that with a more recent version of BOINC, the BOINC folder does not contain any data, only the BOINC programme files.
You also need to copy the APPLICATION DATA folder.

So to copy the complete BOINC installation you need the BOINC folder AND the Application Data folder (which contains your project data files).

Older versions did not have this problem as they only had the BOINC folder, with everything in it.
88) Questions and Answers : Windows : upload of results impossible (Message 38882)
Posted 8 Feb 2010 by Profile Conan
Post:
Thanks guys, that answers my question. I will wait.
89) Message boards : Number crunching : Computer wasting multiple models (Message 38836)
Posted 2 Feb 2010 by Profile Conan
Post:
I know there is a more recent thread about computers that crash too many work units, but I have been unable to locate it quickly as I am going to bed shortly.

I have been paired with the following 4 hosts and they all appear to have problems

940827 has been in constant error mode on all work units since late Dec '09.
1028360 has been in constant error mode since 1st week of Jan '10.
980622 constant errors on over 200 work units.
1001262 well over 300 work unit errors with no successful results and zero total.

Is it possible that they can be looked at please so they can be notified that things aren't working the way they should.

Thanks
Conan.
90) Message boards : Number crunching : hadcm3istd problem (Message 37430)
Posted 7 Jul 2009 by Profile Conan
Post:
That error isn't all that uncommon, and will result in part of the software marking a model name for download, and a different part NOT sending it. So your records show you as having gotten it, even though it was never sent. Most of us have a few like this.
They don't get deleted from your list.

3-4 months is typical for these models.

These models trickle early in December. (3rd or 4th, I think).
The first one will be about 0.6%, I think.



Thanks Les for the info.
91) Message boards : Number crunching : hadcm3istd problem (Message 37428)
Posted 6 Jul 2009 by Profile Conan
Post:
I re-allowed work fetch and selected the longer work units.
On first contact I got a "HTTP: internal server error".

A bit later reconnected and downloaded a number of "hadcm3trans_6.04_" files which I assume are the application files.

After this it downloaded a "hadcm3istd_csfp_160_06020985_6" series of files for that WU.

This all appeared to be OK until I looked at my account and found that 2 minutes before I downloaded this WU it shows I had downloaded another ?

A "hadcm3istd_csfq_1920_160_06020986_2" WU type which never downloaded to my computer yet appears on my account, I rechecked the messages but cannot find any mention of the WU on my computer.

Where did it go or did it even exist?

If it is lost then can it be deleted from my host as it will stay around till 2012 otherwise.

Also the estimated time for this WU type is working out at 3,289 hours (7.17 hours for 0.218%), is this the runtime for these ones or will it be a bit quicker?
When does the first Trickle happen?
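
The 3,289 hour figure is a straight-line extrapolation from the progress so far. A minimal sketch of that arithmetic, assuming the model keeps running at a constant rate (the function name is only for illustration and is not part of BOINC):

```python
def estimate_total_hours(elapsed_hours: float, percent_done: float) -> float:
    """Extrapolate total runtime linearly from elapsed time and progress."""
    if percent_done <= 0:
        raise ValueError("percent_done must be greater than zero")
    return elapsed_hours * 100.0 / percent_done


# Figures quoted in the post: 7.17 hours for 0.218% done.
total = estimate_total_hours(7.17, 0.218)
print(f"Estimated total: {total:,.0f} hours (about {total / 24:.0f} days)")
```

The real runtime can differ if the machine shares its cores with other projects, since the rate is then no longer constant.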

Thanks
Conan.
92) Message boards : Number crunching : Iceworlds & Slowdowns hadsm3/mh - Closed - Discussion (Message 36852)
Posted 4 May 2009 by Profile Conan
Post:
I've had several, and I don't mess around with them. If they go blue, I abort them and get another one.


G'Day Les,

A question if you will,
Why did my Work Unit (WU 8357087), which ran on Linux, complete successfully, yet every Windows computer that ran this WU failed and became an Ice World?
93) Message boards : Number crunching : Iceworlds & Slowdowns hadsm3/mh - Closed - Discussion (Message 36827)
Posted 29 Apr 2009 by Profile Conan
Post:
I see you've now aborted that iceworld, Mican. Its whole workunit #6278040 is rather interesting. A model on a machine with Linux crashed at about the same point. The model Hagar mentioned that seems to have speeded up was on a Mac. It crashed.

Conan has a model from the same workunit running on Linux and Verstapp has one on Intel/Windows, both further behind. Both Conan and Verstapp are contactable. I'll ask them to watch their models to see what happens well into Phase 4. It will be particularly interesting to see whether Conan gets an iceworld on Linux.


G'Day mo.v,
Just an update, I took a quick look at Verstapp's work unit as it has reached phase 4 and I am still at phase 1. The last TS that he uploaded has just taken a large drop from a consistent 1.2 s/TS to 1.33 s/TS so perhaps he has also gone into an Ice world ?


G'Day again mo.v,

Well it took a bit of time but it looks like Linux triumphs this time.
I have finished the model to completion (took 547 hours).
Time steps stayed the same right through and it shows 'Success', I just have to wait for the credits to be granted.

Here is the WU 8357087
if you want to check it out.
I can't read the data very well so really I have no idea if this was getting close to an iceworld or not, I don't have graphics on Linux enabled.
94) Message boards : Number crunching : Iceworlds & Slowdowns hadsm3/mh - Closed - Discussion (Message 36311)
Posted 5 Mar 2009 by Profile Conan
Post:
I see you've now aborted that iceworld, Mican. Its whole workunit #6278040 is rather interesting. A model on a machine with Linux crashed at about the same point. The model Hagar mentioned that seems to have speeded up was on a Mac. It crashed.

Conan has a model from the same workunit running on Linux and Verstapp has one on Intel/Windows, both further behind. Both Conan and Verstapp are contactable. I'll ask them to watch their models to see what happens well into Phase 4. It will be particularly interesting to see whether Conan gets an iceworld on Linux.


G'Day mo.v,
Just an update, I took a quick look at Verstapp's work unit as it has reached phase 4 and I am still at phase 1. The last TS that he uploaded has just taken a large drop from a consistent 1.2 s/TS to 1.33 s/TS so perhaps he has also gone into an Ice world ?
95) Message boards : Number crunching : HadCM3 Performance (Message 36300)
Posted 5 Mar 2009 by Profile Conan
Post:
The last HADCM3 model work units I ran, on either an Opteron 275 or an Opteron 285, took around 1.93 to 1.97 s/TS.

Thinking of trying another one to see if they have changed in a year.

I had a problem with the displayed times on the work units running about 4 times faster than the actual processing times and showing 0.21 to 0.50 s/TS, which was not correct. This happened twice, on different work units.
So probably time to try again.
96) Message boards : Number crunching : HadSM3MH Performance (Message 36299)
Posted 5 Mar 2009 by Profile Conan
Post:
On an AMD Opteron 275 @2.2GHz with 4 GB RAM on Linux Fedora Core 3
HADSM3f currently getting 1.97 to 1.964 s/TS
HADSM3m currently getting 2.007 to 2.011 s/TS

These two often run at the same time as well as mixing with other projects.
As the SM3f is the more advanced it mostly runs solo so gets the lower times.

On an AMD Opteron 285 @2.6GHz with 2 GB RAM on Linux Fedora Core 3
HADSM3f: the last time I did one of these on this machine was 25/7/07, when it was getting 1.735 to 1.765 s/TS
HADSM3m currently getting 1.947 to 1.983 s/TS
When a second HADSM3m starts running its speed drops to 2.05 to 2.07 s/TS.

Despite the faster speed of the second machine they run very similar times.
97) Message boards : Number crunching : Trickles and Credits (Message 36081)
Posted 6 Feb 2009 by Profile Conan
Post:
Thanks for the responses, all now appears to be OK.

I made the assumption that the trickles would be awarded almost straight away. But alas I was a bit off.
As I said it has been since April last year since I ran this project so I am a bit rusty in regards to the workings of the project.

Thanks again for the responses and I will be more patient.

Conan.
98) Message boards : Number crunching : Trickles and Credits (Message 36075)
Posted 6 Feb 2009 by Profile Conan
Post:
Has this credit thing been addressed?

I have, after a long break, restarted Climate but I am confused with the 'trickles' I have returned.

As I understand it each trickle is worth a set amount of points.

If this is the case I am wondering why, after 5 trickles on one computer I have been granted just 94.518 points (one trickle) and on the second computer I have sent in three trickles and have no points credited to me?

It has been two days so I thought something should show by now.

At this stage I am assuming that there are still some problems from late last month, but nothing seems to be stated about it.

Thanks,
Conan.
99) Message boards : Number crunching : Account log in problems (Message 35786)
Posted 1 Jan 2009 by Profile Conan
Post:
It is necessary to use the same email account. Have you changed email addresses?


G'Day astroWX,

My e-mail address is the same as it has always been.
When I got into the account via the authenticator screen I checked the e-mail address and even changed it to the same one I had, I got the message that the e-mail addresses were the same.

What I did decide to try was to change my password using the same account key that I always have used in the past.

Amazingly it now appears to work,
So it seems that my password or the use of an account key had been changed or corrupted in the password box.

Thanks for replying.
100) Message boards : Number crunching : Account log in problems (Message 35777)
Posted 29 Dec 2008 by Profile Conan
Post:
For a week or two now I have been unable to log into my account via the normal 'e-mail and password' page.

Keeps saying my password is invalid for my e-mail account.

As I joined a fair while ago I do not have an actual password as this log in system did not exist when I joined. So I use my account key in place of the password and have had no trouble logging in for years.

Now the only way I can log in is via the 'authenticator log in page' method (on the page where you can also get a new password sent to you).

I have checked my account key against the authenticator and they are the same, so why can I not log in as I used to?
101) Message boards : Number crunching : Work done reverted back to Zero (Message 33238)
Posted 7 Apr 2008 by Profile Conan
Post:
The whole question of BOINC downloads and upgrades for Linux is a mess as far as I can see. Not just the stability of the BOINC version, but the whole question of how to get it going on all the different distros.


It's not as bad as it seems but it's definitely not as easy as with Windows. If there were people dedicated to the job of building BOINC install/update packages it would be great but the manpower, expertise or dedication seems to not be there.

If you do crunch a climate model again, it might be worthwhile in future holding off upgrading BOINC until you've completed the last model on the machine and have nothing or nearly nothing from other projects.


Personally, I wouldn't dream of updating Linux or BOINC while a CPDN model is running. If it's absolutely necessary (and having the latest eye candy or strictly for convenience feature doesn't = necessary) then I make redundant backups and have a rock solid plan for rolling back the update if it causes problems. Also, I will not run CPDN parallel with other projects. If I start a model then it runs 24/7 to completion, no preempting by other projects because the sooner you get them done the fewer headaches you have. Then I put CPDN on the back burner for a few months and help other projects. Why take unnecessary risks that have zero payoff?



Thanks mo.v and Dagorath,
I will keep this in mind if I decide to come back and have another go.
102) Message boards : Number crunching : Work done reverted back to Zero (Message 33177)
Posted 1 Apr 2008 by Profile Conan
Post:
Well I can forget about a happy ending for this WU.
Due to upgrading Boinc to the latest version for Linux 5.10.45, it trashed all work on all projects and created 3 duplicate computers.

A bit crappy for a latest release that has been tested.

Tried restoring the folder but it still died, with 5.10.45 wiping all data in the Boinc folder no matter what I do. I tried rolling back to an older version of Boinc and then rolling forward again but it still wiped everything.

Climate downloaded another WU but I aborted it as I am now very disheartened with the whole thing.

I only had about 100 hours to go despite Boinc resetting my stats on that WU twice and I was looking forward to it finishing.

I may be back but I don't know when, we will have to see.
103) Message boards : Number crunching : hadcm slight deceleration of secs/TS (Message 33036)
Posted 20 Mar 2008 by Profile Conan
Post:
>snip<

The current 5.44 model can be 30% slower than the 5.15 models at the BBC. So, your previous 2 sec/timestep becomes 2.6 s/ts.

So, the numbers are about right - but always slightly worse ...

thanks, interesting point about v.5.44 against the older v.5.15 which I didn't know.
the calculations make sense, however we still have the two puzzles of why the models are decelerating fairly systematically and why the second model (160yrs rather than the previous 80yr one) started off at the speed the first one finished at.


One other thing that could also be happening if you are running other projects at the same time is the Boinc Manager/Client fails to release a WU from one project before starting the next.
What I mean here is that for instance you have 4 cores running 4 work units. Time for one WU to cede and give another WU a go, it does not let go but the next WU starts anyway.
I have had this happen on a number of occasions (although still rare), giving me 5 work units running when I have 4 cores.
Three work units run at 100% and the 4th and 5th run at 50% each, cycling back and forth every few seconds to each WU.
Suspending does not stop this and restarting Boinc manager is the only fix I have found whenever I have struck this.

Running at 50% for a few hours or more in a day will affect the overall time average.

Just something else to consider.
104) Message boards : Number crunching : Work done reverted back to Zero (Message 32685)
Posted 21 Feb 2008 by Profile Conan
Post:
You wouldn't read about it (well you will after I finish writing this).
It has happened again, this time on a WU whose stats had already reset once; now they have reset again.

The percentage done did not change (around 69%) but hours processed and hours to go changed, hours done reset to zero and hours to go reset to a new value of about 100 hours less than before.

It may have something to do with this message :-
(This all appears to have happened as the WU was getting its information ready to send a trickle up message, and I recall this is when this problem happened last time as well)


2008-02-20 01:43:05 [climateprediction.net] Task hadcm3inct_cmf7_1920_160_55869263_1 exited with zero status but no 'finished' file
2008-02-20 01:43:05 [climateprediction.net] If this happens repeatedly you may need to reset the project.
2008-02-20 01:43:05 [climateprediction.net] Restarting task hadcm3inct_cmf7_1920_160_55869263_1 using hadcm3i version 544
2008-02-20 01:43:08 [climateprediction.net] Sending scheduler request: To send trickle-up message
2008-02-20 01:43:08 [climateprediction.net] (not requesting new work or reporting completed tasks)
2008-02-20 01:43:13 [climateprediction.net] Scheduler RPC succeeded [server version 509]

It appears that the WU started again from last checkpoint and in the process Boinc Manager resets the time counters but the progress stays the same.

I did not notice the last time it did this if the WU 'restarted' or 'resumed'. If it restarted then that is why the counters reset. If it resumed then it was a Boinc Manager thing?

The WU is still going and should have trickled again since this hiccup but it remains a mystery.

I am unsure if I have changed the Boinc Client version since the last time this happened.
I still think it is a Boinc thing as no other project has had any trouble.
105) Message boards : Number crunching : Work done reverted back to Zero (Message 31914)
Posted 29 Dec 2007 by Profile Conan
Post:
A further update.
The WU 6602199 has finished successfully.
The time reported for completion is 6,305,222 seconds (plus or minus a few) short of the actual time taken.
Other than that it all went well and had no other issues.
You can see in the result output where Boinc Manager reset all counters back to zero but the WU kept on going and was granted full credit for the result.

Very strange.
106) Message boards : Number crunching : RAC too low? (Message 31379)
Posted 15 Nov 2007 by Profile Conan
Post:
They are probably not running the same model types.
The short ones give 94.52 cobblestones per trickle (on my 2.6GHz Opteron that happens every 6 hours or so and will complete in less than 4 months).
The non optimised long ones give 259.20 cobblestones per trickle (every 13 hours for me).
The optimised long ones give 310.80 cobblestones per trickle (about 13 hours or a bit more for each trickle).

So if one has a long model and the other computer has a short model then overall the one with the longer model will get a higher credit output.

A short model will give 94.52 x 2 = 189.04 cobblestones in 12 hours.
A long model will give either 259.20 or 310.80 cobblestones in 13 hours.

So this could be the reason the RAC has dropped on one computer compared to the other. Plus I noticed that you have had a couple of models crash as well, this will affect RAC.
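
To compare the model types on an equal footing it helps to put them all on a per-day basis. A rough sketch using the per-trickle credits and trickle intervals quoted in this post; the intervals are for one 2.6GHz Opteron and are approximate, and the 13 hours for the optimised model is a lower bound, so treat the results as illustrative only:

```python
# Credit per trickle and approximate hours between trickles, as quoted in the post.
MODELS = {
    "short": (94.52, 6.0),
    "long (non-optimised)": (259.20, 13.0),
    "long (optimised)": (310.80, 13.0),
}


def credit_per_day(credit_per_trickle: float, hours_per_trickle: float) -> float:
    """Credit earned per 24 hours of continuous running."""
    return credit_per_trickle * 24.0 / hours_per_trickle


for name, (credit, hours) in MODELS.items():
    print(f"{name:22s} {credit_per_day(credit, hours):7.2f} cobblestones/day")
```

On these figures the long models earn more per day than the short one, which is the difference in RAC described in the post.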
107) Message boards : Number crunching : Work done reverted back to Zero (Message 31328)
Posted 12 Nov 2007 by Profile Conan
Post:
A bit of an update on this problem I had back in September.
That model is now up to 96.8% and looks like it will complete despite the CPU time information being incorrect. No other problems with this model.

But alas the second model I was running of the same type (hadcm3inct_cmf7_1920_160_55869263) has now suffered from the same malady as the first.
I checked the Boinc progress on my computer and found that the counter had reset back to zero for Time processed, Time to completion and Time left.
After working for a bit longer and the checkpoint picked up again the Percentage done came back to where it was before but the time counters stayed at the reset values, just like the first one did.

I have updated to the 5.10.21 since the first report and all has been running fine till this has happened again.

It is only happening to CPDN, so perhaps on these extra long WUs Boinc manager is losing track of things?
This computer runs 10 projects and CPDN is the only one doing crazy things.

You can see in the Slot information I copied below where the model time goes from 1,179 hours down to 0.00 hours.
It seems to happen on switching from one project to another and the Shared Memory has to be released, could this be the problem?

So now I have Two models that have decided to reset their stats for no apparent reason, with the CPDN model chugging on as if nothing happened.

I still suspect a BOINC problem monitoring over extended periods of time, or it is a problem with at least one of the other projects that I run. I recall now that I added a couple back about September.


hadcm3inct_cmf7_1920_160_55869263 - PH 1 TS 1802737 A - 19/06/1990 00:30 - H:M:S=1179:14:14 AVG= 2.35 DLT= 1.00
hadcm3inct_cmf7_1920_160_55869263 - PH 1 TS 1803169 A - 25/06/1990 00:30 - H:M:S=1179:29:51 AVG= 2.35 DLT= 1.00
Suspended CPDN Monitor - Quit request from BOINC...
Cleaning up graphics data...
Detaching shared memory...
shmget: No such file or directory
Beginning work on result hadcm3inct_cmf7_1920_160_55869263_1...
Starting model in /home/ggoninan/BOINC/projects/climateprediction.net...
Created shared memory region key = 173205 of size 655060 bytes (version 602)
Sorry, BOINC could not open shared graphics library!
Starting model ID hadcm3inct_cmf7_1920_160_55869263 Phase 1
Getting pthread attributes - retval=0
Setting pthread size (100663296 bytes) - retval=0
Executing program hadcm3transum_5.44_i686-pc-linux-gnu 173205
Program launched with process id # 13819
Climate model starting - use graphics to monitor progress.
Or visit the website to see the graphs for this run.
hadcm3inct_cmf7_1920_160_55869263 - PH 1 TS 1803169 A - 25/06/1990 00:30 - H:M:S=0000:00:00 AVG= 0.00 DLT= 0.00
scan: cpdnout11.zip
scan: init_data.xml
scan: ozone_hadcm3_1900.gz
scan: DMSallNH3SO21900.gz
scan: cpdnout9.zip
scan: cpdnout13.zip
scan: hdz2hdck_0308_nickfluxcorr.anc.gz
scan: cpdnout15.zip
scan: hadcm3inct_cmf7_1920_160_55869263.zip
scan: volc_v00.gz
scan: cpdnout5.zip
scan: cpdnout16.zip
scan: hadcm3trans_5.41_i686-pc-linux-gnu
scan: hadcm3transse_5.41_i686-pc-linux-gnu.zip
scan: stderr.txt
scan: 1040_flux_corr.anc.gz
scan: ghg_cntrl.gz
scan: SULPC_OXIDANTS_19_A2_1990.mod.gz
scan: hadcm3trans_5.41_i686-pc-linux-gnu.so
scan: 1040_ocean.year.gz
scan: hadcm3transdata_5.41_i686-pc-linux-gnu.zip
scan: cpdnout4.zip
scan: spec3a_sw_3_asol2b_hadcm3.gz
scan: boinc_ufs_cpdnout2.zip
scan: cpdnout7.zip
scan: NAT_VOLC.gz
scan: yafbg.astart.gz
scan: cpdnout3.zip
scan: spec3a_lw_3_asol2c_hadcm3.gz
scan: cpdnout14.zip
scan: boinc_ufs_cpdnout3.zip
scan: boinc_ufs_cpdnout5.zip
scan: boinc_ufs_cpdnout1.zip
scan: SULPC_OXIDANTS_19_A2_1990.gz
scan: cpdnout2.zip
scan: cpdnout6.zip
scan: boinc_ufs_cpdnout6.zip
scan: cpdnout1.zip
scan: cpdnout8.zip
scan: cpdnout10.zip
scan: solar_v00.gz
scan: boinc_lockfile
scan: hadcm3transum_5.41_i686-pc-linux-gnu
scan: cpdnout12.zip
scan: boinc_ufs_cpdnout4.zip
hadcm3inct_cmf7_1920_160_55869263 - PH 1 TS 1803601 A - 01/07/1990 00:30 - H:M:S=0000:17:16 AVG= 0.00 DLT= 0.96
hadcm3inct_cmf7_1920_160_55869263 - PH 1 TS 1804033 A - 07/07/1990 00:30 - H:M:S=0000:34:20 AVG= 0.00 DLT= 0.99
hadcm3inct_cmf7_1920_160_55869263 - PH 1 TS 1804465 A - 13/07/1990 00:30 - H:M:S=0000:51:31 AVG= 0.00 DLT= 0.00
hadcm3inct_cmf7_1920_160_55869263 - PH 1 TS 1804897 A - 19/07/1990 00:30 - H:M:S=0001:08:35 AVG= 0.00 DLT= 1.00
Resuming CPDN!
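
For anyone wanting to check their own slot output for the same symptom, here is a small sketch that scans progress lines for an H:M:S counter that jumps backwards. It assumes the line layout shown in the log above; the regular expression and function name are my own and are not part of BOINC or CPDN.

```python
import re

# Matches progress lines such as:
# "... - PH 1 TS 1803169 A - 25/06/1990 00:30 - H:M:S=1179:29:51 AVG= 2.35 DLT= 1.00"
PROGRESS = re.compile(r"TS\s+(\d+)\s.*H:M:S=(\d+):(\d{2}):(\d{2})")


def find_resets(lines):
    """Return (timestep, seconds_before, seconds_after) for each backwards jump."""
    resets = []
    previous = None
    for line in lines:
        match = PROGRESS.search(line)
        if not match:
            continue  # skip "scan:" lines and other messages
        timestep = int(match.group(1))
        h, m, s = (int(match.group(i)) for i in (2, 3, 4))
        seconds = h * 3600 + m * 60 + s
        if previous is not None and seconds < previous:
            resets.append((timestep, previous, seconds))
        previous = seconds
    return resets


sample = [
    "hadcm3inct_cmf7_1920_160_55869263 - PH 1 TS 1803169 A - 25/06/1990 00:30 - H:M:S=1179:29:51 AVG= 2.35 DLT= 1.00",
    "Suspended CPDN Monitor - Quit request from BOINC...",
    "hadcm3inct_cmf7_1920_160_55869263 - PH 1 TS 1803169 A - 25/06/1990 00:30 - H:M:S=0000:00:00 AVG= 0.00 DLT= 0.00",
    "hadcm3inct_cmf7_1920_160_55869263 - PH 1 TS 1803601 A - 01/07/1990 00:30 - H:M:S=0000:17:16 AVG= 0.00 DLT= 0.96",
]
for timestep, before, after in find_resets(sample):
    print(f"Counter reset at TS {timestep}: {before / 3600:.1f} h -> {after / 3600:.1f} h")
```

The timestep number does not go backwards at the reset, which matches what the log shows: the model carries on from the same point and only the time counters are lost.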
108) Message boards : Number crunching : 55000 in pending credit...? (Message 31072)
Posted 23 Oct 2007 by Profile Conan
Post:
I have over 126,000 cobblestones pending.

On closer look it is listing all Work Units (9) that are currently classed as active and have not returned a final result.

I have 3 listed that crashed last year and early this year but no longer exist. CPDN thinks I still have them and is showing them as active.
The other 6 are current and active.

None of my completed work units appear in the list.

So nothing really to worry about and astroWX's post backs this up.
109) Message boards : Number crunching : Work done reverted back to Zero (Message 30657)
Posted 23 Sep 2007 by Profile Conan
Post:
Simply amazing and strange. My only guess at this point is a flaky BOINC version. I think I tried to update to 5.10.8, 5.10.10, and 5.10.20 all at different points, but the programs had issues running on FC7. The manager and the daemon had communication/launching/terminating problems. You may have discovered a bug.

BTW, are you running the 32-bit or 64-bit version of BOINC 5.10.8?


The 32-bit and FC3.
If it is a bug I don't want it.
To get a running cr/h on this project I convert the number of processed seconds into hours and divide that into the amount of credit granted.
I now will have trouble doing that as the seconds counter has restarted from zero, so I will have to estimate the time on the last normal sec/TS and work from that till it finishes. I only have 29% left to go as it has moved onto 71% done now.
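
The cr/h calculation described here is simple enough to sketch. The function name and the sample figures below are hypothetical; they only show the conversion from seconds to hours and the division.

```python
def credit_per_hour(granted_credit: float, cpu_seconds: float) -> float:
    """Running credit per hour: credit divided by processed time in hours."""
    hours = cpu_seconds / 3600.0
    if hours <= 0:
        raise ValueError("cpu_seconds must be greater than zero")
    return granted_credit / hours


# Hypothetical example: 259.20 credits granted after 46,800 seconds (13 hours).
print(f"{credit_per_hour(259.20, 46_800):.2f} cr/h")
```

After a counter reset the reported seconds are too low, so the result comes out too high unless the lost time is added back in first.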
110) Message boards : Number crunching : Work done reverted back to Zero (Message 30591)
Posted 21 Sep 2007 by Profile Conan
Post:
This is the section of my Slot folder relating to the problem WU

Resuming CPDN!
hadcm3inct_cn6q_1920_160_45870254 - PH 1 TS 2836081 A - 01/05/2030 00:30 - H:M:S=1757:12:33 AVG= 2.23 DLT= 1.00
hadcm3inct_cn6q_1920_160_45870254 - PH 1 TS 2836513 A - 07/05/2030 00:30 - H:M:S=1757:26:17 AVG= 2.23 DLT= 1.00
hadcm3inct_cn6q_1920_160_45870254 - PH 1 TS 2836945 A - 13/05/2030 00:30 - H:M:S=1757:40:13 AVG= 2.23 DLT= 0.99
hadcm3inct_cn6q_1920_160_45870254 - PH 1 TS 2837377 A - 19/05/2030 00:30 - H:M:S=1757:53:55 AVG= 2.23 DLT= 1.00
hadcm3inct_cn6q_1920_160_45870254 - PH 1 TS 2837809 A - 25/05/2030 00:30 - H:M:S=1758:07:32 AVG= 2.23 DLT= 1.00
hadcm3inct_cn6q_1920_160_45870254 - PH 1 TS 2838241 A - 01/06/2030 00:30 - H:M:S=1758:21:28 AVG= 2.23 DLT= 0.99
hadcm3inct_cn6q_1920_160_45870254 - PH 1 TS 2838673 A - 07/06/2030 00:30 - H:M:S=1758:35:21 AVG= 2.23 DLT= 1.00
hadcm3inct_cn6q_1920_160_45870254 - PH 1 TS 2839105 A - 13/06/2030 00:30 - H:M:S=1758:49:20 AVG= 2.23 DLT= 0.99
hadcm3inct_cn6q_1920_160_45870254 - PH 1 TS 2839537 A - 19/06/2030 00:30 - H:M:S=1759:03:21 AVG= 2.23 DLT= 1.00
hadcm3inct_cn6q_1920_160_45870254 - PH 1 TS 2839969 A - 25/06/2030 00:30 - H:M:S=1759:17:09 AVG= 2.23 DLT= 0.99
hadcm3inct_cn6q_1920_160_45870254 - PH 1 TS 2840401 A - 01/07/2030 00:30 - H:M:S=1759:31:08 AVG= 2.23 DLT= 1.00
hadcm3inct_cn6q_1920_160_45870254 - PH 1 TS 2840833 A - 07/07/2030 00:30 - H:M:S=1759:46:09 AVG= 2.23 DLT= 2.00
Suspended CPDN Monitor - Quit request from BOINC...
Cleaning up graphics data...
Detaching shared memory...
shmget: No such file or directory
Beginning work on result hadcm3inct_cn6q_1920_160_45870254_4...
Starting model in /home/ggoninan/BOINC/projects/climateprediction.net...
Created shared memory region key = 172920 of size 655060 bytes (version 602)
Sorry, BOINC could not open shared graphics library!
Starting model ID hadcm3inct_cn6q_1920_160_45870254 Phase 1
Getting pthread attributes - retval=0
Setting pthread size (100663296 bytes) - retval=0
Executing program hadcm3transum_5.44_i686-pc-linux-gnu 172920
Program launched with process id # 785
Climate model starting - use graphics to monitor progress.
Or visit the website to see the graphs for this run.
hadcm3inct_cn6q_1920_160_45870254 - PH 1 TS 2840833 A - 07/07/2030 00:30 - H:M:S=0000:00:00 AVG= 0.00 DLT= 0.00
scan: cpdnout11.zip
scan: init_data.xml
scan: boinc_ufs_cpdnout9.zip
scan: ozone_hadcm3_1900.gz
scan: DMSallNH3SO21900.gz
scan: cpdnout9.zip
scan: cpdnout13.zip
scan: cpdnout15.zip
scan: boinc_ufs_cpdnout10.zip
scan: volc_v00.gz
scan: cpdnout5.zip
scan: cpdnout16.zip
scan: boinc_ufs_cpdnout8.zip
scan: hadcm3trans_5.41_i686-pc-linux-gnu
scan: hadcm3transse_5.41_i686-pc-linux-gnu.zip
scan: stderr.txt
scan: ghg_cntrl.gz
scan: SULPC_OXIDANTS_19_A2_1990.mod.gz
scan: hadcm3trans_5.41_i686-pc-linux-gnu.so
scan: hadcm3transdata_5.41_i686-pc-linux-gnu.zip
scan: cpdnout4.zip
scan: spec3a_sw_3_asol2b_hadcm3.gz
scan: boinc_ufs_cpdnout2.zip
scan: hadcm3inct_cn6q_1920_160_45870254.zip
scan: cpdnout7.zip
scan: NAT_VOLC.gz
scan: yafbg.astart.gz
scan: cpdnout3.zip
scan: spec3a_lw_3_asol2c_hadcm3.gz
scan: cpdnout14.zip
scan: boinc_ufs_cpdnout3.zip
scan: boinc_ufs_cpdnout5.zip
scan: boinc_ufs_cpdnout1.zip
scan: SULPC_OXIDANTS_19_A2_1990.gz
scan: 1002_flux_corr.anc.gz
scan: cpdnout2.zip
scan: cpdnout6.zip
scan: 1002_ocean.year.gz
scan: boinc_ufs_cpdnout6.zip
scan: cpdnout1.zip
scan: cpdnout8.zip
scan: cpdnout10.zip
scan: solar_v00.gz
scan: boinc_lockfile
scan: boinc_ufs_cpdnout7.zip
scan: hadcm3transum_5.41_i686-pc-linux-gnu
scan: cpdnout12.zip
scan: hfh3hdck_0308_nickfluxcorr.anc.gz
scan: boinc_ufs_cpdnout4.zip
hadcm3inct_cn6q_1920_160_45870254 - PH 1 TS 2841265 A - 13/07/2030 00:30 - H:M:S=0000:18:17 AVG= 0.00 DLT= 1.00
hadcm3inct_cn6q_1920_160_45870254 - PH 1 TS 2841697 A - 19/07/2030 00:30 - H:M:S=0000:36:44 AVG= 0.00 DLT= 0.98
hadcm3inct_cn6q_1920_160_45870254 - PH 1 TS 2842129 A - 25/07/2030 00:30 - H:M:S=0000:55:06 AVG= 0.00 DLT= 0.98
hadcm3inct_cn6q_1920_160_45870254 - PH 1 TS 2842561 A - 01/08/2030 00:30 - H:M:S=0001:13:41 AVG= 0.00 DLT= 1.99
hadcm3inct_cn6q_1920_160_45870254 - PH 1 TS 2842993 A - 07/08/2030 00:30 - H:M:S=0001:29:49 AVG= 0.00 DLT= 1.00

You can see how the Boinc Manager stats all reverted back to Zero, but the WU kept on going as if nothing had happened.

Mo.V, Mark would you know of the reason for this?
The WU is still going along OK (I took a screen shot but have no idea how to transfer the image to this forum via Linux; I tried a number of programmes but none work. If anyone knows please let me know).
So Boinc Manager is showing
CPU TIME = 42:31:17
PROGRESS = 70.410%
TO COMPLETION = 204:06:11

The CPU time should be showing well over 1,800 hours not 42.
Progress has moved on from the 68% in the first post to over 70% now.
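
One way to estimate the true CPU time is to add the last H:M:S value logged before the reset to the figure Boinc Manager now shows. A minimal sketch, assuming no time was lost between the last logged line and the reset and that the Manager's counter restarted from zero at that point (the helper names are my own):

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'H:M:S' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(total: int) -> str:
    """Convert seconds back to an 'H:MM:SS' string."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


before_reset = hms_to_seconds("1759:46:09")  # last line logged before the reset
shown_now = hms_to_seconds("42:31:17")       # CPU TIME shown by Boinc Manager
print(seconds_to_hms(before_reset + shown_now))
```

That gives a little over 1,800 hours, in line with the figure expected above.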






111) Message boards : Number crunching : Work done reverted back to Zero (Message 30565)
Posted 19 Sep 2007 by Profile Conan
Post:

I'd recommend doing the stability test even if the PC has been stable for a long time. There are several things which can affect your PC, such as dust on fans and heatsinks, fan bearings wearing, power supplies wearing out, and so forth.

What was the status of the 0% job in PS / top? i.e., perhaps D, or S.


Sorry Mike, but didn't check it at the time. I just noticed that only 3 jobs were processing not 4 and instead of investigating just restarted the manager.
112) Message boards : Number crunching : Work done reverted back to Zero (Message 30544)
Posted 18 Sep 2007 by Profile Conan
Post:

Do you have 'keep in memory' set to yes or no? If no, then turning it on may help. Sometimes if a model goes out of memory at the wrong point, it resets it.

Yes I do.

Have you tried running a stability check on the PC for 24 hours or so? (mprime is the linux version of Prime95). Note that you'll need to run 4 copies, one for each core.

No I have not, and seeing as the problem has just started within a week of upgrading Boinc I suspect Boinc not my PC. It has been very stable and reliable since I built it a year ago.

When you say 'one of the cores shuts down' what does this mean? Does the job still appear in PS / Top ? What is its status if it does? What does the boinc manager show for that task?


What I mean is that instead of 4 jobs processing at the same time one stops and only 3 are going.
3 cpus are running at 100% and 1 is at 0%.
Although this has happened twice in the last 2 days, I think this time it may have been related to Rosetta locking up as I set my preferences to run for 21000 seconds (6 hours) but the last job went for over 28000 seconds (nearly 8 hours).
So Boinc 5.10.8 or Rosetta may have problems.
CPDN is still running fine (I have 4 jobs going at once), just one job has its data a bit incorrect.

I will let it run and see what the end result may be, as I have not lost the WU yet and the percent done and credits earned are still intact.

113) Message boards : Number crunching : Work done reverted back to Zero (Message 30541)
Posted 18 Sep 2007 by Profile Conan
Post:
No mo.v I am unable to do that as I now only have the one machine physically near me (my only other active machine is 9 hours away and does not run CPDN).

It seems to be a Boinc Manager thing as I now have noticed that for the second time one of my 4 cores has shut down and only three keep working. No error messages, restarting the manager gets all 4 going again.

I updated to 5.10.8 recently but this may have been a bad move. As it was 3 months old I thought it might be stable but perhaps it is not?

The WU stats seem to show the Boinc Manager figures for CPU time and timestep/second but the CPDN servers are showing the correct number of timesteps.

The Percent done is staying correct (possibly due to the CPDN Servers) but the other figures are screwed up.
114) Message boards : Number crunching : Work done reverted back to Zero (Message 30538)
Posted 17 Sep 2007 by Profile Conan
Post:
I have just had one work unit with 68% completed and over 1700 hours done just go back to 0.095% and Zero hours done (according to the Boinc Manager), I will keep fingers crossed that it sorts itself out as I did not back it up (I have not lost credit on it though).

The work unit is a 'hadcm3inct_cn6q' type.

I restarted the manager and it started running from the 0.00% mark done, still shows 68.65% completed and now has 217 hours to go.
It was showing over 1700 hours done with 700 odd hours to go.

Maybe I should reboot the machine?

No other work units are affected (I have 3 other CPDN, Cosmology, Rosetta, Einstein and The Lattice Project all running at the moment and none of them have done this).

Boinc Manager is currently showing 4 hours done and 216 hours to go with the 68.65% still there.
115) Message boards : Number crunching : 1st to finish ? (Message 30530)
Posted 17 Sep 2007 by Profile Conan
Post:
Your computers are hidden, so we can not look at the model to see if it did indeed finish.

Oh Poo :O) How about "Granted credit 39,657.60"? Is that what is normally granted for a finished model?


For a non-optimised work unit the credit total should be 41,472.00, no matter how long it took you to process it.
For an optimised work unit I am not sure but mine is past this total and still has 13.5% to go.

So if you have 39,657.60 credits then I would say no you tripped just before the line.
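
A quick way to check how close that figure is to the finish line is sketched below. It assumes credit builds up roughly in proportion to model progress, which is an assumption made for illustration and not something stated by the project; the variable names are made up for the example.

```python
# Figures taken from the posts above.
FULL_CREDIT = 41_472.00   # quoted total for a finished non-optimised work unit
granted = 39_657.60       # "Granted credit" reported by the other poster

# Assumption: credit accrues roughly in step with progress through the model.
fraction = granted / FULL_CREDIT
shortfall = FULL_CREDIT - granted

print(f"Fraction of full credit: {fraction:.4f}")
print(f"Credit still missing:    {shortfall:,.2f}")
```

On those numbers the granted credit falls a little under 5% short of the full total, which fits a model that stopped shortly before the end.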

I have just had one with 68% completed and over 1700 hours done just go back to 0.095% and Zero hours done (according to the Boinc Manager), I will keep fingers crossed that it sorts itself out as I did not back it up (I have not lost credit on it though).
116) Message boards : Number crunching : Intel P4 1.70GHz 4,576 results ??? (Message 30229)
Posted 28 Aug 2007 by Profile Conan
Post:
Hi Conan

At the end of June while looking at someone else's models we came across the first super-maxi-downloader you mention called A**. I sent him two private messages when I found he was registered on the independent forum. They're still sitting in my outbox which means he hasn't picked them up. We reported him to the programmers who were about to cut off his supply, but then Ash's hard drive must have filled up with initial model data (!!!) and the downloads stopped.

The programmers now have a script to automatically cut off the supply to particularly bad offenders. The script also works on members we report to them. We recently reported another megadownloader who did not reply to an email we sent. The script cut off his supply.

Unfortunately the (only) two programmers are working under pressure and don't want lots of these reports, only the worst cases.

When I have a moment I'll look at your leads to the other megamodellers to see if there's some way these people can be contacted. The mods have no access to members' email addresses, but sometimes there are other legit methods. Realistically, I can only get a message to a person who

* has a website that includes an email address

* or is registered on the independent forum under a name I can recognise

* or has posted on this forum, which enables us to send an email by deleting a post and completing the form


Thanks mo.v, I understand. When trying to help some people they just don't want to know.
117) Message boards : Number crunching : Intel P4 1.70GHz 4,576 results ??? (Message 30227)
Posted 28 Aug 2007 by Profile Conan
Post:

This has been going on since at least last year.
Attempts have been made to contact the users, to no avail, so, yes, there is a list to which the computers can be added that prevents them from getting more.
If the person concerned ever becomes aware that their computer has lots of models, fixes the problem, and then wants to continue, they can post and ask for help.
And also explain what has been going on.



Thanks for that Les,
Just wanted to bring it to the project's attention, but as they have it in hand I will let them handle it. They must have old Boinc versions, as Boinc is supposed to stop downloading if the computer is over committed, assuming the WUs are not just dumped that is.

No problem, have a good day and keep smiling as it makes others wonder what you have been up to.
118) Message boards : Number crunching : Intel P4 1.70GHz 4,576 results ??? (Message 30217)
Posted 27 Aug 2007 by Profile Conan
Post:

You'll notice that host 517516 stopped uploading results in mid-July, I gather that was when it was added to a blacklist.


Thanks Mike,
Not sure what you mean by a blacklist. Did CPDN create a database of problem computers and could then block further access?
How would the Project team then know if it was a computer problem, it is now fixed and the user would like to rejoin the project?

Do the other ones I mentioned also fall into this category (except the one with Client Errors, as they surely have a problem)?
119) Message boards : Number crunching : Intel P4 1.70GHz 4,576 results ??? (Message 30210)
Posted 27 Aug 2007 by Profile Conan
Post:
When checking progress of my WU 6615772 on my Host 498972, I came across this :-
Host 517516 which has downloaded WU 6615773 and multiple others per day.
Host registered on 30/12/06 has 4,924.80 credits 0.01 RAC, it is only a P4 1.70 GHz single cpu
With 4,576 results to its name.

Also when checking my WU 6751962 on my host 498972 (I am doing 4 at the same time on this host), I found this :-
Of a minimum quorum of 1, the full replication of 8 have been sent out with 5 Hosts having problems

Host 607819, WU 6751957 has 94.52 credits 0.19 RAC, joined 7/4/07, it is an AMD Turion 64 MK-36,
With 56 results mostly Client Errors.

Host 117744, WU 6751961 has 62,428.81 credits 0.00 RAC,
joined 25/2/05, AMD Athlon MP 2800+ 2 cpus,
Has 1,153 results,
Downloading 2 WUs nearly every day.

Host 43150, WU 6751960 has 133,912.40 credits 0.00 RAC, joined 20/9/04, AMD Opteron 240 2 cpus,
Has 927 results,
Downloading 2 WUs most days.

Host 83212, WU 6751958 has 102,664.91 credits 0.00 RAC, joined 4/1/05, P4 3.00 GHz 2 cpus,
Has 904 results,
Also downloading 2 WUs most days.

Host 587615, WU 6751955 has 42,239.87 credits 0.00 RAC, joined 33/3/07, P4 1.60 GHz 1 cpu,
Has 1,090 results,
Downloading more than 2 a day.

It appears that a lot of extra stress is placed on the project's servers by hosts like these, which constantly contact the server and download work but never stop to process it before asking for more.
It also wastes time when trying to get results, as the WU will have to be sent again to hosts that will actually process it rather than just download it.

Perhaps the owners can be contacted to see if they are having problems, with an offer to help them resolve the issues?
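
To put a rough number on the first host, the sketch below averages its result count over the time since it registered. The dates are read as day/month/year, the end date is simply the date of this post, and the variable names are invented for the example. It is only an average, so it would understate the daily rate if the host was idle for part of that period.

```python
from datetime import date

# Host 517516, figures from the post above (dates read as day/month/year).
registered = date(2006, 12, 30)   # "registered on 30/12/06"
observed = date(2007, 8, 27)      # date of this post, used as the end point
results = 4576                    # results listed against the host

days = (observed - registered).days
per_day = results / days          # average downloads per day over the period

print(f"{results} results over {days} days is about {per_day:.1f} per day")
```

For a single-CPU P4 running models that take well over a thousand hours each, an average in that range means almost none of the downloaded work could have been processed.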
120) Message boards : Number crunching : Finished One!!! (Message 30030)
Posted 17 Aug 2007 by Profile Conan
Post:
I am so happy - after 4300 hrs I have just completed a model (believe it was an ocean).

8/14/2007 5:27:14 PM|climateprediction.net|Computation for task hadcm3ohc_182r_05605482_1 finished

It's been a while since I've had one run the whole way through.

Wendy


Congratulations 'web03', it is a great feeling when one completes.
It is also a great frustration when they don't.
I have one computer that has 10 Work units credited to it but it has only completed one 160 year model and two 45 year models, all the rest died (two of them show as still running but I lost them in February).
121) Message boards : Number crunching : Claimed versus Granted Cobblestones (Message 29272)
Posted 17 Jun 2007 by Profile Conan
Post:

Credit is a fickle thing. It can sometimes take a while if the program stops for some reason.
It's more like "climate" than "weather". But it'll show up sooner or later.
(A bit like death and taxes.)



Thanks Les, but it is good news. The last two trickles that I guessed were missing have now been added and credited.
I now have my 2nd completed CPDN WU and a happy guy am I. Two more to go.
122) Message boards : Number crunching : Claimed versus Granted Cobblestones (Message 29269)
Posted 17 Jun 2007 by Profile Conan
Post:
Whilst the graph is showing it finished in 2079, it actually went to completion on 1/12/2080


hadcm3ohc_1na3_05625186 - PH 1 TS 4146769 A - 25/11/2080 00:30 - H:M:S=2193:49:20 AVG= 1.90 DLT= 1.00

hadcm3ohc_1na3_05625186 - PH 1 TS 4147201 A - 01/12/2080 00:30 - H:M:S=2194:02:40 AVG= 1.90 DLT= 0.78
file dataout/1na3fo.pjs0c10 is a 32 bit ieee um file
file dataout/1na3fo.pis0c10 is a 32 bit ieee um file
file dataout/1na3fo.pfs0c10 is a 32 bit ieee um file
file dataout/1na3fo.pcs0c10 is a 32 bit ieee um file
stash/field code 30320 is unknown to LATS lookup table and is not written to output netcdf file. You need to define the field via an external LATS parameter file and PP codes -> LATS conversion table using the -l and -p options.
file dataout/1na3fo.pbs0c10 is a 32 bit ieee um file
file dataout/1na3fa.phs0c10 is a 32 bit ieee um file
file dataout/1na3fa.pgs0c10 is a 32 bit ieee um file
file dataout/1na3fa.pes0c10 is a 32 bit ieee um file
file dataout/1na3fa.pds0c10 is a 32 bit ieee um file
Trickling yearly means for 2080
Building file list for 2080 in dataout -- 9 entries
adding: hadcm3ohc_1na3_05625186_0_meana_2080_pd.nc (deflated 83%)
adding: hadcm3ohc_1na3_05625186_0_meana_2080_pg.nc (deflated 83%)
adding: hadcm3ohc_1na3_05625186_0_meano_2080_opt.nc (deflated 80%)
adding: hadcm3ohc_1na3_05625186_0_meano_2080_pi.nc (deflated 86%)
adding: hadcm3ohc_1na3_05625186_0_meano_2080_pj.nc (deflated 81%)
Creating trickle file trickle_hadcm3ohc_1na3_05625186_0_2080.zip for upload...
Processing decadal means for 2071 through 2080
adding: hadcm3ohc_1na3_05625186_0_10ya_2070s_ph.nc (deflated 11%)
adding: hadcm3ohc_1na3_05625186_0_10ya_2070s_pf.nc (deflated 57%)
adding: hadcm3ohc_1na3_05625186_0_10ya_2070s_pe.nc (deflated 29%)
2007-06-16 18:48:47 [climateprediction.net] Sending scheduler request: To send trickle-up message
2007-06-16 18:48:47 [climateprediction.net] (not requesting new work or reporting completed tasks)
adding: 1na3fo.pcs0c10.nc (deflated 52%)
adding: 1na3fo.pbs0c10.nc (deflated 60%)
adding: ocean_restart.day (deflated 62%)
adding: shmem_restart.day (deflated 0%)
adding: climate.cpdc (deflated 75%)
adding: yafbg.stashc (deflated 90%)
2007-06-16 18:48:52 [climateprediction.net] [file_xfer] Started upload of file hadcm3ohc_1na3_05625186_0_16.zip
2007-06-16 18:48:57 [climateprediction.net] Scheduler RPC succeeded [server version 509]

2007-06-16 18:54:11 [climateprediction.net] [file_xfer] Finished upload of file hadcm3ohc_1na3_05625186_0_16.zip
2007-06-16 18:54:11 [climateprediction.net] [file_xfer] Throughput 27246 bytes/sec
Queuing intermediate upload for CPDN/BOINC: cpdnout16.zip
Phase over, going into post_processing()
Post-processing successful!
Finished a complete run, now you can upload!
Cleaning up graphics data...
Detaching shared memory...
2007-06-16 18:55:10 [climateprediction.net] Computation for task hadcm3ohc_1na3_05625186_0 finished


2007-06-16 21:06:03 [climateprediction.net] Sending scheduler request: Requested by user
2007-06-16 21:06:03 [climateprediction.net] Reporting 1 tasks
2007-06-16 21:06:13 [climateprediction.net] Scheduler RPC succeeded [server version 509]

As you can see it says I completed a complete run right till 1/12/2080 with no errors.

So why have I lost credit for this WU?
It seems from the readout that the last trickles have not been credited perhaps?
123) Message boards : Number crunching : Claimed versus Granted Cobblestones (Message 29266)
Posted 16 Jun 2007 by Profile Conan
Post:
Have just finished my 2nd ever complete Work Unit but there seems to be a difference in the way credit claims have been done.
When my first 5.15 WU completed in April the claimed and granted cobblestones were the same.
Now I have completed my 2nd 5.15 WU the claimed and granted are totally different.
Also on my first completed WU the time difference from the last trickle to the final time total was very small (about a thousand seconds). On my 2nd completed WU the time difference from the last trickle to the final time total was close to another trickle in time (20,000 odd seconds). Is this significant?
Seeing that the second WU took longer than the first, I would have thought I would have got more credit for it, but instead I got less.
Can anyone tell me why my claimed credit has been so low?

1st completed WU

2nd completed WU

Also even though both WUs completed OK, they both have the same error message on the result

(null): cannot open input file dataout/atmos_restart.day
(null): cannot open input file dataout/ocean_restart.day

124) Message boards : Cafe CPDN : Milestones Thread (Message 29265)
Posted 16 Jun 2007 by Profile Conan
Post:
> Have passed 200,000 cobblestones in CPDN.
Also passed 200,000 recently in Docking and Ralph.
Bit of a triple milestone.

Trying to get to 500,000 in QMC and 400,000 in Einstein.

Was hoping to finish my two current CPDN models and have a rest as that would make 3 complete models, but accidentally downloaded another two models. So I will have to keep going for a bit (lots of bits) longer.


Have now finished my 2nd ever complete Work Unit. A great feeling.
125) Message boards : Cafe CPDN : Milestones Thread (Message 29173)
Posted 6 Jun 2007 by Profile Conan
Post:
> Have passed 200,000 cobblestones in CPDN.
Also passed 200,000 recently in Docking and Ralph.
Bit of a triple milestone.

Trying to get to 500,000 in QMC and 400,000 in Einstein.

Was hoping to finish my two current CPDN models and have a rest as that would make 3 complete models, but accidentally downloaded another two models. So I will have to keep going for a bit (lots of bits) longer.
126) Message boards : Number crunching : Error on File Upload (Message 28224)
Posted 28 Apr 2007 by Profile Conan
Post:
I run a number of projects at the same time (about 6), so stopping network activity is not really an option.
I have suspended the CPDN task AND the CPDN project and this seems to have stopped my computer trying to upload as you get a 'communication failed' error. It still keeps counting down in the transfer box but can't send.

This will have to do for now.

EDIT: Although it stopped the first attempt it did not stop the second and the file has tried to upload again, despite being Suspended.

Some of my projects have short deadlines of a week, so I am not too keen on stopping network activity. I also don't sit at the computers all the time.
127) Message boards : Number crunching : Error on File Upload (Message 28132)
Posted 26 Apr 2007 by Profile Conan
Post:
At least I now know why I can't upload, but why am I classed as spam?

Hold on, the rating code is something different from the anti-spam code. They don't work together.

The anti-spam code is checked through an external server. Your whole post is checked on words it deems spam.

The rating is something you set yourself. Check your forum preferences and see what the settings are for Filtering. Default is -25 and 5. Perhaps that you put them to -1 and 1...

Save changes to the site and you'll see all posts with a -1 or lower rating again. Let's then hope that the person or persons doing the rating get a life. ;-)

(P.S: I have mine set to -1001 and 1000... that way I see all posts in a normal way.)


Thanks Jord, I didn't realise that, have changed from 0 to -25 and 0 to 10.
128) Message boards : Number crunching : Error on File Upload (Message 28129)
Posted 26 Apr 2007 by Profile Conan
Post:
I tried to post here with my problem but was classed as spam and deleted.
I then had to give a '+' to the post prior to this one as it was filtered with a '-1' so I could read it.
The post it referred me to was also filtered (odd seeing it is by a moderator) and I had to give a '+' on that one as well.

At least I now know why I can't upload, but why am I classed as spam?
129) Message boards : Cafe CPDN : Milestones Thread (Message 27882)
Posted 13 Apr 2007 by Profile Conan
Post:
>> You Beauty, I have finally finished a complete Work Unit (that is one that has not crashed, errored out or a hissy fit in Boinc killed).
Took from 7/11/06 till 11/4/07 (12/4/07 in Australia) taking 2,188 hours 57 minutes and 28 seconds.
They may be long but there is great satisfaction when you complete one.
Now only 2 more to go.

Also passed 600,000 Cobblestones in Rosetta.
1.8 Million Cobblestones total and Team Cobar Spiders passed 2 Million Cobblestones as well.
130) Message boards : Cafe CPDN : Milestones Thread (Message 25905)
Posted 9 Jan 2007 by Profile Conan
Post:
> Have passed 100,000 cobblestones in CPDN.
Also passing 40,000 cobblestones in Docking.
131) Message boards : Number crunching : Fastest Completion Times for WU (Message 25330)
Posted 28 Nov 2006 by Profile Conan
Post:
> The ones I had on my AMD Opteron 275 (2.2 GHz) said 1170 hours.
After 1700 hours with 400 to go, one WU crashed and in trying to get it to restart again I lost Boinc and the other WU at 1100 hours with about 800 to go.
Both the new WUs say 1170 hours.

On my AMD Opteron 285 (2.6 GHz) completion time is 990 hours, after 340 hours one WU is 16% done (340 is not 16% of 990, so the WU will go much longer).
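
The reasoning in brackets can be made explicit with a simple linear extrapolation, sketched below. It assumes the rest of the run proceeds at the same rate as the part already done, which is an assumption and not how Boinc itself estimates time; the function name is made up for the example.

```python
def projected_total_hours(elapsed_hours: float, percent_done: float) -> float:
    """Extrapolate total run time, assuming progress continues at the same rate."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in the range (0, 100]")
    return elapsed_hours / (percent_done / 100.0)


estimate_from_manager = 990   # hours quoted for the Opteron 285
elapsed = 340                 # hours done so far
done = 16                     # percent complete

# If the 990 hour estimate were right, 16% would be reached much earlier.
hours_at_16_percent = estimate_from_manager * done / 100

projected = projected_total_hours(elapsed, done)
print(f"16% of {estimate_from_manager} h is {hours_at_16_percent:.1f} h, not {elapsed} h")
print(f"Projected total at the current rate: about {projected:.0f} h")
```

At the current rate the projection comes out at a little over twice the 990 hours shown.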

Hopefully I will be able to see these ones through to completion.
132) Message boards : Number crunching : Computational Error (Message 25184)
Posted 19 Nov 2006 by Profile Conan
Post:
Not the old FILE.
The entire BOINC FOLDER, along with all of the sub-folders.

The data needed to restart is scattered over several folders, starting in the main BOINC folder, and extending down to a sub-folder of the models folder.

The error code 1 may well be because of shutting down your computer without first exiting from BOINC, whatever operating system you use.
Or because of an older version of the graphics software.



Thanks Les, it looks like I have lost that WU then, as I have only been backing up the climateprediction.net subfolder under the project subfolder in Boinc folder.
This would explain why removing the climate subfolder and then replacing with my backup did not change anything and the client kept on doing what it was doing before. I have not been backing up the whole Boinc folder, and as I run 6 other projects on the same computer I believe any restart from a backed up folder will create errors and problems with the other projects, so I will just forget about that WU.
133) Message boards : Number crunching : Computational Error (Message 25175)
Posted 19 Nov 2006 by Profile Conan
Post:
Thanks astroWX,
I might try from a backup, which is 1 or 2 weeks old, so have not lost much.
The 'exit code 1' that I am getting must be from a different source as I am not running Windows but Linux and that thread from Les is about Windows machines.

Do I just locate the old file in the backup project folder and copy that back into the current working project folder? Will Boinc detect this?
134) Message boards : Number crunching : Computational Error (Message 25150)
Posted 17 Nov 2006 by Profile Conan
Post:
> Hello to the Climate team.
I have resurrected this thread as it has the correct title.
I noticed today that one of my CP WUs had disappeared from my computer.
Being a bit naive I thought "you beauty, I have completed my first WU".
On checking the result I was informed that the WU had failed with a "computational error".
My next words uttered were 'bullshit, I don't believe it, what happened there?'.
It would appear that after 6,434,209.302419 seconds (1,787.28 Hours) the workunit decided to die on its sword for no real reason that I could see.
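
For anyone checking the conversion, the CPU time on the result page is given in seconds. A minimal sketch of turning it into hours and days follows; the variable names are just for illustration.

```python
cpu_seconds = 6_434_209.302419   # CPU time shown on the result page

hours = cpu_seconds / 3600       # 3600 seconds in an hour
days = hours / 24                # continuous crunching, around the clock

print(f"{hours:,.2f} hours, or roughly {days:.0f} days of CPU time")
```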


http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=5217487


got the error \"exit code 1\" and the following

pp2netcdf crashed: Error in getting file type
Error in converting file dataout/2jkgfo.pjo2c10 to netcdf format.

pp2netcdf crashed: Error in getting file type
Error in converting file dataout/2jkgfo.pio2c10 to netcdf format.

pp2netcdf crashed: Error in getting file type
Error in converting file dataout/2jkgfo.pfo2c10 to netcdf format.

pp2netcdf crashed: Error in getting file type
Error in converting file dataout/2jkgfa.pho2c10 to netcdf format.

pp2netcdf crashed: Error in getting file type
Error in converting file dataout/2jkgfa.pgo2c10 to netcdf format.

pp2netcdf crashed: Error in getting file type
Error in converting file dataout/2jkgfa.peo2c10 to netcdf format.

pp2netcdf crashed: Error in getting file type
Error in converting file dataout/2jkgfa.pdo2c10 to netcdf format.

</stderr_txt>

Validate state OK
Claimed credit 32,549.14
Granted credit 31,622.40
application version 5.08

Has all my time been worth it? Or wasted? Is the WU now of any use, or have all the trickles I sent in given the data that the scientists needed?
135) Message boards : Cafe CPDN : Milestones Thread (Message 25084)
Posted 11 Nov 2006 by Profile Conan
Post:
> Bit of a triple milestone,
50,000 Cobblestones for ClimatePrediction
200,000 for Einstein
1,000,000 Cobblestones for all Boinc projects.
Will get 400,000 Cobblestones in Rosetta in a week or so.
136) Message boards : Cafe CPDN : Milestones Thread (Message 24130)
Posted 29 Aug 2006 by Profile Conan
Post:
Have made 20,000 cobblestones. Still only a third through my two WUs after 665 to 717 hours of crunching, with 1078 to 1054 hours to go.
137) Questions and Answers : Getting started : Please make me the Founder (Message 23773)
Posted 26 Jul 2006 by Profile Conan
Post:
this should be updated (i.e. Conan is founder of "TeamAUS"). Unfortunately there is no "official" BOINC way for this, so I just did it manually.


To Carl,
I would like to thank you for doing this for me. If my post sounded like a demand then I am sorry, as it was not meant to be.
The work that you and your team do would keep you very busy, so thanks again for what you have done for me, and let's find some solutions to today's climate problems. Let's hope that the work of the ClimatePrediction.net team and all the volunteer crunchers is not wasted by silly Government decisions with only short term goals, instead of a longer term view.
138) Questions and Answers : Getting started : Please make me the Founder (Message 23731)
Posted 24 Jul 2006 by Profile Conan
Post:
Mods can't help you. No tools. Don't know what the Board allows the Oxford staff to do -- but they are two and are buried in high-priority development/maintenance/database reorg/et al. Sorry.


Thanks for the reply astroWX. I believe the project people can help me, I will just have to wait I suppose.
139) Questions and Answers : Getting started : Please make me the Founder (Message 23714)
Posted 23 Jul 2006 by Profile Conan
Post:
To the project people,
I am submitting this post in the hope that you can help to make me the founder of the team I am in.
I have joined the team TeamAUS and found out that the Founder of the team (Craig Zuvich) passed away suddenly on 16/8/05, with his wife letting his computers run till the end of the year before turning them off.
TeamAUS participates in ClimatePrediction.net, Einstein@home, LHC@home and Predictor@home. I added Rosetta@home when I first joined.
I have been in and out of 3 projects in this team since October 2005, including SETI, where David at SETI made me the founder late last year (recently I have moved this team). I am now adding LHC and ClimatePrediction to the projects that I do (I did not know the units were so big and long in CP, but I will get through them).
I would like to be made the Founder as I am the only person now in the team.
Team name is TeamAUS
Original founder is Craig Zuvich
My name is Conan
Country is Australia.
Thank you for your time.

To the Project People / Moderators
It has now been 15 days since posting. Can this request be seen to please? I can't do anything with this team until I am made the Founder.
Thank you, and awaiting your reply.

140) Questions and Answers : Getting started : Please make me the Founder (Message 23543)
Posted 8 Jul 2006 by Profile Conan
Post:
To the project people,
I am submitting this post in the hope that you can help to make me the founder of the team I am in.
I have joined the team TeamAUS and found out that the Founder of the team (Craig Zuvich) passed away suddenly on 16/8/05, with his wife letting his computers run till the end of the year before turning them off.
TeamAUS participates in ClimatePrediction.net, Einstein@home, LHC@home, Predictor@home and Seti@home. I added Rosetta@home when I first joined.
I have been in and out of 3 projects in this team since October 2005, including SETI, where David at SETI made me the founder late last year (I have since moved this team). I am now adding LHC and ClimatePrediction to the projects that I do (I did not know the units were so big and long in CP, but I will get through them).
I would like to be made the Founder as I am the only person now in the team.
Team name is TeamAUS
Original founder is Craig Zuvich
My name is Conan
Country is Australia.
Thank you for your time.



