climateprediction.net home page
Posts by Ingleside


61) Message boards : Number crunching : Upload problems (Message 41965)
Posted 10 Apr 2011 by Ingleside
Post:
The files are uploading now. All that was needed was to restart the daemon:
sudo /etc/init.d/boinc-client restart

No idea why it was necessary, but happy that it worked.
Thanks for the help

You seem to be running v6.10.17. A DNS-lookup bug was fixed around v6.10.3x: the client would keep using the same, possibly bad, IP address, and a restart of the client was the only way to clear it.

62) Message boards : Number crunching : Am I Misreading This? (Message 41918)
Posted 5 Apr 2011 by Ingleside
Post:
Just adding more info now that I have checked the defaults.

The BOINC client-defaults are:
10 GB total.
0.1 GB free.
50% total.

The CPDN default preferences are:
100 GB total.
0.001 GB free.
50% total.

So, neither of these should cause any large problems...

But, for some strange reason, the CPDN preferences aren't downloaded until you go to the preferences page on the website and hit "Update preferences". Still, the BOINC default of 10 GB should in most instances be enough, even for CPDN.
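As a rough illustration of how those three settings interact: the space BOINC may use is the minimum of the three limits, so the tightest one wins. This is a minimal sketch with hypothetical helper names, not the actual client code:

```python
def allowed_boinc_disk_gb(disk_total, disk_free, boinc_used,
                          max_used_gb=10.0, min_free_gb=0.1, max_pct=50.0):
    """Sketch: combine the three disk preferences (defaults here are
    the BOINC client defaults quoted above). Hypothetical helper."""
    by_absolute = max_used_gb                      # "use at most N GB"
    by_percent = disk_total * max_pct / 100.0      # "use at most N% of total"
    # space BOINC already occupies counts toward its own budget
    by_free = disk_free + boinc_used - min_free_gb # "leave at least N GB free"
    return max(0.0, min(by_absolute, by_percent, by_free))

# With the client defaults on a 500 GB disk with 200 GB free and
# 2 GB already used by BOINC, the 10 GB cap is what binds:
print(allowed_boinc_disk_gb(500, 200, 2))  # 10.0
```

This is why the 10 GB client default can bite even on a huge, mostly empty disk: the absolute cap is checked independently of the free-space and percentage limits.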





63) Message boards : Number crunching : Am I Misreading This? (Message 41916)
Posted 5 Apr 2011 by Ingleside
Post:
Two solutions come to mind.

1. Inform the BOINC team that their defaults are too low and hope that the next time they update the package they use bigger disc buffer allocation sizes.

Which preferences are used depends on the scenario:
1: On a completely new install, before you join any projects, the client uses the default client preferences, and it's possible these allow only 4 GB of disk usage (can't check at this point).
2: When this client joins a project, on the first connection to the project, the client will get preferences from the project.
3: The preferences from the project are either:
a: the project defaults, if you're joining a project you haven't run before,
b: or, if you re-join a project you've run before, whatever preferences you had before.

So, the BOINC client defaults are only a problem on the first connection to a project, not on any later connections.

As for the default preferences set by a project: since the majority of users run only a single project, the most relevant are CPDN's default preferences. Maybe these are too low; if it's only 4 GB, it's definitely too low...
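The precedence described in the steps above can be sketched roughly as follows (the function and its arguments are hypothetical, not the actual client code):

```python
def effective_prefs(local, stored_from_project, client_defaults):
    """Sketch of the preference precedence: local preferences override
    everything; otherwise the client defaults apply only until the
    first reply from a project has been stored."""
    if local is not None:
        return local                 # local prefs always win
    if stored_from_project is not None:
        return stored_from_project   # received on first project contact
    return client_defaults           # brand-new install, no project yet

# Brand-new install: client defaults apply.
print(effective_prefs(None, None, {"disk_gb": 10}))   # {'disk_gb': 10}
# After the first connection to CPDN: project prefs apply.
print(effective_prefs(None, {"disk_gb": 100}, {"disk_gb": 10}))  # {'disk_gb': 100}
```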


Note, this behaviour doesn't apply if you've set local preferences.

Oh, and if you run multiple projects and join a project that's completely new to you, there's a chance you'll get that project's defaults...
64) Message boards : Number crunching : hadcm3n Shorter deadline? (Message 41915)
Posted 5 Apr 2011 by Ingleside
Post:
The accuracy of the to-completion estimate should improve and the high priority will then be relaxed. (Does that happen as the model goes along or at the end? As a single-project cruncher who uses machines that are on 24/7, I've never attempted to understand the combinations of FLOP estimates, machines, DCFs, model mixes etc.)

The "to completion" estimate is at the start fully determined by the FLOPS estimates and machine parameters, including the DCF. During a run, the estimate should put more and more weight on the actual progress of that run, and less on the initial estimates.

The DCF is only updated at the end, on a successful finish. It will increase quickly if necessary, but decreases slowly. With a mix of models to run, the DCF can fluctuate constantly, so the estimates will never be very accurate, especially if the FLOPS estimate is too low and the model takes longer to run than expected.
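The asymmetric update just described (fast up, slow down) can be sketched like this; the real client's smoothing constants vary by version, so treat this as an illustration rather than the actual code:

```python
def update_dcf(dcf, actual_runtime, estimated_runtime):
    """Sketch of the asymmetric DCF update: jump up immediately when a
    task overruns its estimate, creep down slowly when tasks finish
    faster. Smoothing constant 0.1 is illustrative."""
    ratio = actual_runtime / estimated_runtime
    if ratio > dcf:
        return ratio                  # increase fast
    return 0.9 * dcf + 0.1 * ratio    # decrease slowly

dcf = update_dcf(1.0, 20.0, 10.0)  # task took twice as long -> 2.0
dcf = update_dcf(dcf, 10.0, 10.0)  # an on-estimate task only nudges it down -> 1.9
print(dcf)
```

This asymmetry is why, with a mix of long and short models, the DCF tends to sit near the worst-case ratio and the shorter models look pessimistically estimated.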
65) Message boards : Number crunching : Missing Work Units (Message 41608)
Posted 8 Feb 2011 by Ingleside
Post:
In the meantime, is there a way to synchronize what is on my computer with what the project thinks I have?

If the computer has returned all its CPDN work, a detach and re-attach to CPDN should normally mark any missing models as "client detached".

66) Message boards : Number crunching : Some PCs Trickling, Some Not (Message 41414)
Posted 1 Jan 2011 by Ingleside
Post:
This, I didn't know and is very useful to know. In all these years, I hadn't had this type of problem before, at least not to the point that it became really evident. Certainly I was aware of the homepage being down, just not the 10 consecutive failed download request limit. Perhaps it's been explained in these forums awhile ago, but I failed to comprehend it because I hadn't been affected by it to any noticeable extent.

It's possible this has been an issue before, but I don't remember any such outages; granted, months can pass between my visits to the "main" CPDN web page. Having the "main" CPDN web page on a separate server is an advantage: even with the frequent outages on the BOINC side of things, the "main" page is normally still up, and there are no problems accessing the "master" URL.

67) Message boards : Number crunching : Some PCs Trickling, Some Not (Message 41404)
Posted 31 Dec 2010 by Ingleside
Post:
I suspended BOINC, exited, then deleted sched_request_climateprediction.net.xml, sched_reply_climateprediction.net.xml, stdoutdae.txt, master_climateprediction.net.xml and job_log_climateprediction.net.xml (I know, more files than I needed to), then restarted BOINC and Resumed.

sched_request* and sched_reply* are generated anew for each scheduler request, so manually deleting them should only have an effect if they had somehow been write-protected in such a way that BOINC couldn't overwrite them...

master* is the home page, and is also regenerated each time the master URL is tried, so again deleting it has no effect.

stdoutdae.txt is the log that contains all the various info, like communication errors, and is automatically recycled as needed. Deleting it makes it impossible to look up any errors that could be relevant to tracking down the problem.



As to why many have problems trickling... something that's easy to overlook if you go directly to these forums: THE HOMEPAGE IS DOWN. If the BOINC client has had 10 failed scheduler requests in a row, it tries to re-download the homepage (the master URL), and if that fails, you immediately get a 24-hour deferral. So until the homepage is back up and running again, you can't do any scheduler requests, and this includes uploading trickles.
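The decision sequence just described can be summarised in a small sketch (a hypothetical helper, not the actual client logic, and the 10-failure threshold and ~24 h deferral are as described above):

```python
def next_action(consecutive_rpc_failures, master_fetch_failed):
    """Sketch: after 10 scheduler failures in a row the client
    re-fetches the master URL; a failed master fetch defers all
    project contact for roughly 24 hours."""
    if consecutive_rpc_failures < 10:
        return "retry scheduler with normal backoff"
    if master_fetch_failed:
        return "defer all contact ~24 h"
    return "re-read master page, then retry scheduler"

print(next_action(3, False))    # retry scheduler with normal backoff
print(next_action(12, True))    # defer all contact ~24 h
```

Trickles ride on scheduler requests, which is why a down homepage silently blocks them too.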

Anyone who hasn't had enough scheduler errors in a row to trigger a homepage re-check won't be aware of any problems with trickles. ;)


Edit - it seems Milo has fixed the problem, so the homepage is finally up and running again. Whether you manually do a scheduler request or just let the up-to-24-hour deferral count down, everyone should upload their waiting trickles within 24 hours.
68) Message boards : Number crunching : Uploads Fail (Message 41381)
Posted 29 Dec 2010 by Ingleside
Post:
Is there a time-out on the zips waiting to upload? I currently have 20 waiting in the transfers tab since the 25th.

With v6.10.0 and later BOINC clients, if a transfer is still failing 90 days after the first transfer attempt, the client aborts the transfer. For v6.6.xx and earlier clients, the timeout is only 14 days.
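A minimal sketch of this give-up rule, assuming the timeout is measured from the first attempt rather than the most recent one (the helper name and version-threshold encoding are mine):

```python
from datetime import datetime, timedelta

def should_abort_transfer(first_attempt, now, client_version=(6, 10, 0)):
    """Sketch: 90 days from the *first* attempt for v6.10.0+ clients,
    14 days for older ones. Version is a (major, minor, patch) tuple."""
    limit_days = 90 if client_version >= (6, 10, 0) else 14
    return now - first_attempt > timedelta(days=limit_days)

start = datetime(2010, 12, 1)
print(should_abort_transfer(start, start + timedelta(days=30)))              # False
print(should_abort_transfer(start, start + timedelta(days=30), (6, 6, 20)))  # True
```

So the same month-old stuck upload survives on a v6.10 client but would already have been aborted by a v6.6 client.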

69) Message boards : Number crunching : Can't upload for 12 days (Message 41361)
Posted 26 Dec 2010 by Ingleside
Post:
Very good tip, but kraken does not respond , because (IMO) he has crashed again.

when I suspend FAMOUS WU's, Boincmgr continous to try to upload. Is the uploadtry time infinite or will this WU's crashing by to many upload errors?

The number of failed attempts isn't important, but any file transfer will error out if it is still failing 90 days after its first attempt. Please note, with v6.6.xx and earlier clients the limit was only 14 days.

70) Message boards : Number crunching : Sorting for platform (Message 41246)
Posted 8 Dec 2010 by Ingleside
Post:
There are no actual "success" tasks, right? Only completed ones. By that measure, a daily quotum can never recover. Does that mean a 'failure' on CPDN will not actually decrease the quotum for that host? Or the original quotum is restored.
Hence maybe the problem with minussing hosts that do not stay minussed?

I didn't mean a "success" task, but "reported as success": at least with the "old-style" quota code, every time a "success" was reported by the client, the quota was doubled (if it wasn't already at the maximum).

So, a "success" report means "the client reports the task has finished without any errors".

Whether this later changes to invalid or something else once the validator runs is another matter...

BTW, apparently the web pages have been changed, so the task status isn't called "success" any longer, but rather "Completed, waiting for validation" or another variant of "Completed...".


As for how the "new-style" per-application quota system works, I haven't looked up the new code, but at least going by another project, "pending" tasks don't change the quota; only validated tasks do...

But this obviously can't be the case for the server code CPDN is using, since if it were, most CPDN computers by now would sit with a quota of 1 per application.
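For reference, the "old-style" doubling-on-success behaviour described above can be sketched like this; the exact penalty applied on an error report is my assumption (the real server constants differ), so treat the numbers as illustrative:

```python
def update_quota(quota, reported_ok, quota_max=100):
    """Sketch of the "old-style" per-host daily quota: a success report
    doubles the quota up to the project maximum; an error report cuts
    it back toward 1. Constants and the error penalty are illustrative."""
    if reported_ok:
        return min(quota * 2, quota_max)
    return max(quota // 2, 1)   # assumed penalty; floor at 1

print(update_quota(4, True))    # 8  - one success report doubles it
print(update_quota(100, True))  # 100 - already at the maximum
print(update_quota(4, False))   # 2  - an error cuts it back
```

The key point from the post stands either way: the quota reacts to what the client *reports*, not to what the validator later decides.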
71) Message boards : Number crunching : Sorting for platform (Message 41232)
Posted 5 Dec 2010 by Ingleside
Post:
Yes, the FAMOUS error rate probably makes the detection of reliable computers impossible. And any computer that's run a lot of slabs will also very probably have had iceworlds.

Would failed downloads make a computer unreliable?

Well, without a validator no computer will become reliable... :)

But as far as download errors are concerned, they decrease the daily quota, and any computer with a decreased daily quota is not "Reliable". Since the quota increases again on "success" reports, if there's no other reason for being unreliable, a computer can very quickly be back to "Reliable" again.


The problem with FAMOUS is that if, for example, the first copy is sent to an "Intel + Windows" machine and it gives an error, there's a fairly good chance the "Reliable" computer that gets the re-issue will also be "Intel + Windows", and in most instances that means the exact same error. So being "Reliable" doesn't really mean much for FAMOUS, since it's the WUs themselves that are unstable, not the majority of computers.
72) Message boards : Number crunching : Sorting for platform (Message 41227)
Posted 5 Dec 2010 by Ingleside
Post:
For what it's worth, WCG provides capability for Projects to use the "trusted computer" technique; see Single Validation – Type 1:
http://www.worldcommunitygrid.org/help/viewTopic.do?shortName=points#174

No indication that boinc is involved; my guess is that it's IBM/WCG server code.

While both "Adaptive replication" and "need_reliable" were developed by WCG, they've been part of the standard BOINC code for a long time.

While "Adaptive replication" is great for min_quorum = 2 projects, where it can reduce the average from 2.xx down to around 1.05 - 1.10 tasks per WU, CPDN uses min_quorum = 1, so "Adaptive replication" can't reduce this any further.

As for "need_reliable", this could be an advantage, since any re-issue is guaranteed to go only to "Reliable" computers with fast turnaround times, so chances are the re-issue will be returned fairly quickly. It's also possible to set the WU priority so high at generation time that the WUs "need_reliable" from the start. But the big problem with FAMOUS is that, apart from some computers that routinely error out all WUs, most FAMOUS errors are WU-specific, so a "Reliable" computer will give the same error...

Also worth remembering: for a computer to become "Reliable", it must have enough validated results, but CPDN has never used a validator...
73) Message boards : Number crunching : RESOLVED - Too Many Total Results / Too Many Errors (May Have a Bug) (Message 41121)
Posted 21 Nov 2010 by Ingleside
Post:
"Completed", a final 1366KB zip file uploaded and taking the 6748 total credit granted, 68 trickles were uploaded and the task marked completed. The FAQs (outdated?) say there should be 72 trickles, which equates exactly to 7145 credit. If 4 trickles went missing it means that in the 4 years, to the day exactly that I've tried to complete even a single model, none (17) ended up correctly. Oh well.

--//--

I guess your result is 10248883. If so, it's now showing up with all 72 trickles.

It's normal with CPDN that there's a delay between uploading a trickle and it showing up on the web pages, where you get credited for it.

The trickle info is updated in roughly the same way WCG updates stats and badges: neither happens instantaneously; they only run every N hours. While WCG updates every 12 hours, I'm not sure whether CPDN is currently updating only every 24 hours or so.

74) Message boards : Number crunching : Task but no workunit (Message 40844)
Posted 12 Oct 2010 by Ingleside
Post:
I wonder how the Einstein resend feature works in preventing phantoms. Does this mean that if a task doesn't download successfully the Einstein server starts again?

The re-issue functionality is just a configuration switch projects can enable, and the only disadvantage is added server load. Many projects, like Einstein, SIMAP, WCG and so on, use this functionality, and SETI@home at least occasionally uses it...


With each scheduler request, v4.45 and later clients include a list of the tasks currently assigned to the client for that project.

Server-side, if the project has enabled re-issue, every time the user asks for more work the scheduling server compares the client's list against the tasks the host has been assigned; any task the client should have but doesn't is re-issued by the server (or not re-assigned, if there's too little time to the deadline or the WU has errored out or something).

Tasks already on the client, or those reported as download errors and so on, won't be re-issued.

So, the re-issue functionality only works against "ghost" WUs **. In most instances these arise when a client asks for work and the server assigns some, but the client never gets the reply, so the client asks for more work again a minute later or so. Instead of the server assigning yet another task, the tasks the client didn't get a minute earlier are sent again.

There are limits on how much work can be issued in a single reply, so even if CPDN does enable this functionality, no one will get 1000 re-issues to a single computer, even if they somehow do have that many current ghosts...



** If the user does a reset, they'll have no work on the client, and if re-issue is enabled this means all their work will be re-issued. A detach/re-attach, on the other hand, will not re-issue any work; instead all the work will be marked "detached".
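The server-side comparison described above can be sketched roughly as follows (names and structures are hypothetical, not the actual scheduler code):

```python
def tasks_to_resend(server_assigned, client_reported, resend_ok):
    """Sketch of lost-task re-issue: the server re-sends any task it
    believes the host has but the client didn't list, subject to a
    per-task check (deadline, WU state, etc.) supplied as resend_ok."""
    resend = []
    for task in server_assigned:
        if task["name"] not in client_reported and resend_ok(task):
            resend.append(task["name"])
    return resend

# Client got wu_1_0 but never received the reply assigning wu_2_0:
assigned = [{"name": "wu_1_0"}, {"name": "wu_2_0"}]
print(tasks_to_resend(assigned, {"wu_1_0"}, lambda t: True))  # ['wu_2_0']
```

Note how a reset falls out of this naturally: the client then reports an empty list, so everything still assigned to the host qualifies for re-issue.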
75) Message boards : Number crunching : Upload problem (Message 40831)
Posted 10 Oct 2010 by Ingleside
Post:
The timeout on file uploads was lengthened to 90 days so there's no rush from that point of view but I think each file is only allowed 100 upload attempts. Don't keep repeating manual retries (the Retry Now button) until someone can suggest more ideas.

I'm not aware of any limit on the number of retries, and a quick test reveals that manually increasing the count to 11000 connection attempts had no effect; the upload just kept retrying as before.

Worth remembering, since many CPDN users still seem to use old BOINC clients: the increase to 90 days applies only to v6.10.xx and later clients.


As for checking for connection problems, the first step is always to reboot any modems, routers and so on, and to reboot the affected computer.

If this doesn't work, try creating or editing a cc_config.xml (placed in the BOINC data directory) containing at minimum the following lines:
<cc_config>
  <log_flags>
    <file_xfer_debug>1</file_xfer_debug>
    <http_xfer_debug>1</http_xfer_debug>
  </log_flags>
</cc_config>

Then just select "Read config file" in the BOINC Manager.

Keeping <file_xfer_debug> always enabled is an advantage: you'll always get info about which upload server is being tried, making it easy to check on the server status page whether that server is down, without manually searching through client_state.xml for this info. The transfer speed will also be logged if the transfer was successful.

The second option, on the other hand, generates a lot of extra output, so disabling it again after fixing the problem is recommended. To disable it, just change the 1 to a 0 and re-read the config file.

A couple of other <log_flags> that can also be useful are:
<http_debug>1</http_debug>
<proxy_debug>1</proxy_debug>


Edit - I see Gundolf Jahn also mentioned some of the log flags earlier in the thread.
76) Message boards : Number crunching : 159,333 FAMOUS models cant download any ! (Message 40254)
Posted 28 Jul 2010 by Ingleside
Post:
It's now something like: 2 processors * 4 WU per day * number of applications

This depends on whether CPDN has upgraded their scheduling server again, since the code as of the start of June didn't scale by the number of CPUs...

Hmm, doing a quick test, I quickly hit:

28.07.2010 15:19:59 | climateprediction.net | Scheduler request completed: got 0 new tasks
28.07.2010 15:19:59 | climateprediction.net | [sched_op] Server version 611
28.07.2010 15:19:59 | climateprediction.net | Message from project server: No work sent
28.07.2010 15:19:59 | climateprediction.net | Message from project server: (reached daily quota of 3 tasks)
28.07.2010 15:19:59 | climateprediction.net | Project requested delay of 38229 seconds
28.07.2010 15:19:59 | climateprediction.net | [sched_op] Deferring communication for 10 hr 37 min 9 sec
28.07.2010 15:19:59 | climateprediction.net | [sched_op] Reason: requested by project


This small log snippet tells us two things. CPDN hasn't applied the change from 15.06 restoring scaling by the number of CPUs, since if they had, the minimum possible quota would be 8 for my computer. Also, the very long "Project requested delay of" 10 hours, 37 minutes and 9 seconds clearly shows CPDN still uses the old code that defers computers hitting the daily quota until midnight server time plus a random extra hour; this code was removed on 02 June 2010.

While CPDN hasn't done a full server upgrade with code more recent than 02.06.2010, hopefully they've at least applied the other bug fixes to the quota system. If not, no wonder it's a total mess, since the quota code as of 01 June included many bugs that were fixed in later code.


But anyway, until CPDN upgrades their scheduling server, the max quota is:
4 WUs per day * number of applications.
Since currently only FAMOUS is available, this means the max quota is 4 WUs per day, regardless of whether it's a shiny new 12-"core" i7-980 or an old single-core CPU.
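The two quota formulas being compared can be put side by side in a small sketch (illustrative only; the real scheduler has more conditions):

```python
def daily_quota(per_app_quota, napps, ncpus, server_scales_by_cpus):
    """Sketch: the newer scheduler code scales the per-application
    quota by the number of CPUs; the code CPDN was running at the
    time did not."""
    per_app = per_app_quota * (ncpus if server_scales_by_cpus else 1)
    return per_app * napps

print(daily_quota(4, 1, 12, False))  # 4 - old code, CPU count ignored
print(daily_quota(4, 1, 2, True))    # 8 - scaled code, dual-core minimum
```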
77) Message boards : Number crunching : Message from project server: Invalid app version description (Message 40226)
Posted 23 Jul 2010 by Ingleside
Post:
Looks like he's put in a fix already: [trac]changeset:22052[/trac].

Yes, but unfortunately it looks like a server upgrade is needed to stop the error message for v6.11.x and later clients, and CPDN isn't the fastest at applying such upgrades...
78) Message boards : Number crunching : Message from project server: Invalid app version description (Message 40223)
Posted 23 Jul 2010 by Ingleside
Post:
If you are an alpha tester, report this behaviour to the BOINC Alpha email list.

I've already made a report, and David Anderson is looking into the problem.
79) Message boards : Number crunching : Message from project server: Invalid app version description (Message 40211)
Posted 22 Jul 2010 by Ingleside
Post:
I'm finishing a CPDN HadSM slab that I downloaded months ago. I'm also finishing a CPDN FAMOUS v.6.10 which is not the newest FAMOUS version. So both are types that are now deprecated. But I see no messages like this in my Boinc manager.

Hmm, which BOINC version are you using on this computer - v6.2.xx or v6.10.xx?

I don't see it in v5.10.45, but I do see it in v6.11.3...

As for FAMOUS, I'm not sure, but I wouldn't immediately expect an older application version alone to trigger the message, though I can't rule it out at this point.

It will at least be easier to test with SETI@home whether it's due to a deprecated application version or only a deprecated application, once they're up again late Friday...

Could members please tell us which models produce these invalid app messages?

Ingleside, as the message comes from Boinc I don't think CPDN can change it. But could this be this some extra weird aspect of the new CPDN Boinc server version?

It's a standard BOINC message, but that doesn't really stop CPDN from customizing it. ;) But granted, the fewer customizations a project makes, the easier it is to upgrade...


Edit - it seems the message only shows up for CPDN if you're running v6.11.x, since it doesn't show up in v6.10.58.
80) Message boards : Number crunching : Message from project server: Invalid app version description (Message 40207)
Posted 22 Jul 2010 by Ingleside
Post:
Nothing is known about it here, but there is this thread.



Hmmm....

Title: Re: Weird BOINC server messages
Post by: Claggy on 05 Jul 2010, 06:40:01 pm
astropulse and astropulse_v5 are obsolete, (by at least a year) and have been replaced by astropulse_v505, they are the source of the 'Invalid app version'

If you grab the latest 0.36 Lunatics installer you can upgrade the apps to the latest versions.

Claggy


Am I running an obsolete CPDN work unit, or something silmilar?

I'll set to No New Work' and either Reset or Detach/Attach (if I remember), and see if the messages go away. This message is on CPDN only (so far as I have noticed).

There have been no problems over at SETI@home due to the "invalid app version" message, so I wouldn't expect any problems for CPDN either.

Now, the Lunatics explanation - that this is due to users still having Astropulse and Astropulse_v5 listed when they're using the anonymous platform mechanism - is possibly correct, but I can't test whether removing these two has any effect until the SETI servers are up again late Friday...


As for CPDN getting this "Invalid app version" message: assuming Astropulse is the reason for SETI, for CPDN the reason is that all applications except Famous are deprecated. Meaning, every time a CPDN computer that has a non-Famous model connects to CPDN, chances are it will get the "Invalid app version" message. Once you've finished and reported all of these older models, I'd guess you won't see the message any longer.

CPDN will still use the results from all Hadam3P and slab models and any other non-Famous models you're still running, so there's no reason to set "No new work" or do anything else.

So, for CPDN, instead of "Invalid app version", a more accurate message would probably be something along the lines of "We've stopped distributing new models of the type you're still running, but there's no problem finishing your current model."




©2024 climateprediction.net