Posts by Ingleside

31) Message boards : Number crunching : Reporting - Errors while computing - (Message 47632)
Posted 22 Nov 2013 by Ingleside
Post:
"Computing allowed"
1] while computer is in use
2] while processor usage is less than 0 percent

I'll change "Only after computer has been idle for" to 0 minutes, it was on 3.00mins.
{not entirely sure what this latter setting actually means or really remember why it was on 3.00}

You're not allowed to set "has been idle for" to zero minutes, even though, as has already been mentioned, this setting isn't used if you don't suspend computing for any reason. While it's possible to manually edit the preference-file (either override or general) and set it to zero, if you do this the client-default is used instead, and this is probably 3 minutes.

Some other settings, on the other hand, do accept zero minutes, and, a little inconsistently, zero percent for processor-usage means 100%.
32) Questions and Answers : Windows : C++ error continually occurs (Message 47581)
Posted 14 Nov 2013 by Ingleside
Post:
So was there any definitive fix for this problem? I've got it too.

In my experience running BOINC as a service will "fix" the problem, at least as far as the dialogue popping up and a cpu-core sitting idle until you click on the message goes. Models can still crash with a C++ error, and it's also possible a crash will just leave the model running even after hitting 100%, but at least you'll not get the popup-message any longer.

If you're also using your GPU for crunching on other projects, service-installation will unfortunately not be an option.
33) Questions and Answers : Windows : Windows 8.1, a caution... (Message 47409)
Posted 26 Oct 2013 by Ingleside
Post:
I found it impossible to install boinc in its own partition, as I've done since CPDN merged with boinc. Installation was also impossible as a 'service' unless all three options were accepted, including 'screensaver.'

At least this part of your problems comes from insisting on trying to run an ancient BOINC-client that never has been and never will be supported on Windows 8.x.

Windows 8.0, MacOS "Mountain Lion" and later OS-versions require BOINC v7.0.xx or later to work correctly.
34) Message boards : Number crunching : failed upload: can't resolve hostname (Message 47300)
Posted 12 Oct 2013 by Ingleside
Post:
It WAS tested and the results posted in that extinct thread.
I think that the person in question unpacked the zip/tar them self on arrival and before it had a chance to start, and manually checked the url.

Hmm, why someone would zip or tar (and feather) their sched_reply_climateprediction.net.xml escapes me, and if it was done to any of the many CPDN-files residing in the project-directory it makes even less sense, since the BOINC-client doesn't know (and doesn't care) if any of these files somehow includes a URL.

But while my recollection was too fuzzy, it's an advantage that I did take part in at least one of the discussions myself, and this was not done on the php-board.

This message from 12.04.2011 is the most interesting, clearly showing the client_state.xml was corrupt even before any of the files were downloaded, while sched_reply* was not corrupt.

Whether the problem was CPDN-only, on the other hand, was never answered by the tester in the old thread...
35) Message boards : Number crunching : failed upload: can't resolve hostname (Message 47298)
Posted 12 Oct 2013 by Ingleside
Post:
cwhyl

This was discussed extensively on the old php board 2-3 years back when it first started happening. It was also tested a fair bit.

The files were/are OK on the server.
They're OK when they arrive zipped up on the user's computer.
At some point after unzipping and moving to their various locations, the data in the client_state.xml file shows up corrupted, in a couple of different ways.

So it's most likely a subtle bug in BOINC for a particular variety of Linux.

Uhm, maybe my recollection is too fuzzy, but I don't remember anyone with a corrupt upload-URL ever showing they did get a sched_reply_climateprediction.net.xml with the correct upload-URL, and that this was either wrongly inserted into client_state.xml or that client_state.xml later got corrupted.

Since CPDN doesn't try uploading before having trickled N times, sched_reply* has also been wiped clean at least N times. This is one of the reasons trying to pin-point why some are getting a corrupt URL is so hard, and also why AFAIK server-problems as the source were never eliminated.

36) Message boards : Number crunching : Compute Errors / Bad Work Units? (Message 47218)
Posted 30 Sep 2013 by Ingleside
Post:
Make sure you have "leave applications in memory when suspended" OFF.

For the majority of crunchers it's always better to have "leave applications in memory" ON, and for some BOINC-projects there's a good chance you'll have problems if it's not turned on.

For CPDN, especially if you're starting many models at once, there'll be heavy disk-thrashing, and chances are this increases the probability of something taking "too long" and the model erroring out. As long as models are kept in memory, you'll not have this problem except after rebooting the computer. So, if you're not really short on memory, or running some really memory-hungry applications, it's better to keep applications in memory.
37) Message boards : Number crunching : failed upload: can't resolve hostname (Message 47111)
Posted 18 Sep 2013 by Ingleside
Post:
However... I think you have something like 2 weeks before the upload fails. So you can simply sit back & hopefully the project might make this same change at Rutherford.

It's 90 days, if you're not running an ancient BOINC-client like v6.2.xx or something even older.
38) Message boards : Number crunching : Download Errors: Permanent HTTP -- Euro Region Tasks (Message 46671)
Posted 22 Jul 2013 by Ingleside
Post:
It's all caused by a long period timer somewhere in the BOINC server code.
When the timer reaches the maximum value for that variable type, it overflows to zero, and BOINC thinks that it's a new data set, so it tries to issue it.

It's not an overflow; the BOINC server-code includes a security-measure in case the server somehow has overlooked a task. The security-measure kicks in if a task hits 1.5 times its deadline, and this triggers a re-check verifying if the wu is finished or if a new task is necessary.

Since CPDN isn't archiving "done" wu's and removing these from the database like other BOINC-projects normally do, you'll continue having this problem with CPDN re-issuing ancient wu's until they hit one of their max-limits (max error/total).

If I'm not misremembering, with more recent server-code it's possible to disable the re-issue when hitting the security-limit.
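The 1.5x rule described above can be illustrated with a few lines of Python. This is only a sketch of the arithmetic, not the BOINC server code: the function name is hypothetical, and it assumes "1.5 times its deadline" is measured from the time the task was sent.

```python
from datetime import datetime

def recheck_time(sent, deadline, factor=1.5):
    """Hypothetical helper: when the safety re-check would fire.

    Assumption: the factor applies to the interval between the
    send time and the deadline, as the post above suggests.
    """
    return sent + (deadline - sent) * factor

# Illustrative dates only: a task sent 1 Jan with a 10-day deadline.
sent = datetime(2013, 1, 1)
deadline = datetime(2013, 1, 11)
print(recheck_time(sent, deadline))
```

With a 10-day deadline the re-check lands 15 days after sending, which is when a wu still sitting in the database could be re-issued.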
39) Message boards : Number crunching : Download errors: Permanent HTTP error (PNW) (Message 46441)
Posted 18 Jun 2013 by Ingleside
Post:
Just got another of these PNW tasks, (a re-issue) where some of the files didn't download and a permanent HTTP error ensued. What I can't remember is whether or not I need to delete the folder with the model or whether BOINC will do it given time?

Having extra CPDN-folders is only a problem after a model has started. As long as one or more of the input-files is missing, the model never starts, and the BOINC-client cleans up on its own.

Note, some of the input-files can be marked as "sticky"-files, in case they're used by multiple models. "Sticky"-files are not automatically deleted. Manually deleting such files won't work either; the client will just try to re-download them the next time it restarts. Depending on client-version, they won't be removed by a reset either, but if I'm not mistaken they will be removed on reset if you're running a fairly recent v7-client.
40) Message boards : Number crunching : Workunit error - check skipped (Message 46286)
Posted 24 May 2013 by Ingleside
Post:
(BOINC was designed on the assumption that two different computers processing a task would produce results that are identical, bit for bit. If this were true a simple comparison of results would be a useful check for correct data transmission. But climate models break that assumption.)

Not exactly, BOINC was designed on the assumption the projects would write their own validator, but it did include two generic validators: one is the bit-by-bit comparison and the other is "everything validates". For example, the SETI-validator allows 1% variation between most signal-strengths, but at the same time demands the signals are at the same frequency.

Since the validator is project-specific, a CPDN-validator could for example check if all trickle-files were reported and all files are uploaded, and can also do some other checks on the results. A CPDN-validator doesn't need to compare to other results for the wu, meaning no problem with different results.

By running validator & assimilator, CPDN could also run db_purger, meaning finished wu's would be archived and removed from the database. One obvious advantage here is that finished wu's wouldn't spawn a re-send doomed to fail with a download-error at 1.5x the deadline. Another advantage is the database would be kept smaller, without the need for the occasional manual archiving, often leading to problems, that CPDN has been doing...
41) Message boards : Number crunching : Hyperthreading (Message 46244)
Posted 16 May 2013 by Ingleside
Post:
Haven't benchmarked recently, but did benchmark my i7-920 when it was new. It's running stock speed, 6 GB memory in triple-channel-mode and probably at 1066 MHz-memory-speed. Benchmarked on, if I'm not misremembering, a Hadam3P-model, since AFAIK it was before the various regional models were released. In any case, the results are:

1 instance to 1st. trickle: 6237 seconds/trickle.
4 instances to 3rd. trickle and averaging: 8456 seconds/trickle.
8 instances to 3rd. trickle and averaging: 14002 seconds/trickle.

Meaning, running 4 instances it's performing as a 3-core-computer, while running 8 instances it's performing as a 3.5-core-computer. The advantage of running 8 over 4 instances was 21%.
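The core-equivalent figures follow directly from the timings above. As a small sketch (the function name is just for illustration), throughput relative to a single instance is the number of instances times the single-instance time, divided by the time per trickle under load:

```python
SINGLE_INSTANCE_SECONDS = 6237.0  # 1 instance, seconds/trickle (from the post)

def core_equivalents(instances, seconds_per_trickle,
                     single=SINGLE_INSTANCE_SECONDS):
    """Throughput expressed in units of one unloaded core."""
    return instances * single / seconds_per_trickle

four = core_equivalents(4, 8456)     # about 2.95 cores
eight = core_equivalents(8, 14002)   # about 3.56 cores
gain = eight / four - 1              # about 0.21, i.e. 21 %

print(f"4 instances: {four:.2f} cores")
print(f"8 instances: {eight:.2f} cores")
print(f"HT advantage: {gain:.0%}")
```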

The benchmarking was done without any turbo-boost (don't even remember if it has turbo at all...), and the same model was re-run from beginning to remove any effects from variable speed during a model and between models.


As for the i7-3770K not having the same HT-effect, my 1st. suggestion would be turbo-boost, but if you've disabled this... Another possibility is that, since the i7-920 is triple-channel, even at 1066 MHz it's got the same total bandwidth as the i7-3770K running at default dual-channel 1600 MHz-memory-speed. Since the cpu is also faster, it's possible the memory-bandwidth is saturated earlier than on the slower i7-920.
42) Message boards : Number crunching : Several jobs uploads in project backoff (Message 46153)
Posted 3 May 2013 by Ingleside
Post:
You have exploded an urban myth!

Well, some myths are easy to bust...


43) Message boards : Number crunching : Several jobs uploads in project backoff (Message 46151)
Posted 3 May 2013 by Ingleside
Post:
The time limit for uploading files from any project was extended. I can't remember whether the limit is now two or three months, but in any case it's far longer than we need.

It's 90 days.

But, but, but... each file is still only allowed 100 upload attempts, after which it expires. That's the BOINC rule. 100 is plenty but please don't use up the files' lives by repeatedly pressing the Retry now button in the Transfers tab. The files come to no harm while they wait.

I've never seen any evidence of a "100 upload attempts"-rule, and seeing how a file can easily reach this limit in about 4 days (assuming re-tries once per hour), it wouldn't make any sense to increase the limit from 14 days to 90 days in this case.

As a little test, I blocked the internet-connection and hit "retry" on a SIMAP-upload 110 times... no problem. Did a little editing, and, as BoincTasks happily shows, it's now retried... 1234567 times, hit retry, 1234568 times, 1234569 times, 1234570 times, 1234571 times...

Since 1234567 >> 100 I don't see any evidence of a 100-retry-limit on uploads...
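The "4 days" estimate is simple arithmetic, sketched here in Python. The function names are hypothetical, and the one-retry-per-hour rate is the same assumption as in the post; the real client backs off at varying intervals.

```python
def days_to_reach(attempts, retries_per_hour=1.0):
    """Days needed to accumulate a given number of upload attempts."""
    return attempts / retries_per_hour / 24

def attempts_within(days, retries_per_hour=1.0):
    """Attempts that fit inside a cut-off window of the given length."""
    return int(days * 24 * retries_per_hour)

print(round(days_to_reach(100), 2))  # days until a supposed 100-try limit
print(attempts_within(14))           # hourly retries in the old 14-day window
print(attempts_within(90))           # hourly retries in the 90-day window
```

At one retry per hour both windows allow far more than 100 attempts, so a hard 100-attempt limit would make the 90-day extension pointless.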
44) Questions and Answers : Windows : Intel Visual Fortan run-time error (Message 45916)
Posted 13 Apr 2013 by Ingleside
Post:
For the "killer trickle" to be sent to the correct target, that target, i.e. climate model, needs to return a trickle_up file for the server to find it.
As has been said, this is unlikely to happen, so they CAN'T be killed from the server.

Aborting tasks without relying on trickle-messages has been part of BOINC since around BOINC-Client v5.10.x.
45) Message boards : Number crunching : Download Checksum Error (Message 45823)
Posted 6 Apr 2013 by Ingleside
Post:
Got at least one old wu including white-space as part of the MD5, and v7.0.60 didn't give any MD5-errors, so it looks like the fix works as it should.

While there weren't any MD5-errors, the wu did still error out, but this was due to one of the files missing from the server. Copying the link to the web-browser also gave 404 Not Found, so this isn't a client-problem.
46) Message boards : Number crunching : Download Checksum Error (Message 45812)
Posted 5 Apr 2013 by Ingleside
Post:
Have only managed to get assigned one model since upgrading to v7.0.60, but at least no download-errors this time. The wu is from December and is therefore old.

Since I've done another scheduler-request after getting the new model, I can't check if the scheduler-reply included white space as part of the md5.


47) Message boards : Number crunching : Upload stops at 10MB with HTTP error (Message 43803)
Posted 15 Feb 2012 by Ingleside
Post:
Uploads can stack up indefinitely and not be lost, true?

Not completely indefinitely, if you're running an old BOINC-client the cut-off is 14 days from 1st. upload-try of each individual upload, while with v6.10.0 and later BOINC-clients the cut-off has been extended to 90 days.
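The cut-off rule can be written as a small version check. This is a sketch with hypothetical function names, not BOINC code; it takes the v6.10.0 threshold from the post and treats a placeholder such as "xx" as 0. Comparing tuples of integers matters here, since a plain string comparison would rank "6.2" above "6.10".

```python
def parse_version(text):
    """Turn 'v6.10.0' or '6.2.xx' into a 3-tuple of ints ('xx' counts as 0)."""
    parts = text.lstrip("vV").split(".")
    numbers = [int(p) if p.isdigit() else 0 for p in parts]
    numbers += [0] * (3 - len(numbers))  # pad short versions like '7.0'
    return tuple(numbers[:3])

def upload_cutoff_days(client_version):
    """Upload cut-off in days, per the rule stated in the post."""
    return 90 if parse_version(client_version) >= (6, 10, 0) else 14

for version in ("v6.2.xx", "v6.10.0", "v7.0.60"):
    print(version, upload_cutoff_days(version))
```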

48) Message boards : Number crunching : Project file upload handler is missing. (Message 43752)
Posted 5 Feb 2012 by Ingleside
Post:
Or so I thought,
I have just been through a copy of clientstate.xml without finding a misspelling of handler. I am still getting the message Project file upload handler is missing.
I have tracked down the zip file in question in the .xml file but am afraid it makes little sense to me, other than that the file is not transferring which I could tell by looking @ the transfers tab.

I am quite happy for someone to point out something in the portion of the file which is bleeding obvious, even if not to me.

Hmm, is it the computer running v6.13.10?

If so, the 1st. step is to upgrade to v7, since v6.13.10 has some major bugs, among others with handling of trickle-uploads.


If after the upgrade to v7 this file is still stuck without uploading, try changing the status-part from zero to 1, as shown below:

<upload_url>http://uploader1.atm.ox.ac.uk/cpdn_cgi/file_upload_handler</upload_url>
</file>
<file>
<name>hadam3p_eu_8h04_2000_1_007687181_0_4.zip</name>
<nbytes>13743083.000000</nbytes>
<max_nbytes>150000000.000000</max_nbytes>
<md5_cksum>41130e8681805ab6432ca43cea075478</md5_cksum>
<status>1</status>
<upload_url>http://uploader1.atm.ox.ac.uk/cpdn_cgi/file_upload_handler</upload_url>
<persistent_file_xfer>
<num_retries>14</num_retries>
<first_request_time>1327526537.473585</first_request_time>
<next_request_time>1328311850.488157</next_request_time>
<time_so_far>295.883478</time_so_far>
<last_bytes_xferred>0.000000</last_bytes_xferred>
<is_upload>1</is_upload>
</persistent_file_xfer>

Dave
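For anyone who would rather script the status change than edit by hand, here is a rough Python sketch. It works on a small, well-formed sample only: the `<client_state>` wrapper and the second file name are made up for the example, the function name is hypothetical, and a real client_state.xml may not parse cleanly with a strict XML parser. Presumably you'd want the client stopped and a backup copy made before touching the real file.

```python
import xml.etree.ElementTree as ET

# Minimal, well-formed stand-in for the relevant part of client_state.xml.
SAMPLE = """<client_state>
<file>
    <name>hadam3p_eu_8h04_2000_1_007687181_0_4.zip</name>
    <status>0</status>
</file>
<file>
    <name>other_file.zip</name>
    <status>0</status>
</file>
</client_state>"""

def set_file_status(xml_text, file_name, new_status):
    """Set <status> for every <file> whose <name> matches file_name.

    Returns the updated XML text and the number of entries changed.
    """
    root = ET.fromstring(xml_text)
    changed = 0
    for file_el in root.iter("file"):
        if file_el.findtext("name") == file_name:
            status_el = file_el.find("status")
            if status_el is not None:
                status_el.text = str(new_status)
                changed += 1
    return ET.tostring(root, encoding="unicode"), changed

updated, changed = set_file_status(
    SAMPLE, "hadam3p_eu_8h04_2000_1_007687181_0_4.zip", 1)
print(changed)
print(updated)
```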
49) Message boards : Number crunching : More FPU or Integer Power needed? (Message 43444)
Posted 20 Nov 2011 by Ingleside
Post:
My recommendation was based on what we have seen on cpdn. A Core i7 920 (hardly a high end processor) can beat a higher priced Phenom II X6 1100T in total throughput of models.

Hmm, in my experience the i7-920 and X6-1090T perform similarly, but the X6-1090T decidedly has a big advantage when it comes to purchase cost. Just for the cpu the i7-920 was 33 % more expensive than the X6-1090T, and other things like mainboard and memory were also more expensive for the i7-920 than for AMD.

Now, at least around here no-one has sold any i7-920 for the last year at least, but let's look at the current entry-level offering from Intel among the i7-cpu's, the i7-2600K. The i7-2600K is 44 % more expensive than AMD's X6 1100T. No idea how fast the i7-2600K is, but I would still guess it's less than 40 % faster...
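To put the price comparison in numbers, a quick sketch (hypothetical function name; cpu prices only, and the 40 % speed figure is just the guess from above, not a measurement):

```python
def relative_value(speed_ratio, price_ratio):
    """Performance per unit of cost relative to the baseline cpu.

    A result below 1.0 means less performance for the money.
    """
    return speed_ratio / price_ratio

# i7-920 vs X6-1090T: similar speed, 33 % higher cpu price.
print(round(relative_value(1.00, 1.33), 3))

# i7-2600K vs X6 1100T: guessed 40 % faster, 44 % higher cpu price.
print(round(relative_value(1.40, 1.44), 3))
```

In other words, the i7-2600K would need to be at least 44 % faster just to break even on cpu price alone.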

A hex-core intel-cpu should be faster, but is also much more expensive. Also, running multiple memory-hungry CPDN-models will have an impact on performance, so it's not certain a hex-core gives much higher performance than a quad-core does.
50) Message boards : Number crunching : Set no gpu option (Message 43177)
Posted 8 Oct 2011 by Ingleside
Post:
I don't think that the server version is sufficiently up to date for the "no gpu" code.

CC_config can apparently be set on user's computers to do the same thing.

Unfortunately in v6.12.xx and earlier clients, users can't disable individual projects from using GPU's, the available cc_config-options will disable GPU's for all projects. Seeing that MarkJ runs GPUGRID and Byron runs SETI@home, neither of them would want to disable GPU-crunching on their computers.

Disabling GPU-crunching for individual projects is an option in v6.13.x, but these clients are alpha-clients with many rough edges, so for anyone who isn't an alpha-tester it's recommended to stay away from these clients for now. At least from the look of things, GPUGRID doesn't currently work with v6.13.x-clients, since it's one of the many projects that haven't disabled upload-certificates yet.

