Posts by Thunder

21) Questions and Answers : Windows : error code -161 on suphur cycle WU, now all work fails (Message 17092)
Posted 9 Nov 2005 by Thunder
Post:
The -161 errors are not the real problem. They indicate that the application has finished running and BOINC is attempting to upload result files that haven't been created.

The -161 errors are masking the real problem, which might be revealed by looking at the stdoutdae.txt and stderrdae.txt files in the BOINC directory.


No, Thyme, I'm afraid I must disagree... there are no other errors in either of those files that indicate anything beyond exactly what appears in the result that I linked to. Just in case, I'm copy/pasting (replacing the 'evil characters that won't display' with [ and so forth) the exact output here:

From stderrdae:

2005-11-07 21:37:48 [climateprediction.net] Unrecoverable error for result sulphur_480b_000297275_0 ([file_xfer_error]
[file_name]sulphur_480b_000297275_0_1.zip[/file_name]
[error_code]-161[/error_code]
[error_message][/error_message]
[/file_xfer_error]
[file_xfer_error]
[file_name]sulphur_480b_000297275_0_2.zip[/file_name]
[error_code]-161[/error_code]
[error_message][/error_message]
[/file_xfer_error]
[file_xfer_error]
[file_name]sulphur_480b_000297275_0_3.zip[/file_name]
[error_code]-161[/error_code]
[error_message][/error_message]
[/file_xfer_error]
[file_xfer_error]
[file_name]sulphur_480b_000297275_0_4.zip[/file_name]
[error_code]-161[/error_code]
[error_message][/error_message]
[/file_xfer_error]
[file_xfer_error]
[file_name]sulphur_480b_000297275_0_5.zip[/file_name]
[error_code]-161[/error_code]
[error_message][/error_message]
[/file_xfer_error]
)


and from stdoutdae:

2005-11-07 21:37:48 [climateprediction.net] Unrecoverable error for result sulphur_480b_000297275_0 ([file_xfer_error]
[file_name]sulphur_480b_000297275_0_1.zip[/file_name]
[error_code]-161[/error_code]
[error_message][/error_message]
[/file_xfer_error]
[file_xfer_error]
[file_name]sulphur_480b_000297275_0_2.zip[/file_name]
[error_code]-161[/error_code]
[error_message][/error_message]
[/file_xfer_error]
[file_xfer_error]
[file_name]sulphur_480b_000297275_0_3.zip[/file_name]
[error_code]-161[/error_code]
[error_message][/error_message]
[/file_xfer_error]
[file_xfer_error]
[file_name]sulphur_480b_000297275_0_4.zip[/file_name]
[error_code]-161[/error_code]
[error_message][/error_message]
[/file_xfer_error]
[file_xfer_error]
[file_name]sulphur_480b_000297275_0_5.zip[/file_name]
[error_code]-161[/error_code]
[error_message][/error_message]
[/file_xfer_error]
)


I'll swear on as big a stack of bibles as you'd care to put before me that there is absolutely nothing immediately before or after those errors that indicates anything other than the normal operation of the client (pausing, switching, downloading, uploading stuff, etc.).
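
For anyone who wants to check a long stderrdae.txt or stdoutdae.txt without reading it by eye, here is a rough sketch that pulls the file names and error codes out of the square-bracket form pasted above. The function name is made up for illustration, and it assumes the log has already had its angle brackets swapped for square ones. It only lists what failed; it says nothing about why.

```python
import re

# Matches one [file_name]...[/file_name] followed by its [error_code]...[/error_code].
XFER_ERROR = re.compile(
    r"\[file_name\](?P<name>.*?)\[/file_name\]\s*"
    r"\[error_code\](?P<code>-?\d+)\[/error_code\]"
)


def summarize_xfer_errors(log_text: str) -> list[tuple[str, int]]:
    """Return (file name, error code) pairs found in a pasted log excerpt."""
    return [
        (match.group("name"), int(match.group("code")))
        for match in XFER_ERROR.finditer(log_text)
    ]


excerpt = """[file_xfer_error]
[file_name]sulphur_480b_000297275_0_1.zip[/file_name]
[error_code]-161[/error_code]
[error_message][/error_message]
[/file_xfer_error]"""

for name, code in summarize_xfer_errors(excerpt):
    print(name, code)
```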
22) Questions and Answers : Windows : error code -161 on suphur cycle WU, now all work fails (Message 17090)
Posted 9 Nov 2005 by Thunder
Post:
Your original post mentioned 4.45
Assuming that this is the BOINC version, there is a bug in it.


Thank you for the reply, Les, but I've been using the custom compiled 4.45b client since about the 3rd time I had my clients cease processing CPDN due to the 'failure to exit' benchmark bug, so I'm sure that's not the problem.

I'm just now opening and looking at the files that Thyme suggested to see what I can find there.
23) Questions and Answers : Windows : error code -161 on suphur cycle WU, now all work fails (Message 17066)
Posted 8 Nov 2005 by Thunder
Post:
Anyone yet have any idea what these -161 errors are or mean?

I just had another one fail with it:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=1136194
24) Questions and Answers : Windows : error code -161 on suphur cycle WU, now all work fails (Message 16606)
Posted 14 Oct 2005 by Thunder
Post:
Well, now another WU just failed with error code -5 (0xfffffffb).

I'm going to go nuts if I have to try to figure out the meaning of another error code so I'm pulling this one off CPDN as well. (Now the 5th machine of 9 that I've had to pull off CPDN because it just won't run on them).

The really weird part is that the only machines I have that seem to run CPDN reliably are the 2 that I cobbled together myself. Every single stock PC that I use from Compaq/HP and IBM all error more than they work reliably. All of them run every other BOINC project perfectly and all complete every stress test I throw at them perfectly (Memtest86+, SuperPi and Prime95 torture tests), but they just can't seem to do CPDN. :(
25) Questions and Answers : Windows : No work? (Message 16585)
Posted 13 Oct 2005 by Thunder
Post:


Thanks for your help, Thunder. Looks like I'll have to reset then. It hasn't been working for a couple of days, so it seems unlikely to get back on track...

Cheers,
Gregor


Anytime... Usually I'm the one coming here desperately LOOKING for help. Glad I could finally answer something for someone.

Looks like you downloaded a regular slab unit yesterday, and it's not errored yet. Hope you're back on track now. :)
26) Questions and Answers : Windows : No work? (Message 16552)
Posted 11 Oct 2005 by Thunder
Post:
Gregor,

It would appear that your computer had an error on both of the workunits that it downloaded yesterday.

CPDN limits each machine to downloading no more than 2 per day (to prevent computers that are having problems from downloading and "tossing out" dozens or perhaps even hundreds of model runs).

The first one appears to be the -161 error (that I am also having trouble with on one machine). I can't help you with the solution for that, since I don't think even the 'experts' know what's causing it.

The second one gave 'exit code -2' which has also happened to me after another WU errors out with a '-161'.

I had to reset the project (or detach/reattach) to stop it from happening continually. You may want to wait and see what happens with the next WU that it downloads, since resetting will cause it to throw out the other WU that you're working on.
27) Questions and Answers : Windows : error code -161 on suphur cycle WU, now all work fails (Message 16534)
Posted 10 Oct 2005 by Thunder
Post:
I'll check and see if there are any newer BIOS updates beyond what it came with tomorrow.


The only updates beyond what I have are only recommended for some specific hardware issues (none of which apply to this system) and the recommendation is to leave the BIOS I have on the system. Since I take a definite approach of 'if it ain't broke, don't fix it', I'm leaving that alone.

I now see that there are posts over on the other message boards (the CPDN classic ones that I can't seem to ever get my account to work on, so I can't post to them) about many, many others experiencing this -161 error with sulphur_cycles, so I'm filing this away in the category of "CPDN's problem, not mine".
28) Questions and Answers : Windows : Why would a model skip Phase 3 completely? (Message 16527)
Posted 10 Oct 2005 by Thunder
Post:
I checked the log on BOINC this morning. There are no errors.... it simply thinks it completed the entire phase3 in about 1 minute and 2 seconds.

It uploaded the zip files and moved on to a sulphur_cycle WU (that unfortunately it doesn't have a snowball's chance in hell of finishing before deadline, but oh well...).
29) Questions and Answers : Windows : error code -161 on suphur cycle WU, now all work fails (Message 16517)
Posted 10 Oct 2005 by Thunder
Post:
I had -1073741819 errors for a while on one PC. Updated the BIOS and they went away. The BIOS update supposedly fixed some memory incompatibility... even though Prime95 and memtest ran fine.


I'll check and see if there are any newer BIOS updates beyond what it came with tomorrow.
30) Questions and Answers : Windows : error code -161 on suphur cycle WU, now all work fails (Message 16514)
Posted 10 Oct 2005 by Thunder
Post:
Sorry, I'm not able to help you with the 161 error, but it doesn't look like an end of phase error as you were 14 trickles into the phase, and 16 trickles into the phase at the original failure.


Hrmm.... true.

Well, I'll make one more check tomorrow to make sure the hardware is working fine, but I checked it with memtest86 and SuperPi after the last failed WU and both completed without so much as a hiccup. Scandisk found no problems, etc.

This is a stock HP/Compaq business desktop... no overclock, nothing but factory original and it's only a few months old. This thing runs stable as a rock on anything but CPDN, it seems. :(
31) Questions and Answers : Windows : error code -161 on suphur cycle WU, now all work fails (Message 16511)
Posted 9 Oct 2005 by Thunder
Post:
Well, this is frustrating....

The same computer just got to the end of phase 1 of another sulphur cycle unit and had the exact same error.

Here's the result:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=1102562

I can't find any reference to what an error -161 is anywhere, but it's evidently going to keep happening. :(

Any ideas before I have to take yet another machine off CPDN because it just won't run correctly?
32) Questions and Answers : Windows : Why would a model skip Phase 3 completely? (Message 16506)
Posted 9 Oct 2005 by Thunder
Post:
I just checked the site to see what machines have trickled in the last couple days and noticed this workunit:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=714025

I've not looked at the BOINC log itself since the computer is currently sitting in my office and as much as I love CPDN, I'm not dragging myself in on a Sunday, but I can see some info from the site.

I knew it was close to finishing Phase 2 and it did this morning, but then 2 minutes later it also said it finished Phase 3 (my, that was a quick one) and according to the site, the result is 'Over' with an outcome of 'Success'. This immediately gave me the full 6,805.26 cobblestones of credit for the WU.

Oddly enough, the workunit states 'Too many total results' though it's only been sent twice in total.

Not that I don't appreciate the >2k extra credits in a day, but it hardly seems fair. ;) Thought I should ask that someone look into it. :)
33) Questions and Answers : Windows : error code -161 on suphur cycle WU, now all work fails (Message 16246)
Posted 25 Sep 2005 by Thunder
Post:
The secret to posting these errors is to paste, and then edit to change the arrow brackets to square brackets.


Okay, thanks, I'll make a point of using a search/replace in a text editor before I post 'em. :)
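
For what it's worth, the same search/replace can be done in a couple of lines of Python instead of a text editor. This is only a sketch of the bracket swap described above, and the function name is made up for illustration.

```python
def to_square_brackets(text: str) -> str:
    """Swap angle brackets for square brackets so the forum doesn't swallow the tags."""
    return text.translate(str.maketrans("<>", "[]"))


sample = "<file_xfer_error>\n  <error_code>-161</error_code>\n</file_xfer_error>"
print(to_square_brackets(sample))
```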

The only person I know who has fixed a problem with the error code 1073741819 is here: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=3229 And in his case it appears to have been a graphics card problem.


Unfortunately, I'm sure that's not my case. I'm running BOINC as a service and never display the graphics (screensaver disabled, managed from boincview on another machine).


Have a look in client_state.xml for the model name, and find where the zips are mentioned.
Just before each one is a line about the destination uploader. If it's Bern, it will be something like: unib.ch, in which case you'll just have to wait until they are back up again.



Okay, I suppose this will be informative as a 'post-mortem', but the client has terminated/abandoned the run, so it's too late to do anything about it now.

I was depressed enough over yet another failed run (my 45th) that I stopped to add up how much processing time was devoted to runs that failed for some reason. 36,032,432 seconds or 417 CPU-days of time that didn't produce anything useful. :P I also have 14,189,264 seconds towards 2 results that appear to be 'unknown', though I think they completed. Thankfully I do at least have 82,252,941 seconds (952 CPU-days) that produced 13 complete runs!

'Tis a bit frustrating to think that 4/5ths of my models are going to error out and therefore nearly 1/3rd of my computing time will be wasted. :\
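
The arithmetic behind those figures can be reproduced with a short sketch like the one below. The second counts and run counts are the ones quoted in this post; the variable names are made up, and counting the 2 'unknown' results in the totals is my own assumption.

```python
SECONDS_PER_DAY = 86_400

# CPU seconds and run counts as quoted in the post.
failed_s, failed_runs = 36_032_432, 45
unknown_s, unknown_runs = 14_189_264, 2
completed_s, completed_runs = 82_252_941, 13


def cpu_days(seconds: int) -> float:
    """Convert CPU seconds to CPU-days."""
    return seconds / SECONDS_PER_DAY


total_s = failed_s + unknown_s + completed_s
total_runs = failed_runs + unknown_runs + completed_runs

failed_time_share = failed_s / total_s         # fraction of CPU time spent on failed runs
failed_run_share = failed_runs / total_runs    # fraction of runs that failed

print(f"failed: {cpu_days(failed_s):.0f} CPU-days, completed: {cpu_days(completed_s):.0f} CPU-days")
print(f"share of time lost: {failed_time_share:.1%}, share of runs failed: {failed_run_share:.1%}")
```

With the unknown results counted in, this comes out at roughly 27% of the time and 75% of the runs, so a bit under the rounded fractions above.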
34) Questions and Answers : Windows : error code -161 on suphur cycle WU, now all work fails (Message 16244)
Posted 25 Sep 2005 by Thunder
Post:
I got this error from this WU:

BAD WU

Edit: I've tried every combination of bbcode tags I can find to get the error code to display. I can't quote it and have it display. Please look at the WU itself... I'm on my 9th edit of the post and am giving up. :P


<core_client_version>4.45</core_client_version>
<message><file_xfer_error>
  <file_name>46ii_100295338_1_1.zip</file_name>
  <error_code>-161</error_code>
  <error_message></error_message>
</file_xfer_error>
<file_xfer_error>
  <file_name>46ii_100295338_1_2.zip</file_name>
  <error_code>-161</error_code>
  <error_message></error_message>
</file_xfer_error>
<file_xfer_error>
  <file_name>46ii_100295338_1_3.zip</file_name>
  <error_code>-161</error_code>
  <error_message></error_message>
</file_xfer_error>
<file_xfer_error>
  <file_name>46ii_100295338_1_4.zip</file_name>
  <error_code>-161</error_code>
  <error_message></error_message>
</file_xfer_error>
<file_xfer_error>
  <file_name>46ii_100295338_1_5.zip</file_name>
  <error_code>-161</error_code>
  <error_message></error_message>
</file_xfer_error>

</message>



and now all further WUs error out with:

<core_client_version>4.45</core_client_version>
<message> - exit code -1073741819 (0xc0000005)
</message>

I've tried restarting BOINC and then resetting CPDN on this machine, but I won't know for about 10 hours (when communication is no longer deferred) whether it made any difference.

Obviously, the BOINC version, you all can see. The installation is running Einstein@Home as well. I\'m confident the computer is stable (not overclocked, etc.) and well maintained.

Any idea what I can do about this? (I know there are lots of other posts that appear to relate to this problem, but none seemed exactly the same and none seemed to offer any solution or ideas)
35) Questions and Answers : Windows : exiting project after request of benchmarks (Message 15998)
Posted 14 Sep 2005 by Thunder
Post:
The benchmark runs by itself every 5 days so that is why you see it when you do. Use 4.45b which Thyme linked to. I've had no problems since I've been using it, as opposed to problems every automatic benchmark when I used 4.45 "official".


Ahhh, thanks for the info. Slowly but surely I'm getting my head around all the little subtle things this program does.

I believe I will have to get the custom compiled version, because I came in to find that 2 of 3 of the machines at my office that run BOINC as a service had also stopped. (There goes that theory out the window)

At least thanks to the wonders of remote management, it only took 30 sec to get into both and restart the BOINC service. :)
36) Questions and Answers : Windows : exiting project after request of benchmarks (Message 15994)
Posted 14 Sep 2005 by Thunder
Post:
The BOINC development team are well aware of the problem (Chris Sutton and myself have made sure of that!)


I just discovered the same problem on 2 of my 4.45 machines.

I'm not sure if this helps or not, but I've been noticing it every 4-5 days on these machines, but for over 2 weeks, all of mine that run BOINC as a service have had no problem.

Since it's only been 2 weeks or so that I've been closely monitoring the issue, this may be purely anecdotal evidence, but I'll keep an eye on it.
37) Questions and Answers : Windows : Hmm, I think I got more credit than I deserve (Message 4115)
Posted 14 Sep 2004 by Thunder
Post:
> yes it's possible the trickle got sent again. but you don't get additional
> credit.
> Did the math and it works out correct

Eeeep, Tolu, you're making me feel bad! It's obvious from my question that I was too darn lazy to add up the credits myself and then you went and did it. ;)

Ah well, thanks for letting me know for sure! :)
38) Questions and Answers : Windows : Hmm, I think I got more credit than I deserve (Message 4102)
Posted 14 Sep 2004 by Thunder
Post:
> Checkpointing is done every 144 timesteps (3 model days) and they don't
> coincide with the 10802 timestep trickle points.

Okay then. :) Just wanted to make sure on that.

Thanks for the quick response, Thyme. :)

Makes me feel sorry for folks that have to reboot or turn their computers off more often than I... they must lose a lot of time over the span of a CPDN run. :(
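
To put a number on how much gets redone after a restart, here is a minimal sketch using the two intervals quoted above. It assumes checkpoints fall on exact multiples of 144 timesteps counted from the start of the phase, which is my assumption and not something confirmed in this thread; the function names are made up too.

```python
CHECKPOINT_INTERVAL = 144   # timesteps between checkpoints (3 model days), as quoted above
TRICKLE_INTERVAL = 10_802   # timesteps between trickle points, as quoted above


def last_checkpoint_before(timestep: int) -> int:
    """Last checkpoint at or before a timestep, assuming checkpoints at multiples of 144."""
    return (timestep // CHECKPOINT_INTERVAL) * CHECKPOINT_INTERVAL


def timesteps_redone(timestep: int) -> int:
    """Timesteps that must be recomputed if the model restarts from that checkpoint."""
    return timestep - last_checkpoint_before(timestep)


trickle_point = 15 * TRICKLE_INTERVAL  # 162030, the 15th trickle point of the phase
print(last_checkpoint_before(trickle_point), timesteps_redone(trickle_point))
```

Under that assumption, a restart just after the trickle at timestep 162030 would resume from a checkpoint slightly before it, so the model crosses the same trickle point a second time.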
39) Questions and Answers : Windows : Hmm, I think I got more credit than I deserve (Message 4098)
Posted 14 Sep 2004 by Thunder
Post:
I had to reboot one of my computers after installing a driver and it happened to be just shortly after a trickle had been submitted. Apparently when it returned to working on the run again, it crossed that 'trickle point' again.

The host ID is hostid=12129

Here are the trickles:

Time Sent (UTC)      | Host ID | Result ID | Result Name      | Phase | Timestep | CPU Time (sec) | Average (sec/TS)
14 Sep 2004 01:56:59 | hidden  | 40136     | 04lp_000030961_0 | 1     | 162030   | 414263         | 2.5567
14 Sep 2004 00:07:00 | hidden  | 40136     | 04lp_000030961_0 | 1     | 162030   | 414250         | 2.5566

I took the time to 'exit' the BOINC client rather than having the OS call for it to shut down, since I was under the impression that this would assure that the run would save at the point it was at. Looks like it instead went back to a checkpoint that was just prior to the trickle.
40) Questions and Answers : Windows : Where are the visualizations for Boinc? (Message 3530)
Posted 8 Sep 2004 by Thunder
Post:
I'm not a user of the original CPDN, but I think I understand what you're asking (not positive though).

To see the graphic display of what the model is doing, just right-click the WU from the 'Work' tab and choose 'Show Graphics'.

Hope that helps. :)

