Benchmark stopped model being crunched
Author | Message |
---|---|
Send message Joined: 6 Aug 04 Posts: 58 Credit: 1,286,603 RAC: 0 |
Just noticed the following in the log of one of my machines running hadsm3 4.13: 25/07/2005 07:26:37 128 Suspending computation and network activity - running CPU benchmarks 25/07/2005 07:26:37 129 Pausing result 13pm_100072003_1 (removed from memory) 25/07/2005 07:26:39 130 Running CPU benchmarks 25/07/2005 07:26:47 131 Aborting CPU benchmarks, one or more active tasks are still running. I remember this happening once before on an earlier model and, I _think_, a different machine. The problem is that BOINCView (and, I suppose, BOINC Manager) shows the model as running although it was never actually restarted after the benchmark abort. If you're not paying attention, it's quite easy to miss the situation and end up with a machine sitting idle. Is this a known problem? Ian |
Send message Joined: 17 Aug 04 Posts: 753 Credit: 9,804,700 RAC: 0 |
See <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=2893">this thread</a>. It is, as you say, easily overlooked. Only a few people have reported it, but it may be common. |
Send message Joined: 28 Aug 04 Posts: 65 Credit: 9,636,280 RAC: 0 |
It is occurring on a number of my machines. Some of these are "headless" (no monitor, keyboard or mouse), and I don't notice it until I check the CPDN stats and see that a machine has not reported in for some time. I am running CPDN and SETI under BOINC 4.45. I have both apps configured to remain in memory when they are suspended. When the event occurs, both applications still show in the BOINC Manager, but CPDN does not appear to be running. Not sure if the SETI app still runs, but I will watch for it. |
Send message Joined: 28 Aug 04 Posts: 65 Credit: 9,636,280 RAC: 0 |
OK, it just occurred on one of my machines: - CPU benchmark starts to run and CPDN is paused. - An error occurs. - The message "Aborting CPU benchmarks, one or more active tasks are still running." is displayed. - Both of the hadsm processes appear to be killed, but BOINC does not seem to be aware of it. I am running SETI (10%) and CPDN (90%) and have set both to remain in memory when suspended. - On the "work" tab BOINC still shows CPDN and SETI. - BOINC still allocates time slices to CPDN and shows it as running, even though the hadsm processes are not actually there. - When BOINC allocates time slices to SETI, SETI does run, as the SETI process is still in memory. |
Send message Joined: 5 Aug 04 Posts: 426 Credit: 2,426,069 RAC: 0 |
This is a known issue that is hopefully fixed in the 4.71 version of the BOINC client. The timeout for stopping applications for the benchmark run has been increased, hopefully enough to allow CPDN to stop in time. What is apparently happening is that the wait times out and then CPDN stops. At that point the app is waiting for the benchmarks to finish, and the benchmarks have already given up, so neither does anything. Restarting the client will get things going again. Running the benchmarks manually every 4.5 days is a preventative. John Keck -- BOINCing since 2002/12/08 -- <a href="http://www.boinc.dk/index.php?page=user_statistics&project=cpdn&userid=191"><img border="0" height="80" src="http://191.cpdn.sig.boinc.dk?188"></a> |
Send message Joined: 2 Sep 04 Posts: 44 Credit: 372,682 RAC: 0 |
> The timeout for stopping applications for the benchmark run has been > increased, hopefully enough to allow CPDN to stop in time. What is apparently In the latest dev branch (4.72), the timeout is still at 10 seconds. Unless someone is still planning to increase it, it's not likely to be in the next release. |
Send message Joined: 16 Oct 04 Posts: 692 Credit: 277,679 RAC: 0 |
Chris Sutton has compiled a version 4.45a with a 30 sec delay rather than 10 sec. Arnaud has offered to host it, so hopefully it will be available soon. _______________________________ Visit <a href="http://boinc-doc.net/boinc-wiki/index.php?title=Climateprediction_FAQ">BOINC WIKI</a> for help And join <a href="http://www.boincsynergy.com/">BOINC Synergy</a> for all the news in one place. |
Send message Joined: 5 Aug 04 Posts: 2 Credit: 142,931 RAC: 0 |
> > It is, as you say, easily overlooked. Only a few people have reported it, but > it may be common. > FYI, I also observed the same problem (CC 4.45 / Windows). I first assumed this was coming from my config, but I now discover it is a frequently met problem. By the way, if it's just a matter of changing the value of a delay, does anyone know why Berkeley doesn't just make the modification for the next releases? > Unless someone is still planning to increase it, it's not likely > to be in the next release. Shall we do a petition? Where do we sign up? ;-) |
Send message Joined: 16 Oct 04 Posts: 692 Credit: 277,679 RAC: 0 |
To get 4.45a with a 30 sec delay instead of 10, Arnaud is hosting a version created by Chris Sutton: <a href="http://arnaudboinc.free.fr/">4.45a site</a> _______________________________ Visit <a href="http://boinc-doc.net/boinc-wiki/index.php?title=Climateprediction_FAQ">BOINC WIKI</a> for help And join <a href="http://www.boincsynergy.com/">BOINC Synergy</a> for all the news in one place. |
Send message Joined: 2 Sep 04 Posts: 44 Credit: 372,682 RAC: 0 |
> To get 4.45a with a 30 sec delay instead of 10, There is a problem with the benchmarks run by this version. They are very low. I haven't yet figured out the problem (hoping just missing optimizations), so please don't download this version. If you already have, I would suggest reverting back to the UCB 4.45 until I have had a chance to figure out the problem. Sorry all. Chris :( |
Send message Joined: 2 Sep 04 Posts: 44 Credit: 372,682 RAC: 0 |
> There is a problem with the benchmarks run by this version. They are very low. There's a new version (4.45b) on its way to Arnaud. Benchmark issue appears sorted. Holler if you find otherwise. |
Send message Joined: 3 Sep 04 Posts: 268 Credit: 256,045 RAC: 0 |
<a href="http://arnaudboinc.free.fr">BOINC 4.45b</a> available :o) ----------------------------------------------- <a href="http://boinc-doc.net/boinc-wiki/index.php?title=Main_Page">Boinc Wiki</a> <a href="http://forum.boinc.fr/">L'Alliance Francophone</a> |
Send message Joined: 8 Sep 04 Posts: 23 Credit: 121,446 RAC: 0 |
I have a concern regarding the timeout: what if it's not that the CPDN model is taking more than 10s to exit, but that you're running a "non CPU intensive" project app (the Berkeley computer science "crash collection" project, for example - http://winerror.cs.berkeley.edu/crashcollection/)? I'm using a 3GHz Xeon generation workstation, so I don't think it's CPDN that's causing the problem. Possibly something is wrong with the "restart apps" code in the core client? I'm not a coder, so this is out of my league; just a suggestion though. Lee |
Send message Joined: 7 Aug 04 Posts: 2180 Credit: 64,766,246 RAC: 653 |
> I have a concern regarding the timeout, what if it's not that the CPDN model > is taking more than 10s to exit but that your running a "non CPU intensive" > project app (berkeley computer science "crash collection" project for example > - http://winerror.cs.berkeley.edu/crashcollection/) as i'm using a 3GHz Xeon > generation workstation, so i don't think it's CPDN that's causing the problem, > possibly something is wrong with the "restart apps" code in the core client Could be, but every unattended benchmark with 4.45 resulted in a stoppage of work that wouldn't restart. Once I copied in Ralic's 4.45b files, all automatic benchmarks have worked as they should. |
Send message Joined: 8 Sep 04 Posts: 23 Credit: 121,446 RAC: 0 |
> > I have a concern regarding the timeout, what if it's not that the CPDN > model > > is taking more than 10s to exit but that your running a "non CPU > intensive" > > project app (berkeley computer science "crash collection" project for > example > > - http://winerror.cs.berkeley.edu/crashcollection/) as i'm using a 3GHz > Xeon > > generation workstation, so i don't think it's CPDN that's causing the > problem, > > possibly something is wrong with the "restart apps" code in the core > client > > Could be, but every unattended benchmark with 4.45 resulted in a stoppage of > work that wouldn't restart. Once I copied in Ralic's 4.45b files, all > automatic benchmarks have worked as they should. > True, but I gather from all the posts on various threads that it's most probably because CPDN isn't allowed enough time to exit properly, hence the current problem. But a "non CPU intensive" task doesn't try to exit (because it won't affect the benchmarks), so the problem may well persist even with the increased timeout in 4.45b. I'm just trying to get to the bottom of a potentially bigger problem, so that when BOINC becomes more popular etc. there are fewer issues. Lee |
Send message Joined: 31 Aug 04 Posts: 14 Credit: 404,382 RAC: 0 |
> |
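
For headless machines like the ones mentioned in the thread, one way to catch the stall is to scan the client log for the abort message quoted in the first post. The following is only a sketch. It assumes the log has been saved as plain text with one entry per line in the layout seen in that excerpt (date, time, sequence number, message); the function name `find_benchmark_aborts` and the sample data are illustrative and not part of BOINC or BOINCView.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the excerpt in the first post:
#   DD/MM/YYYY HH:MM:SS <sequence number> <message text>
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\s+(\d+)\s+(.*)$")
ABORT_TEXT = "Aborting CPU benchmarks"


def find_benchmark_aborts(lines):
    """Return (timestamp, sequence, message) for every benchmark-abort entry."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        stamp, seq, message = match.groups()
        if message.startswith(ABORT_TEXT):
            when = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
            hits.append((when, int(seq), message))
    return hits


# The four entries quoted in the first post, one per line.
SAMPLE_LOG = """\
25/07/2005 07:26:37 128 Suspending computation and network activity - running CPU benchmarks
25/07/2005 07:26:37 129 Pausing result 13pm_100072003_1 (removed from memory)
25/07/2005 07:26:39 130 Running CPU benchmarks
25/07/2005 07:26:47 131 Aborting CPU benchmarks, one or more active tasks are still running.
""".splitlines()

if __name__ == "__main__":
    for when, seq, message in find_benchmark_aborts(SAMPLE_LOG):
        print(f"{when:%Y-%m-%d %H:%M:%S} (entry {seq}): {message}")
```

Any hit means the client should be checked and, as suggested in the thread, restarted if the model is no longer being crunched.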
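
The stall described by John Keck can also be pictured with a toy model. This is not BOINC code, only one reading of his explanation: the client waits a fixed time for the application to stop, and if the application takes longer, the benchmarks abort while the application ends up waiting for a finish signal that never arrives. The 10 and 30 second timeouts come from the thread; the 15 second exit time is a made-up value for illustration, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    benchmarks_ran: bool
    task_resumed: bool


def run_benchmark_cycle(app_exit_seconds: float, stop_timeout_seconds: float) -> Outcome:
    """Toy model of one benchmark cycle as described in the thread."""
    if app_exit_seconds <= stop_timeout_seconds:
        # The application stopped within the wait: the benchmarks run,
        # and the task is restarted when they finish.
        return Outcome(benchmarks_ran=True, task_resumed=True)
    # The wait timed out first: the benchmarks are aborted. The application
    # stops afterwards and waits for the benchmarks to finish, which never
    # happens, so the task stays idle until the client is restarted.
    return Outcome(benchmarks_ran=False, task_resumed=False)


if __name__ == "__main__":
    hypothetical_exit_time = 15.0  # assumed value, not measured
    for timeout in (10.0, 30.0):   # stock timeout vs. the 4.45a/4.45b build
        print(timeout, run_benchmark_cycle(hypothetical_exit_time, timeout))
```

Under this model a longer timeout only helps when the application does eventually stop, which is why the question raised about "non CPU intensive" apps remains open.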
©2024 cpdn.org