New work Discussion

Author	Message
Bryn Mawr Send message Joined: 28 Jul 19 Posts: 148 Credit: 12,830,559 RAC: 228	Message 64759 - Posted: 2 Nov 2021, 1:46:46 UTC - in response to Message 64756. Last modified: 2 Nov 2021, 1:47:09 UTC This project could easily do ten or twenty times as much work if they'd just make some improvements. Only if it had ten or twenty times as many researchers asking Oxford to send work out for them. Ok. This project could easily do ten or twenty times as much work per unit time if they'd just make some improvements. The main problem with that is not owning the source code - they’re not allowed to make changes to most of it. ID: 64759 ·

Bill F Send message Joined: 17 Jan 09 Posts: 124 Credit: 2,017,070 RAC: 4,681	Message 64760 - Posted: 2 Nov 2021, 13:41:21 UTC - in response to Message 64758. It's amazing how much math my generation can do in our heads compared to kids today who need a calculator to do the most rudimentary arithmetic. At my engineering school, order-of-magnitude calculations were emphasized, to catch the mistakes that people made with more precise methods. Also, it gave you a greater physical feel for the subject matter. I think many political mistakes are made by people who have not the slightest idea of the magnitude of what they are talking about. Pretty profound, but it rings with truth. Bill F ID: 64760 ·

Aurum Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318	Message 64761 - Posted: 2 Nov 2021, 20:55:05 UTC - in response to Message 64759. This project could easily do ten or twenty times as much work if they'd just make some improvements. Only if it had ten or twenty times as many researchers asking Oxford to send work out for them. Ok. This project could easily do ten or twenty times as much work per unit time if they'd just make some improvements. The main problem with that is not owning the source code - they’re not allowed to make changes to most of it. I assume the UK MetOffice owns the code. Or is it someone else? The biggest problem I see is the CPU cache congestion problem. Running too many WUs on a computer slows it down to a snail's pace. I keep playing around trying to figure out the most CP work units I can run on a computer. I've tried disabling hyperthreading and that works better but I still can't run all CPUs because it still slows down. Besides if I can't run every CPU thread with CP then I'd like to support ARP etc. Right now as my older WUs complete I detach from CP and then reattach to sweep up the debris it leaves behind. Then I specify a max of two CPUs and under BOINC preferences use at most 33/36=92%. That leaves some headroom but it's still noticeably faster if I run only one CP WU. It's frustrating when I know I could be running 18 or more if not for the CPU Congestion Issue. Last time I suggested this someone said they'd have to rewrite a million lines of Fortran. I'm not a coder but I would think they'd only need to modify aspects of the code. https://www.ibm.com/docs/en/aix/7.2?topic=implementation-design-coding-effective-use-caches "Repackaging techniques can yield significant improvements without recoding..." https://hackernoon.com/programming-how-to-improve-application-performance-by-understanding-the-cpu-cache-levels-df0e87b70c90 This guy says his code ran 50x faster after optimizing for CPU cache usage. 
I've even seen a book dedicated to efficient CPU cache coding. ID: 64761 ·

Aurum Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318	Message 64762 - Posted: 2 Nov 2021, 21:19:31 UTC - in response to Message 64757. What improvements do you have in mind? Nothing even comes close to fixing the CPU cache issue but a few upgrades could make this project a whole lot more user-friendly. I'd start by fixing the work delivery bugs. Several projects use the "Preferences for this project" page to allow the BOINCer to specify how many WUs of which project they'd like to download and maintain on their computer. Also fix the perpetual 60-minute project backoff. It makes no sense how work is delivered, it's just feast or famine. I either go days or weeks getting no WUs on a particular computer, even though the Server Status page says there's work available and another computer is getting work, or I get a year's worth of work in one delivery and must abort almost all of it. I can't think of another BOINC project that behaves this way.
16946 climateprediction.net 11/2/2021 2:14:19 PM update requested by user
16950 climateprediction.net 11/2/2021 2:14:25 PM Sending scheduler request: Requested by user.
16951 climateprediction.net 11/2/2021 2:14:25 PM Not requesting tasks: don't need (CPU: ; NVIDIA GPU: )
16952 climateprediction.net 11/2/2021 2:14:27 PM Scheduler request completed
16953 climateprediction.net 11/2/2021 2:14:27 PM Project requested delay of 3636 seconds
"Don't need" is not true. I have one 921 WU running and would like to run another. If I do get lucky and I'm blessed with a second WU I'd switch to "No new work" and switch back after one completed. Then if it's at all possible make the checkpoints closer together. ID: 64762 ·

Dave Jackson Volunteer moderator Send message Joined: 15 May 09 Posts: 4532 Credit: 18,835,737 RAC: 21,348	Message 64763 - Posted: 2 Nov 2021, 21:43:15 UTC - in response to Message 64762. or I get a year's worth of work in one delivery and must abort almost all of it. I have never received close to even six months of work, even with the work cache set to maximum. "Preferences for this project" page to allow the BOINCer to specify how many WUs of which project they'd like to download and maintain on their computer. In the past, CPDN used to allow users to specify which types of task they could receive, N216, N144 etc., though this was before those particular types of task made it onto the drawing board, but you get what I mean. I and at least one or two of the other moderators would like this but we have been told it isn't going to be changed, at least in the short term. I assume I have never had some of the scheduling problems you have because I only run projects other than CPDN when there is no work available here. Windows tasks all get snapped up within a couple of days of appearing or even less, so on that front the only way more work can be done is to have more scientists who want to do the areas of research that are suited to that task type. ID: 64763 ·

Alan K Send message Joined: 22 Feb 06 Posts: 490 Credit: 30,855,661 RAC: 12,752	Message 64764 - Posted: 2 Nov 2021, 23:42:51 UTC - in response to Message 64762. Then if it's at all possible make the checkpoints closer together. In the computing preferences menu item in "Options" there is a box: "checkpoint at most every.... seconds". ID: 64764 ·

Bryn Mawr Send message Joined: 28 Jul 19 Posts: 148 Credit: 12,830,559 RAC: 228	Message 64765 - Posted: 3 Nov 2021, 3:46:50 UTC - in response to Message 64761. This project could easily do ten or twenty times as much work if they'd just make some improvements. Only if it had ten or twenty times as many researchers asking Oxford to send work out for them. Ok. This project could easily do ten or twenty times as much work per unit time if they'd just make some improvements. The main problem with that is not owning the source code - they’re not allowed to make changes to most of it. I assume the UK MetOffice owns the code. Or is it someone else? The biggest problem I see is the CPU cache congestion problem. Running too many WUs on a computer slows it down to a snail's pace. I keep playing around trying to figure out the most CP work units I can run on a computer. I've tried disabling hyperthreading and that works better but I still can't run all CPUs because it still slows down. Besides if I can't run every CPU thread with CP then I'd like to support ARP etc. Right now as my older WUs complete I detach from CP and then reattach to sweep up the debris it leaves behind. Then I specify a max of two CPUs and under BOINC preferences use at most 33/36=92%. That leaves some headroom but it's still noticeably faster if I run only one CP WU. It's frustrating when I know I could be running 18 or more if not for the CPU Congestion Issue. Last time I suggested this someone said they'd have to rewrite a million lines of Fortran. I'm not a coder but I would think they'd only need to modify aspects of the code. https://www.ibm.com/docs/en/aix/7.2?topic=implementation-design-coding-effective-use-caches "Repackaging techniques can yield significant improvements without recoding..." https://hackernoon.com/programming-how-to-improve-application-performance-by-understanding-the-cpu-cache-levels-df0e87b70c90 This guy says his code ran 50x faster after optimizing for CPU cache usage. 
I've even seen a book dedicated to efficient CPU cache coding. Yes, it’s the Met Office, not CPDN or BOINC or the researchers we are helping. The Met Office have no involvement in what we are doing and optimise their code to run on their mainframes. The licence we are using to run the code does not allow us to change it to suit our PCs. ID: 64765 ·

Les Bayliss Volunteer moderator Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 64766 - Posted: 3 Nov 2021, 5:37:11 UTC And the researchers are well aware that these models take a long time to run. This "BOINC stuff" is only a small part of the research, more "a special treat", rather than "the main course(s)". ID: 64766 ·

Harri Liljeroos Send message Joined: 9 Dec 05 Posts: 116 Credit: 12,530,539 RAC: 2,428	Message 64767 - Posted: 3 Nov 2021, 9:23:21 UTC - in response to Message 64764. Then if it's at all possible make the checkpoints closer together. In the computing preferences menu item in "Options" there is a box: "checkpoint at most every.... seconds". This option can only increase the time between checkpoints, not decrease it. The checkpoint interval is coded into the application; the BOINC client can't force checkpoints to happen. ID: 64767 ·

Aurum Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318	Message 64768 - Posted: 3 Nov 2021, 11:57:45 UTC - in response to Message 64764. Then if it's at all possible make the checkpoints closer together. In the computing preferences menu item in "Options" there is a box: "checkpoint at most every.... seconds". That does nothing. Mine is set to 10 minutes. ID: 64768 ·

Aurum Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318	Message 64769 - Posted: 3 Nov 2021, 11:59:21 UTC Not only does this project have the worst performing work server in all BOINCdom it's so rude. I just turned in 7 N144 completed tasks and they were recorded as Abandoned. ID: 64769 ·

Richard Haselgrove Send message Joined: 1 Jan 07 Posts: 1059 Credit: 36,657,707 RAC: 14,406	Message 64770 - Posted: 3 Nov 2021, 12:03:33 UTC - in response to Message 64769. Not only does this project have the worst performing work server in all BOINCdom it's so rude. I just turned in 7 N144 completed tasks and they were recorded as Abandoned. That'll be the fault of the server software supplied by BOINC, rather than anything CPDN has done. ID: 64770 ·

Aurum Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318	Message 64771 - Posted: 3 Nov 2021, 12:05:37 UTC - in response to Message 64763. or I get a year's worth of work in one delivery and must abort almost all of it. I have never received close to even six months of work, even with the work cache set to maximum. "Preferences for this project" page to allow the BOINCer to specify how many WUs of which project they'd like to download and maintain on their computer. In the past, CPDN used to allow users to specify which types of task they could receive, N216, N144 etc., though this was before those particular types of task made it onto the drawing board, but you get what I mean. I and at least one or two of the other moderators would like this but we have been told it isn't going to be changed, at least in the short term. I assume I have never had some of the scheduling problems you have because I only run projects other than CPDN when there is no work available here. Windows tasks all get snapped up within a couple of days of appearing or even less, so on that front the only way more work can be done is to have more scientists who want to do the areas of research that are suited to that task type. I've gotten a year's worth of work several times, most recently a couple of days ago. The main point is to specify the number of WUs to send. ID: 64771 ·

Aurum Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318	Message 64772 - Posted: 3 Nov 2021, 12:09:25 UTC - in response to Message 64766. And the researchers are well aware that these models take a long time to run. This "BOINC stuff" is only a small part of the research, more "a special treat", rather than "the main course(s)". And it really shows by how poorly they run a BOINC server. They're so lazy they don't even send out a Server Abort when they abandon a project. Last night I completed 7 N144 WUs and they called them Abandoned. That's shameless. That's about seven CPU months of work I could've done for a project that actually cares. ID: 64772 ·

Aurum Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318	Message 64773 - Posted: 3 Nov 2021, 12:10:54 UTC - in response to Message 64770. Not only does this project have the worst performing work server in all BOINCdom it's so rude. I just turned in 7 N144 completed tasks and they were recorded as Abandoned. That'll be the fault of the server software supplied by BOINC, rather than anything CPDN has done. Are you saying it's BOINC's fault that Oxford did not send out a Server Abort signal when they abandoned the N144 project??? ID: 64773 ·

Aurum Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318	Message 64774 - Posted: 3 Nov 2021, 12:12:19 UTC Last modified: 3 Nov 2021, 12:12:39 UTC So how do I know that any of my work will actually be used??? How do I prevent wasting my time and money doing futile work??? ID: 64774 ·

Jean-David Beyer Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,087 RAC: 2,202	Message 64775 - Posted: 3 Nov 2021, 12:14:49 UTC - in response to Message 64767. Last modified: 3 Nov 2021, 12:19:08 UTC This option can only increase the time between checkpoints, not decrease it. The checkpoint interval is coded into the application; the BOINC client can't force checkpoints to happen. Why would people want checkpoints closer together? If you have 8 BOINC tasks running and you could set the checkpoint interval to 8 minutes, you would be writing a checkpoint every minute on average. How much load do you want to put on your disk system? I figure out how much I would want to re-run in case of problems. Since N216 tasks take me about a week, I would normally make the interval an hour or so. ID: 64775 ·
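The disk-load arithmetic in the post above can be sketched in Python. This is only an illustration: the function name is hypothetical, and it assumes every task checkpoints independently at the same fixed interval, which (as noted elsewhere in this thread) is not something the BOINC client can force on CPDN models.

```python
def checkpoints_per_minute(num_tasks: int, interval_minutes: float) -> float:
    """Average number of checkpoint writes per minute across all running tasks.

    Assumes each task writes exactly one checkpoint per interval.
    """
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    return num_tasks / interval_minutes


# 8 tasks, each checkpointing every 8 minutes: one write per minute on average
print(checkpoints_per_minute(8, 8))  # 1.0

# The same 8 tasks with an interval of an hour write far less often
print(checkpoints_per_minute(8, 60))
```

The rate scales linearly with the number of tasks and inversely with the interval, so lengthening the interval is the only lever once the task count is fixed.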

Bryn Mawr Send message Joined: 28 Jul 19 Posts: 148 Credit: 12,830,559 RAC: 228	Message 64776 - Posted: 3 Nov 2021, 13:07:42 UTC - in response to Message 64775. This option can only increase the time between checkpoints, not decrease it. The checkpoint interval is coded into the application; the BOINC client can't force checkpoints to happen. Why would people want checkpoints closer together? If you have 8 BOINC tasks running and you could set the checkpoint interval to 8 minutes, you would be writing a checkpoint every minute on average. How much load do you want to put on your disk system? I figure out how much I would want to re-run in case of problems. Since N216 tasks take me about a week, I would normally make the interval an hour or so. In the case of CPDN my systems checkpoint every 2 hours or so. If you don’t leave your system crunching 24/7 then you might well wish that to be a shorter period so that you lose less work each time you shut down. ID: 64776 ·
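The cost of shutting down between checkpoints can be estimated roughly in Python. This is a sketch under a stated assumption, not a measurement: it assumes each shutdown lands at a uniformly random point between two checkpoints, so on average half an interval of work is lost per task. The function name and the weekly figures are hypothetical.

```python
def expected_lost_hours(interval_hours: float, shutdowns: int, tasks: int = 1) -> float:
    """Expected compute hours lost to shutdowns.

    Assumption: each shutdown falls at a uniformly random moment within a
    checkpoint interval, so the mean loss per task is half the interval.
    """
    if interval_hours < 0 or shutdowns < 0 or tasks < 0:
        raise ValueError("arguments must be non-negative")
    return tasks * shutdowns * interval_hours / 2.0


# One task, 2-hour checkpoint interval, machine shut down once a day for a week
print(expected_lost_hours(2.0, 7))  # 7.0
```

Halving the checkpoint interval halves the expected loss under this assumption, which is the trade-off against the extra disk writes raised earlier in the thread.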

Jean-David Beyer Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,087 RAC: 2,202	Message 64777 - Posted: 3 Nov 2021, 17:49:39 UTC - in response to Message 64776. In the case of CPDN my systems checkpoint every 2 hours or so. If you don’t leave your system crunching 24/7 then you might well wish that to be a shorter period so that you lose less work each time you shut down. I did not think of people shutting their machines down often, since I leave my machine up 24/7 except for updates requiring reboots, which I do every week or two. ID: 64777 ·

Les Bayliss Volunteer moderator Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 64780 - Posted: 3 Nov 2021, 20:40:46 UTC From an old memory, I think that the climate models checkpoint at the end of each model year. ID: 64780 ·