New work discussion

Dave Jackson (Volunteer moderator) · Joined: 15 May 09 · Posts: 4504 · Credit: 18,450,004 · RAC: 1,042
Message 69244 - Posted: 11 Jul 2023, 18:32:20 UTC - in response to Message 69242.
P.S. Still running after 4.5 hours, but fingers and toes are still crossed for the next 9 days of run time, which is more like a couple of weeks of clock time. At any rate, the runtime estimates on mine look a bit pessimistic, probably because my machine has been running the EAS tasks, which handle more data because they cover a larger and more complex area.

zombie67 [MM] · Joined: 2 Oct 06 · Posts: 54 · Credit: 27,309,613 · RAC: 28,128
Message 69246 - Posted: 11 Jul 2023, 19:04:31 UTC
Deadlines for this new batch are still a year out. Not good.

rob · Joined: 5 Jun 09 · Posts: 96 · Credit: 3,614,983 · RAC: 2,400
Message 69247 - Posted: 12 Jul 2023, 7:03:31 UTC - in response to Message 69242.
Well, it survived last night's shutdown and this morning's restart. Onwards and upwards: ~6% done in ~8 hours.

Dave Jackson (Volunteer moderator) · Joined: 15 May 09 · Posts: 4504 · Credit: 18,450,004 · RAC: 1,042
Message 69248 - Posted: 12 Jul 2023, 7:35:28 UTC - in response to Message 69247.
Quote: "Well it survived last night's shutdown and this morning's restart. Onwards and upwards - ~6% done in ~8 hours."
Mine have all survived, as have my five from the East Asia batch, the latter twice now.

Dave Jackson (Volunteer moderator) · Joined: 15 May 09 · Posts: 4504 · Credit: 18,450,004 · RAC: 1,042
Message 69250 - Posted: 12 Jul 2023, 8:10:34 UTC - in response to Message 69246.
Quote: "Deadlines for this new batch are still a year out. Not good."
This has been raised numerous times with the project. I doubt that moaning about it will make any difference.

Glenn Carver · Joined: 29 Oct 17 · Posts: 1030 · Credit: 16,107,573 · RAC: 15,433
Message 69254 - Posted: 12 Jul 2023, 11:11:55 UTC - in response to Message 69250.
Quote: "Deadlines for this new batch are still a year out. Not good. This has been raised numerous times with the project. I doubt if moaning about it will make any difference."
They have more important things right now. It's also something of a non-issue, because once enough results are in, CPDN usually close the batch, which stops any more resends. But I'll look in the repository and see if I can change it.
--- CPDN Visiting Scientist

Mr. P Hucker · Joined: 9 Oct 20 · Posts: 690 · Credit: 4,391,754 · RAC: 6,918
Message 69257 - Posted: 12 Jul 2023, 13:36:59 UTC - in response to Message 69254. Last modified: 12 Jul 2023, 13:37:49 UTC
Quote: "They have more important things right now. It's also something of a non-issue because once enough results are in CPDN usually close the batch stopping any more resends. But I'll look in the repository and see if I can change it."
Thanks. It's a pity you can't send out cancellations for tasks that are already running once there is enough data, but that's probably impossible because of some insane BOINC policy about not upsetting folk who'd somehow prefer to crunch something pointless just because they started it. Are these new ones supposed to be so small? I'm running them about 5 times faster.

kotenok2000 · Joined: 22 Feb 11 · Posts: 32 · Credit: 226,546 · RAC: 4,080
Message 69258 - Posted: 12 Jul 2023, 14:35:19 UTC
What will the RAM consumption be for one multicore OpenIFS task compared with several single-core OpenIFS workunits? Also, how effective is hyperthreading?

Dave Jackson (Volunteer moderator) · Joined: 15 May 09 · Posts: 4504 · Credit: 18,450,004 · RAC: 1,042
Message 69259 - Posted: 12 Jul 2023, 15:43:43 UTC - in response to Message 69258.
Quote: "Also how effective is hyperthreading?"
There is a thread somewhere where Glenn has answered that one. I will see if I can find it later.

Glenn Carver · Joined: 29 Oct 17 · Posts: 1030 · Credit: 16,107,573 · RAC: 15,433
Message 69260 - Posted: 12 Jul 2023, 15:49:37 UTC - in response to Message 69259. Last modified: 12 Jul 2023, 15:58:18 UTC
Quote: "Also how effective is hyperthreading? There is a thread somewhere where Glenn has answered that one. I will see if I can find it later."
Hyperthreading does not work for heavily numerical codes like weather models. You get contention on the chip, because the two hardware threads on a core compete for the same floating-point units. There are also diminishing returns on memory access (large caches like those on the AMD X3D chips have little impact). If you want the best throughput (tasks completed per day), keep the number of tasks equal to the number of physical cores. If you want the fastest runtime, run only 1 task and keep the machine as quiet as possible. The post Dave refers to includes a graph where I tested running OpenIFS on different numbers of cores. You can find the thread here: https://www.cpdn.org/forum_thread.php?id=9184#68081. The results also apply to WAH and the other Met Office models. Raw single-core speed and memory bandwidth give the best performance with CPDN models.
Edit: Re: RAM consumption. Memory increases only marginally with multicore (say ~5-10%), but runtime decreases in line with the number of cores (i.e. half the runtime with 2 cores). We don't have any multicore models in production yet.
--- CPDN Visiting Scientist
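To make the RAM question concrete, here is a minimal sketch that applies the two rules of thumb above (multicore adds roughly 5-10% memory; runtime scales down in proportion to the core count). The per-task memory and runtime values are made-up placeholders, not measured CPDN figures, and the function names are hypothetical. Under those assumptions, running the same tasks one at a time on all cores gives the same throughput as running them side by side, but with a fraction of the peak memory.

```python
# Sketch only: per-task memory and runtime are hypothetical placeholders.

def single_core_batch(n_tasks: int, mem_gb: float, runtime_h: float):
    """Run n_tasks single-core tasks side by side, one per physical core.

    Returns (peak memory in GB, wall-clock hours, tasks completed per hour)."""
    peak_mem = n_tasks * mem_gb        # each task holds its own copy of the model
    wall_h = runtime_h                 # they all finish at about the same time
    return peak_mem, wall_h, n_tasks / wall_h


def multicore_sequence(n_tasks: int, cores: int, mem_gb: float, runtime_h: float,
                       mem_overhead: float = 0.10):
    """Run the same n_tasks one after another, each using `cores` cores.

    Assumes ideal linear speed-up (half the runtime on 2 cores) and the upper
    end of the quoted 5-10% multicore memory overhead by default."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    peak_mem = mem_gb * (1.0 + mem_overhead)   # only one task resident at a time
    per_task_h = runtime_h / cores
    wall_h = n_tasks * per_task_h
    return peak_mem, wall_h, n_tasks / wall_h


if __name__ == "__main__":
    # Four tasks on a 4-core machine, 5 GB and 100 h per single-core task (made up).
    for label, result in [("4 x single-core", single_core_batch(4, 5.0, 100.0)),
                          ("4-core, one at a time", multicore_sequence(4, 4, 5.0, 100.0))]:
        mem, hours, rate = result
        print(f"{label}: peak {mem:.1f} GB, {hours:.0f} h wall time, {rate * 24:.2f} tasks/day")
```

Real speed-up is rarely perfectly linear, so treat the multicore line as a best case.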

Dave Jackson (Volunteer moderator) · Joined: 15 May 09 · Posts: 4504 · Credit: 18,450,004 · RAC: 1,042
Message 69261 - Posted: 12 Jul 2023, 16:05:14 UTC
Quote: "Are these new ones supposed to be so small? I'm running them about 5 times faster."
I am not seeing anything like a fivefold increase in speed, but I remember you saying mine were running faster than yours when they should have been going at a comparable speed. However, as they cover a smaller and less complex area, I would expect them to run faster.

Mr. P Hucker · Joined: 9 Oct 20 · Posts: 690 · Credit: 4,391,754 · RAC: 6,918
Message 69263 - Posted: 12 Jul 2023, 16:16:49 UTC - in response to Message 69261. Last modified: 12 Jul 2023, 16:17:34 UTC
Quote: "I am not seeing anything like a five times increase in speed but I remember you saying mine were running faster than yours when they should have been going at a comparable speed. However as they are covering a smaller and less complex area I would expect them to run faster."
I'm seeing 16% done in 18 hours on a Ryzen 9 3900X (my good one with fast dual-channel RAM), so one task is predicted to take just under 5 days. It's only running 2 (alongside other projects, which fill all 24 threads), and I've told BOINC to allocate 2 threads to each CPDN task. I guess the faster RAM accounts for a factor of 3, and running 2 instead of 12 will help as well. OK, I've no idea how much of the speed increase is down to the tasks themselves! How much smaller is the area? And is the resolution the same?
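The "just under 5 days" prediction is a straight-line extrapolation from progress so far. A minimal sketch, assuming progress is linear in run time (only approximately true for these models); the function names are hypothetical. With the figures quoted in this thread, 18 h / 0.16 gives 112.5 h (about 4.7 days), and 8 h / 0.06 gives about 133 h (about 5.6 days).

```python
def estimate_total_hours(elapsed_h: float, fraction_done: float) -> float:
    """Extrapolate a task's total runtime from its progress so far.

    Assumes progress is linear in run time, which is only approximately
    true for climate-model tasks."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_h / fraction_done


def estimate_remaining_hours(elapsed_h: float, fraction_done: float) -> float:
    """Hours still to run, under the same linear-progress assumption."""
    return estimate_total_hours(elapsed_h, fraction_done) - elapsed_h


if __name__ == "__main__":
    # Progress figures quoted in this thread.
    for label, elapsed_h, done in [("Ryzen 9 3900X, 16% in 18 h", 18.0, 0.16),
                                   ("~6% in ~8 h", 8.0, 0.06)]:
        total = estimate_total_hours(elapsed_h, done)
        print(f"{label}: ~{total:.1f} h total (~{total / 24:.1f} days), "
              f"~{estimate_remaining_hours(elapsed_h, done):.1f} h left")
```

This counts run time only; as noted earlier in the thread, clock time can be much longer if the machine is shut down overnight.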

kotenok2000 · Joined: 22 Feb 11 · Posts: 32 · Credit: 226,546 · RAC: 4,080
Message 69264 - Posted: 12 Jul 2023, 16:19:41 UTC
What if you use:
    <app_config>
      <app>
        <name>wah2</name>
        <max_concurrent>2</max_concurrent>
        <fraction_done_exact/>
      </app>
    </app_config>

Mr. P Hucker · Joined: 9 Oct 20 · Posts: 690 · Credit: 4,391,754 · RAC: 6,918
Message 69265 - Posted: 12 Jul 2023, 16:28:50 UTC - in response to Message 69264.
Quote: "What if you use <app_config> <app> <name>wah2</name> <max_concurrent>2</max_concurrent> <fraction_done_exact/> </app> </app_config>"
I prefer to run one task per core for the most throughput, rather than the fastest time per task, so I've told BOINC to reserve two threads for each task, thus:
    <app_config>
      <app_version>
        <app_name>wah2</app_name>
        <plan_class></plan_class>
        <cmdline></cmdline>
        <avg_ncpus>2.000000</avg_ncpus>
        <ngpus>0.000000</ngpus>
      </app_version>
    </app_config>
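The two suggestions can live in one app_config.xml. As a sketch, the snippet below generates such a file with Python's standard xml.etree.ElementTree, using only the elements already quoted above (name, max_concurrent, fraction_done_exact, app_name, plan_class, avg_ncpus). The helper name is hypothetical, and it assumes that plan_class can simply be omitted when the app version has none; check the BOINC client documentation before relying on that.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_app_config(app_name: str, max_concurrent: int, cpus_per_task: float,
                     plan_class: Optional[str] = None) -> str:
    """Return app_config.xml text combining both settings from this thread."""
    root = ET.Element("app_config")

    # Per-application limits: cap running tasks, report exact fraction done.
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    ET.SubElement(app, "fraction_done_exact")  # empty flag element

    # Per-version CPU accounting: how many CPUs BOINC budgets for each task.
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    if plan_class is not None:  # assumption: omit when the version has no plan class
        ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "avg_ncpus").text = f"{cpus_per_task:.6f}"

    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_app_config("wah2", max_concurrent=2, cpus_per_task=2.0))
```

With both settings, BOINC would run at most 2 wah2 tasks and budget 2 CPUs for each, leaving the remaining threads to other projects.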

kotenok2000 · Joined: 22 Feb 11 · Posts: 32 · Credit: 226,546 · RAC: 4,080
Message 69266 - Posted: 12 Jul 2023, 16:30:31 UTC
Is that to make sure other projects won't take those cores for themselves?

Richard Haselgrove · Joined: 1 Jan 07 · Posts: 1051 · Credit: 36,341,855 · RAC: 2,973
Message 69267 - Posted: 12 Jul 2023, 17:00:52 UTC
Just come out of a 7-hour power cut: a sharp, cliff-edge drop, with no flickering or brownouts. Both climate models I had running, on different machines, restarted unscathed. One was approaching a trickle and upload, and those went through cleanly as well. Mind you, I had plenty of time to prepare for an orderly restart: I turned the router and computers off, then started them one at a time in a sensible sequence, so that each had access to whatever services it needed, notably DHCP, as it went live.

Mr. P Hucker · Joined: 9 Oct 20 · Posts: 690 · Credit: 4,391,754 · RAC: 6,918
Message 69268 - Posted: 12 Jul 2023, 18:58:03 UTC - in response to Message 69266.
Quote: "To be sure that other projects won't take cores for themselves?"
It seems to work without doing that. I'm not sure exactly how Windows allocates things, but the other projects are happy with HT, so I assumed they each took a thread and CPDN took a core each. I guess even if Windows isn't too bright (a fair assumption!), with only a small number of CPDN tasks the chances of them landing on the same core are minimal.
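How minimal? A rough baseline, assuming (unrealistically) that each task lands on a different logical CPU chosen uniformly at random; the real Windows scheduler is topology-aware, so this is only a sanity check, and the function name is hypothetical. On a 12-core/24-thread Ryzen 9 3900X, two tasks share a physical core with probability 1/23, about 4%.

```python
from fractions import Fraction
from math import comb


def p_all_on_separate_cores(cores: int, tasks: int, threads_per_core: int = 2) -> Fraction:
    """Probability that `tasks` single-threaded jobs, each on a different logical
    CPU chosen uniformly at random, all end up on different physical cores."""
    if tasks < 0 or cores < 1 or threads_per_core < 1:
        raise ValueError("invalid arguments")
    logical = cores * threads_per_core
    if tasks > logical:
        raise ValueError("more tasks than logical CPUs")
    if tasks > cores:
        return Fraction(0)  # pigeonhole: at least two must share a core
    # Choose which physical cores are used, then which hardware thread on each.
    favourable = comb(cores, tasks) * threads_per_core ** tasks
    return Fraction(favourable, comb(logical, tasks))


if __name__ == "__main__":
    # Ryzen 9 3900X: 12 cores / 24 threads, 2 CPDN tasks among the other work.
    p = p_all_on_separate_cores(12, 2)
    print(f"P(2 tasks share a core) = {1 - p} = {float(1 - p):.3f}")
```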

Mr. P Hucker · Joined: 9 Oct 20 · Posts: 690 · Credit: 4,391,754 · RAC: 6,918
Message 69269 - Posted: 12 Jul 2023, 18:59:20 UTC - in response to Message 69267.
Quote: "Just come out of a 7-hour powercut - sharp cliff-edge drop, no flickering or brownouts."
No UPS? Dear me!
Quote: "Both climate models I had running - on different machines - restarted unscathed. One was approaching a trickle+upload: and those went through cleanly as well. Mind you, I had plenty of time to prepare for an orderly restart - turned router and computers off, and started them one at a time in a sensible sequence, so each had access to whatever services they needed, notably DHCP, as they went live."
Why would you need to do that? The computers should remember their last IP address, and if they need the internet before it's ready, they'll retry.

Jean-David Beyer · Joined: 5 Aug 04 · Posts: 1118 · Credit: 17,163,134 · RAC: 2,081
Message 69270 - Posted: 12 Jul 2023, 19:38:06 UTC - in response to Message 69269.
Quote: "Just come out of a 7-hour powercut - sharp cliff-edge drop, no flickering or brownouts. No UPS? Dear me!"
I have a UPS that is good for about 13 minutes right now, with 12 BOINC tasks running and the monitor on. But I also have a natural-gas backup generator that comes on within about 10 or 12 seconds and will run as long as the gas company does its duty. The longest power interruption I have experienced here was about 6 1/2 days, related to tropical storm Sandy. Most of my interruptions are much shorter, like one second. I did get an interruption of about 2 1/2 hours around Christmas that the power company had to fix, but it did not mess up my computer.

Mr. P Hucker · Joined: 9 Oct 20 · Posts: 690 · Credit: 4,391,754 · RAC: 6,918
Message 69275 - Posted: 13 Jul 2023, 2:27:59 UTC - in response to Message 69270.
Quote: "I have a UPS that is good for about 13 minutes right now with 12 Boinc tasks running and the monitor on.. But I have a natural gas operated backup generator that comes on within about 10 or 12 seconds and will run as long as the gas company does its duty. The longest power interruption I have experienced here was about 6 1/2 days related to tropical storm Sandy. Most of my interruptions are much shorter. Like one second. I did get about a 2 1/2 hour interruption around Christmas that the power company had to fix. But it did not mess up my computer."
I think I've had a single 1-hour cut in 23 years here. But I used to get a lot of 1-second dips, which could crash a computer. When I got a corrupted C: drive that would no longer boot, I bought a UPS. I was then surprised to find it was constantly adjusting the voltage, because it kept getting too high. It was outside the legal specification, and I reported it, but they said that if they lowered my voltage (I'm next to the transformer), the far end of the street would be too low. It seems to have stopped now that they've replaced the transformer. I'm not sure whether that was because of complaints, new housing being built, the transformer getting old (it didn't look it), damage from a short, or the solar panels going up everywhere.

New work discussion - 2