New work discussion - 2

Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4504
Credit: 18,450,004
RAC: 1,042
Message 69244 - Posted: 11 Jul 2023, 18:32:20 UTC - in response to Message 69242.  

p.s. Still running after 4.5 hours, but fingers and toes still crossed for the next 9 days of run time which is more like a couple of weeks clock time.
My estimates, at any rate, look a bit pessimistic. That is probably because I have been running the EAS tasks, which deal with more data as they cover a larger and more complex area.
ID: 69244
zombie67 [MM]

Joined: 2 Oct 06
Posts: 54
Credit: 27,309,613
RAC: 28,128
Message 69246 - Posted: 11 Jul 2023, 19:04:31 UTC

Deadlines for this new batch are still a year out. Not good.
ID: 69246
rob

Joined: 5 Jun 09
Posts: 96
Credit: 3,614,983
RAC: 2,400
Message 69247 - Posted: 12 Jul 2023, 7:03:31 UTC - in response to Message 69242.  

Well it survived last night's shutdown and this morning's restart. Onwards and upwards - ~6% done in ~8 hours.
ID: 69247
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4504
Credit: 18,450,004
RAC: 1,042
Message 69248 - Posted: 12 Jul 2023, 7:35:28 UTC - in response to Message 69247.  

Well it survived last night's shutdown and this morning's restart. Onwards and upwards - ~6% done in ~8 hours.
Mine have all survived as have my five from the East Asia batch. The latter twice now.
ID: 69248
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4504
Credit: 18,450,004
RAC: 1,042
Message 69250 - Posted: 12 Jul 2023, 8:10:34 UTC - in response to Message 69246.  

Deadlines for this new batch are still a year out. Not good.
This has been raised numerous times with the project. I doubt if moaning about it will make any difference.
ID: 69250
Glenn Carver

Joined: 29 Oct 17
Posts: 1030
Credit: 16,107,573
RAC: 15,433
Message 69254 - Posted: 12 Jul 2023, 11:11:55 UTC - in response to Message 69250.  

Deadlines for this new batch are still a year out. Not good.
This has been raised numerous times with the project. I doubt if moaning about it will make any difference.
They have more important things right now. It's also something of a non-issue because once enough results are in CPDN usually close the batch stopping any more resends. But I'll look in the repository and see if I can change it.
---
CPDN Visiting Scientist
ID: 69254
Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69257 - Posted: 12 Jul 2023, 13:36:59 UTC - in response to Message 69254.  
Last modified: 12 Jul 2023, 13:37:49 UTC

They have more important things right now. It's also something of a non-issue because once enough results are in CPDN usually close the batch stopping any more resends. But I'll look in the repository and see if I can change it.
Thanks. Pity you can't send out cancels for already running tasks when they have enough data, but that's probably impossible, because of some insane Boinc policy about not upsetting folk who'd somehow prefer to crunch something pointless just because they started it.

Are these new ones supposed to be so small? I'm running them about 5 times faster.
ID: 69257
kotenok2000

Joined: 22 Feb 11
Posts: 32
Credit: 226,546
RAC: 4,080
Message 69258 - Posted: 12 Jul 2023, 14:35:19 UTC

How will the RAM consumption of one multicore OpenIFS task compare with that of multiple single-core OpenIFS workunits?
Also how effective is hyperthreading?
ID: 69258
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4504
Credit: 18,450,004
RAC: 1,042
Message 69259 - Posted: 12 Jul 2023, 15:43:43 UTC - in response to Message 69258.  

Also how effective is hyperthreading?

There is a thread somewhere where Glenn has answered that one. I will see if I can find it later.
ID: 69259
Glenn Carver

Joined: 29 Oct 17
Posts: 1030
Credit: 16,107,573
RAC: 15,433
Message 69260 - Posted: 12 Jul 2023, 15:49:37 UTC - in response to Message 69259.  
Last modified: 12 Jul 2023, 15:58:18 UTC

Also how effective is hyperthreading?
There is a thread somewhere where Glenn has answered that one. I will see if I can find it later.
Hyperthreading does not work for very numerical codes like weather models. You get contention on the chip where threads compete for the floating-point units, of which only single units are available. There are also diminishing returns on memory access (large caches like the AMD X3D have little impact).

If you want the best throughput (tasks completed per day), keep the task count equal to the number of cores. If you want the fastest runtime, run only 1 task and keep the machine as quiet as possible.

The post that Dave refers to includes a graph where I tested running OpenIFS on different numbers of cores. You can find the thread here: https://www.cpdn.org/forum_thread.php?id=9184#68081. The results apply to the WAH and other MetOffice models.

It's raw single-core speed and memory bandwidth that get the best out of CPDN models.

Edit: Re RAM consumption: memory increases only marginally with multicore (say ~5-10%), but runtime decreases in line with the number of cores (i.e. half the runtime with 2 cores). Note that we don't have any multicore models in production yet.
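
A rough sketch of that trade-off, for anyone who wants to plug in their own numbers. It assumes the figures quoted above hold exactly (runtime scaling perfectly with core count, and a multicore memory overhead of 7.5%, the midpoint of the ~5-10% range). The 5 GB per task and 12 hours per task in the example are hypothetical placeholders, not measured CPDN values, and the function name is made up for illustration.

```python
def compare_layouts(cores, ram_per_task_gb, hours_per_task, multicore_overhead=0.075):
    """Compare `cores` single-core tasks against one task spread over `cores` cores.

    Assumes ideal scaling (runtime divides by the core count) and a fixed
    fractional memory overhead for the multicore task.
    """
    single = {
        "ram_gb": cores * ram_per_task_gb,               # every task holds its own copy
        "hours_per_task": hours_per_task,
        "tasks_per_day": cores * 24.0 / hours_per_task,
    }
    multi_hours = hours_per_task / cores                 # e.g. half the runtime with 2 cores
    multi = {
        "ram_gb": ram_per_task_gb * (1.0 + multicore_overhead),
        "hours_per_task": multi_hours,
        "tasks_per_day": 24.0 / multi_hours,
    }
    return single, multi


if __name__ == "__main__":
    # Hypothetical inputs: 2 cores, 5 GB per single-core task, 12 hours per task.
    single, multi = compare_layouts(cores=2, ram_per_task_gb=5.0, hours_per_task=12.0)
    print("2 x single-core:", single)
    print("1 x two-core:   ", multi)
```

Under these idealised assumptions both layouts finish the same number of tasks per day; the multicore layout simply needs far less RAM and returns each result sooner. Real scaling is unlikely to be perfect, so treat the output as an upper bound for the multicore case.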
---
CPDN Visiting Scientist
ID: 69260
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4504
Credit: 18,450,004
RAC: 1,042
Message 69261 - Posted: 12 Jul 2023, 16:05:14 UTC

Are these new ones supposed to be so small? I'm running them about 5 times faster.
I am not seeing anything like a five times increase in speed but I remember you saying mine were running faster than yours when they should have been going at a comparable speed. However as they are covering a smaller and less complex area I would expect them to run faster.
ID: 69261
Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69263 - Posted: 12 Jul 2023, 16:16:49 UTC - in response to Message 69261.  
Last modified: 12 Jul 2023, 16:17:34 UTC

I am not seeing anything like a five times increase in speed but I remember you saying mine were running faster than yours when they should have been going at a comparable speed. However as they are covering a smaller and less complex area I would expect them to run faster.
I'm seeing 16% done in 18 hours on a Ryzen 9 3900X (my good one with fast dual channel RAM) - predicted one task takes just under 5 days. It's only doing 2 (alongside other projects which fill all 24 threads), and I've told Boinc to allocate 2 threads to each of the CPDN tasks. I guess the faster RAM accounts for x3. And doing 2 instead of 12 will help as well. Ok, I've no idea how much speed increase is caused by the tasks! How much smaller is the area? And is the resolution the same?
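
The "just under 5 days" figure can be checked with a simple linear extrapolation. This is only a sketch: it assumes progress continues at the same average rate for the whole run, which may not hold if the machine's load changes, and the function name is made up for illustration.

```python
def estimate_total_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linearly extrapolate total runtime from progress so far."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be greater than 0 and at most 1")
    return elapsed_hours / fraction_done


if __name__ == "__main__":
    # Progress figures quoted in this thread: (hours elapsed, fraction done).
    for label, hours, done in [("16% in 18 h", 18.0, 0.16), ("~6% in ~8 h", 8.0, 0.06)]:
        total = estimate_total_hours(hours, done)
        print(f"{label}: about {total:.0f} h total, {total / 24:.1f} days")
```

For 16% in 18 hours this gives roughly 112 hours, a little under 5 days, which agrees with the prediction quoted in the post.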
ID: 69263
kotenok2000

Joined: 22 Feb 11
Posts: 32
Credit: 226,546
RAC: 4,080
Message 69264 - Posted: 12 Jul 2023, 16:19:41 UTC

What if you use
<app_config>
    <app>
        <name>wah2</name>
        <max_concurrent>2</max_concurrent>
        <fraction_done_exact/>
    </app>
</app_config>
ID: 69264
Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69265 - Posted: 12 Jul 2023, 16:28:50 UTC - in response to Message 69264.  

What if you use
<app_config>
    <app>
        <name>wah2</name>
        <max_concurrent>2</max_concurrent>
        <fraction_done_exact/>
    </app>
</app_config>
I prefer to run one per core for the most throughput, not the fastest per task, so I've told it to use two threads per task, thus:

    <app_version>
        <app_name>wah2</app_name>
        <plan_class></plan_class>
        <cmdline></cmdline>
        <avg_ncpus>2.000000</avg_ncpus>
        <ngpus>0.000000</ngpus>
    </app_version>
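
Note that an `<app_version>` block is a child of `<app_config>`, so the fragment above has to sit inside that wrapper in the project's `app_config.xml`. Below is a minimal Python sketch that writes out such a file using only tags that already appear in this thread. The function name is made up for illustration, and leaving out the empty `<plan_class>` and `<cmdline>` tags is an assumption that the client treats a missing tag like an empty one. As far as I know `avg_ncpus` only changes how many CPUs the client budgets for each task; it does not make a single-threaded model run on two threads.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, threads_per_task: int, max_concurrent: int) -> str:
    """Return app_config.xml text that budgets `threads_per_task` CPUs per task."""
    root = ET.Element("app_config")

    # <app> block: caps how many tasks of this application run at once.
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)

    # <app_version> block: how many CPUs the client should count per task.
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "avg_ncpus").text = str(threads_per_task)
    ET.SubElement(version, "ngpus").text = "0"

    ET.indent(root, space="    ")  # pretty-printing; needs Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Values taken from the posts above: wah2, 2 threads per task, 2 tasks at once.
    print(build_app_config("wah2", threads_per_task=2, max_concurrent=2))
```

Save the printed text as `app_config.xml` in the project's folder inside the BOINC data directory, then have the client re-read its config files or restart it.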
ID: 69265
kotenok2000

Joined: 22 Feb 11
Posts: 32
Credit: 226,546
RAC: 4,080
Message 69266 - Posted: 12 Jul 2023, 16:30:31 UTC

To be sure that other projects won't take cores for themselves?
ID: 69266
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1051
Credit: 36,341,855
RAC: 2,973
Message 69267 - Posted: 12 Jul 2023, 17:00:52 UTC

Just come out of a 7-hour powercut - sharp cliff-edge drop, no flickering or brownouts.

Both climate models I had running - on different machines - restarted unscathed. One was approaching a trickle+upload, and those went through cleanly as well. Mind you, I had plenty of time to prepare for an orderly restart - turned router and computers off, and started them one at a time in a sensible sequence, so each had access to whatever services they needed, notably DHCP, as they went live.
ID: 69267
Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69268 - Posted: 12 Jul 2023, 18:58:03 UTC - in response to Message 69266.  

To be sure that other projects won't take cores for themselves?
Seems to work without doing that. Not sure exactly how Windows allocates things, but the other projects are happy with HT, so I was assuming they took a thread each, and CPDN took a core each. I guess even if Windows isn't too bright (a fair assumption!) with a small number of CPDNs, the chances of them going on the same core are minimal.
ID: 69268
Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69269 - Posted: 12 Jul 2023, 18:59:20 UTC - in response to Message 69267.  

Just come out of a 7-hour powercut - sharp cliff-edge drop, no flickering or brownouts.
No UPS? Dear me!

Both climate models I had running - on different machines - restarted unscathed. One was approaching a trickle+upload, and those went through cleanly as well. Mind you, I had plenty of time to prepare for an orderly restart - turned router and computers off, and started them one at a time in a sensible sequence, so each had access to whatever services they needed, notably DHCP, as they went live.
Why would you need to do that? The computers should remember their last IP address, and if they need the internet but it's not ready yet, they'll retry.
ID: 69269
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1118
Credit: 17,163,134
RAC: 2,081
Message 69270 - Posted: 12 Jul 2023, 19:38:06 UTC - in response to Message 69269.  

Just come out of a 7-hour powercut - sharp cliff-edge drop, no flickering or brownouts.

No UPS? Dear me!


I have a UPS that is good for about 13 minutes right now with 12 Boinc tasks running and the monitor on. But I have a natural-gas-operated backup generator that comes on within about 10 or 12 seconds and will run as long as the gas company does its duty. The longest power interruption I have experienced here was about 6 1/2 days related to tropical storm Sandy.

Most of my interruptions are much shorter.

Like one second. I did get about a 2 1/2 hour interruption around Christmas that the power company had to fix. But it did not mess up my computer.
ID: 69270
Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69275 - Posted: 13 Jul 2023, 2:27:59 UTC - in response to Message 69270.  

I have a UPS that is good for about 13 minutes right now with 12 Boinc tasks running and the monitor on. But I have a natural-gas-operated backup generator that comes on within about 10 or 12 seconds and will run as long as the gas company does its duty. The longest power interruption I have experienced here was about 6 1/2 days related to tropical storm Sandy.

Most of my interruptions are much shorter.

Like one second. I did get about a 2 1/2 hour interruption around Christmas that the power company had to fix. But it did not mess up my computer.
I think I've had a single 1 hour cut in 23 years here. But I used to get a lot of 1 second dips which could crash a computer. When I got a corrupted C drive which would no longer boot, I got a UPS. I was then surprised to find it was constantly adjusting the voltage, as it kept getting too high. It was outside the legal specs, and I reported it, but they said if they lowered my voltage (I'm next to the transformer), the far end of the street would be too low. Seems to have stopped now they've replaced the transformer. Not sure if that was because of complaints, new housing being built, it getting old (it didn't look it), it got damaged due to a short, or the solar panels going up everywhere.
ID: 69275

©2024 cpdn.org