climateprediction.net (CPDN) home page
Thread 'Questions Before I get started.'

lazlo_vii

Joined: 11 Dec 19
Posts: 108
Credit: 3,012,142
RAC: 0
Message 61659 - Posted: 11 Dec 2019, 12:32:25 UTC
Last modified: 11 Dec 2019, 12:35:04 UTC

First, some background information. I have recently upgraded my home systems from Intel-based systems to AMD-based systems. Both my desktop PC and home server are now equipped with AMD Ryzen 3700X CPUs and 32GB of DDR4-3200 RAM on Ubuntu 18.04. When I realized just how much computing power I had at my fingertips, I decided it would be a good idea to share it with Science, because Science has done so much for me. I am currently running LHC@Home "Atlas Native" work units with 12 threads on the Desktop and 16 threads on the Server 24/7. I also work on Einstein@Home tasks with the GPUs on both systems. I have E@H set up in LXD containers with GPU passthrough so I don't have to deal with BOINC time-sharing configs. I just start the containers before I go to bed and shut them down when I wake up.

I set a goal to reach 1M points with LHC as quickly as I could and then look for other projects on BOINC to contribute to. After 3 weeks I have over 800K points and I will hit my goal in the next 4 or 5 days. After that I hope to continue running LHC work units with 2-4 threads on each system. At this point I decided that I would contribute to CPDN and started reading a bit about it. I know that these are long-term work units that perform best when undisturbed and that I will need 32-bit libraries installed in order to contribute.

Here are my questions:

1.) Are these work units single threaded, multi threaded, or a mixture of the two?
2.) At peak RAM usage how much RAM will each task require?
3.) Can I run them in an unprivileged container or will they need to do things that require escalated permissions?
4.) Are there any known issues with storing and processing CPDN work units on a btrfs filesystem?
5.) Is there anything that I should know about contributing to CPDN but didn't ask?

Thank you for taking the time to read this. I look forward to getting started in the coming days.

EDIT: Whoops, I clicked OK too many times and double posted. My apologies.
ID: 61659
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 61661 - Posted: 11 Dec 2019, 13:07:59 UTC

1. The apps are single threaded.

2. The amount of RAM needed varies widely, depending on the program and the area of the planet being investigated.
It's been found experimentally that the hadam4h (N216) models also like lots of L3 cache, about 4 MB per model.

3. No special permissions needed.

4. There are a lot of data files open for each model, and periodically these get saved.
The more models being run simultaneously, the longer this takes.

5. The credits system only runs once per week.
ID: 61661
lazlo_vii

Joined: 11 Dec 19
Posts: 108
Credit: 3,012,142
RAC: 0
Message 61662 - Posted: 11 Dec 2019, 14:08:54 UTC - in response to Message 61661.  
Last modified: 11 Dec 2019, 14:14:07 UTC

Les,

Thank you for your reply.

1.) Single threaded is good because that will allow me to pin the container to logical CPUs within a specific block of cores.

2.) Not knowing the RAM requirements makes it a little harder to plan which other BOINC projects I will choose to run. In your opinion, given the current models being run, would reserving 2GB of RAM per task be too much, too little, or just right? I want to make sure the container has what it needs without over-provisioning it too much. I have 32MB of L3 on each system, so I am not too worried about cache misses there. If I run 4 apps at a time I will only use about half of it.

3.) That is simple. I like simple.

4.) I will keep that in mind when backing up the system.

5.) Points are a fun way to track progress and set goals. Since this is a project about climate change, weekly updates are on a relatively short time scale.

6.) I forgot to ask in my original post: Ubuntu 18.04 uses BOINC client version 7.9.3 by default. Will that be OK or will I need a newer version?
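
The arithmetic behind point 2 can be checked with a minimal sketch. The figures (2GB reserved per task, 32MB of L3, about 4MB of L3 per model, 4 tasks) come from this thread; the function name is made up for illustration, and the 4MB figure was quoted only for the hadam4h (N216) models, so other models may behave differently.

```python
def container_budget(tasks, ram_per_task_gb, l3_total_mb, l3_per_model_mb):
    """Return (RAM to reserve in GB, fraction of L3 the models would use)."""
    ram_gb = tasks * ram_per_task_gb
    l3_fraction = tasks * l3_per_model_mb / l3_total_mb
    return ram_gb, l3_fraction


# Figures taken from the posts above; treat them as rough planning numbers.
ram_gb, l3_fraction = container_budget(
    tasks=4, ram_per_task_gb=2, l3_total_mb=32, l3_per_model_mb=4
)
print(f"Reserve {ram_gb} GB RAM; models use about {l3_fraction:.0%} of L3")
```

With these inputs, four tasks come to 8GB of reserved RAM and half of the L3 cache, which matches the estimate in point 2.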
ID: 61662
lazlo_vii

Joined: 11 Dec 19
Posts: 108
Credit: 3,012,142
RAC: 0
Message 61663 - Posted: 11 Dec 2019, 16:19:53 UTC

In order to test this out I set up a container and assigned it 1 CPU and 8GB of RAM. I'll wait to see if it gets work and then analyze it from there.
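
For reference, limits like these can be set through LXD's `limits.cpu` and `limits.memory` config keys. The sketch below only builds the `lxc config set` command lines and does not run them. The container name `cpdn-test` is hypothetical, and the exact commands used for this container were not given in the post.

```python
def lxd_limit_commands(container, cpus, memory):
    """Build `lxc config set` commands for CPU and memory limits.

    cpus is a count such as 1, or a range/list string such as "0-3"
    to pin the container to specific logical CPUs.
    """
    return [
        ["lxc", "config", "set", container, "limits.cpu", str(cpus)],
        ["lxc", "config", "set", container, "limits.memory", memory],
    ]


# "cpdn-test" is a made-up container name used only for illustration.
for cmd in lxd_limit_commands("cpdn-test", 1, "8GB"):
    print(" ".join(cmd))
```

Each returned list could be passed to `subprocess.run` on a host where LXD is installed.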
ID: 61663
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 61665 - Posted: 11 Dec 2019, 16:54:15 UTC

2.) Not knowing the RAM requirements makes it a little harder to plan which other BOINC projects I will choose to run. In your opinion, given the current models being run, would reserving 2GB of RAM per task be too much, too little, or just right?


2GB should be fine, as the hungriest of the current batches take about 1.4GB of RAM. When OpenIFS models appear again, it may be a different story. In testing, some of these took over 5GB of RAM, which stopped them even downloading to my desktop, which only has 4GB. Four would run at once on my 4-core 8GB laptop, but they slowed down a lot due to swapping out to disk. However, throughput was still greater than running just two or three. Running two seemed to be OK, as only rarely did both reach peak RAM usage at once. (Usage varied from well under 1GB up to about 5.3GB, if I remember aright.)
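
The trade-off in that paragraph can be put into numbers. This is a rough sketch using only figures from this post (about 1.4GB for the hungriest current batch, a peak of about 5.3GB for the OpenIFS test models, an 8GB laptop). It ignores memory used by the operating system and other programs, so real headroom will be smaller.

```python
import math


def tasks_without_swapping(total_ram_gb, peak_per_task_gb):
    """Most tasks that fit in RAM even if all hit peak usage at once."""
    return math.floor(total_ram_gb / peak_per_task_gb)


# OpenIFS test models (peak about 5.3GB) on an 8GB laptop.
print(tasks_without_swapping(8, 5.3))
# Hungriest current batch (about 1.4GB) on the same laptop.
print(tasks_without_swapping(8, 1.4))
```

By this worst-case measure only one OpenIFS model fits in 8GB, so two running acceptably relied on both rarely peaking together.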

6.) I forgot to ask in my original post: Ubuntu 18.04 uses BOINC client version 7.9.3 by default. Will that be OK or will I need a newer version?


7.9.3 will work fine, though if you want the latest and greatest, this will get it for you:

sudo add-apt-repository ppa:costamagnagianfranco/boinc
sudo apt-get update
sudo apt install boinc

I rolled my own, but unless your knowledge of which libs are in which packages is a lot better than mine, it is a steep learning curve; it took me many tries just to get ./configure to complete.
ID: 61665
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 61668 - Posted: 11 Dec 2019, 19:11:33 UTC

Daytime again.

Most people running here just get a computer, add it to the project, and run the work.
So if you have a "set up" of some sort, the best way to see if it works is to try it.

My current models are running about 640 Megs, but some a little while back were using around 3.5 Gigs each.

One more thing:
The 1 year "deadline" is just an artificial limit to keep BOINC from hogging work for the project and not running the much shorter work from other projects.
Here, results are needed ASAP, and how soon that is depends on the length of the runs for a given batch of work (which is in their name).
ID: 61668
lazlo_vii

Joined: 11 Dec 19
Posts: 108
Credit: 3,012,142
RAC: 0
Message 61670 - Posted: 11 Dec 2019, 20:09:52 UTC

Les, Dave,

Thank you both for your advice. I just want to get work done as efficiently as my PCs will allow. My test container has gotten 12 models (HadAM4 N144), and once I saw the expected deadline of the first model I added a second CPU to the container in order to get through these first 12 as soon as possible. These are running on my Desktop because the Server is under a heavier load with LHC tasks.

After reading some of the threads in the Number Crunching forum I have decided that this project really does belong on my Server instead of my Desktop. The server can go for a month or two without a reboot, whereas my desktop gets rebooted about once or twice a week.

Two questions about the projects I am running. Here is the command line output:

boinccmd --get_simple_gui_info
======== Projects ========
1) -----------
   name: climateprediction.net
   master URL: https://climateprediction.net/
   user_name: lazlo_vii
   team_name: 
   resource share: 100.000000
   user_total_credit: 0.000000
   user_expavg_credit: 0.000000
   host_total_credit: 0.000000
   host_expavg_credit: 0.000000
   nrpc_failures: 0
   master_fetch_failures: 0
   master fetch pending: no
   scheduler RPC pending: no
   trickle upload pending: no
   attached via Account Manager: no
   ended: no
   suspended via GUI: no
   don't request more work: no
   disk usage: 0.000000
   last RPC: Wed Dec 11 18:26:51 2019

   project files downloaded: 0.000000
   jobs succeeded: 0
   jobs failed: 0
   elapsed time: 14901.176119
   cross-project ID: 60ba6baf0397ba7eadf412325f852b6d

======== Tasks ========
1) -----------
   name: hadam4_a1zz_209910_6_856_011964306_0
   WU name: hadam4_a1zz_209910_6_856_011964306
   project URL: https://climateprediction.net/
   received: Wed Dec 11 17:16:04 2019
   report deadline: Sun Nov 22 22:36:04 2020
   ready to report: no
   state: downloaded
   scheduler state: scheduled
   active_task_state: EXECUTING
   app version num: 809
   resources: 1 CPU
   CPU time at last checkpoint: 8794.040000
   current CPU time: 9489.430000
   estimated CPU time remaining: 1895339.593568
   fraction done: 0.047268
   swap size: 650 MB
   working set size: 628 MB
   suspended via GUI: no
2) -----------
   name: hadam4_a1zt_209910_6_856_011964300_1
   WU name: hadam4_a1zt_209910_6_856_011964300
   project URL: https://climateprediction.net/
   received: Wed Dec 11 17:16:04 2019
   report deadline: Sun Nov 22 22:36:04 2020
   ready to report: no
   state: downloaded
   scheduler state: scheduled
   active_task_state: EXECUTING
   app version num: 809
   resources: 1 CPU
   CPU time at last checkpoint: 4749.770000
   current CPU time: 5350.870000
   estimated CPU time remaining: 1940143.307859
   fraction done: 0.024747
   swap size: 650 MB
   working set size: 628 MB
   suspended via GUI: no


First, the checkpoints seem to come in about 1000 seconds apart. Is it safe to trust those numbers? Second, for these two models does "fraction done" equal 4% and 2%, or 0.04% and 0.02%?
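
For anyone wanting to pull these numbers out of the output programmatically, here is a minimal sketch. It assumes the `key: value` layout shown above and parses a pasted excerpt rather than calling `boinccmd`; the helper name `parse_task_fields` is hypothetical. It reports `fraction done` as a percentage (boinccmd prints it as a fraction between 0 and 1) and the CPU time used since the last checkpoint, which is not the same thing as the interval between checkpoints.

```python
# Excerpt of the first task from the output above.
SAMPLE = """\
   name: hadam4_a1zz_209910_6_856_011964306_0
   CPU time at last checkpoint: 8794.040000
   current CPU time: 9489.430000
   fraction done: 0.047268
"""


def parse_task_fields(text):
    """Turn 'key: value' lines into a dict, splitting on the first colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


task = parse_task_fields(SAMPLE)
percent_done = float(task["fraction done"]) * 100
since_checkpoint = (
    float(task["current CPU time"]) - float(task["CPU time at last checkpoint"])
)
print(
    f"{task['name']}: {percent_done:.1f}% done, "
    f"{since_checkpoint:.0f} s of CPU time since last checkpoint"
)
```

Read this way, the two tasks above are roughly 4.7% and 2.5% complete.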
ID: 61670
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 61671 - Posted: 11 Dec 2019, 20:31:25 UTC

I can't answer either of those.
I just let the models run until they finish, then do anything that needs to be done.

If I wanted to know time between checkpoints, I'd get it from "Properties" for a model.

For fraction done, I look at the "Progress" and "Elapsed" columns in the Manager, and do some mental arithmetic.
And then get out a calculator when that doesn't make sense. :)

It's the "Remaining (estimated)" column that can't be trusted.
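
The mental arithmetic described above can be written down as a short sketch. It assumes progress is roughly linear, so the total run time is about the time used so far divided by the fraction done. It also uses CPU time from the `boinccmd` output earlier in the thread as a stand-in for the Manager's "Elapsed" column, and the function name is made up for illustration.

```python
def projected_total_days(cpu_seconds_so_far, fraction_done):
    """Estimate total run time in days, assuming linear progress."""
    return cpu_seconds_so_far / fraction_done / 86400


# (short label, current CPU time in seconds, fraction done) from the output above.
for name, cpu_s, frac in [
    ("hadam4_a1zz", 9489.43, 0.047268),
    ("hadam4_a1zt", 5350.87, 0.024747),
]:
    print(f"{name}: about {projected_total_days(cpu_s, frac):.1f} days in total")
```

Both tasks project to between two and three days in total, whereas the client's own estimate of 1,895,339 seconds remaining for the first task works out to about 22 days.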
ID: 61671
lazlo_vii

Joined: 11 Dec 19
Posts: 108
Credit: 3,012,142
RAC: 0
Message 61686 - Posted: 14 Dec 2019, 11:11:56 UTC
Last modified: 14 Dec 2019, 11:14:58 UTC

My server is no longer running LHC@Home. I am just a few dozen WU from my goal and I will let my desktop handle those.

On my server I have set up two LXD Ubuntu containers and they are happily crunching away at 3 CPDN tasks each, with the trickles being uploaded. The two tasks I had started earlier were in a container on my desktop, so as soon as I could I transferred it to my server. They have both finished with running times of two and a half days at about 8.5 sec/TS, and the new tasks seem to be running at about 8 to 8.5 sec/TS. It looks like I am good to go from here.

Thanks again for your time and keep up the good work!
ID: 61686


©2024 cpdn.org