Thread 'Questions Before I get started.'

lazlo_vii Joined: 11 Dec 19 Posts: 108 Credit: 3,012,142 RAC: 0 Message 61659 - Posted: 11 Dec 2019, 12:32:25 UTC Last modified: 11 Dec 2019, 12:35:04 UTC
First, some background information. I have recently upgraded my home systems from Intel-based systems to AMD-based systems. Both my desktop PC and my home server now have AMD Ryzen 3700X CPUs and 32GB of DDR4-3200 RAM, running Ubuntu 18.04. When I realized just how much computing power I had at my fingertips, I decided it would be a good idea to share it with Science, because Science has done so much for me.
I am currently running LHC@Home "Atlas Native" work units 24/7, with 12 threads on the desktop and 16 threads on the server. I also run Einstein@Home tasks on the GPUs of both systems. I have E@H set up in LXD containers with GPU passthrough so I don't have to deal with BOINC time-sharing configs. I just start the containers before I go to bed and shut them down when I wake up.
I set a goal to reach 1M points with LHC as quickly as I could and then look for other projects on BOINC to contribute to. After 3 weeks I have over 800K points, and I will hit my goal in the next 4 or 5 days. After that I hope to continue running LHC work units with 2-4 threads on each system. At this point I decided that I would contribute to CPDN and started reading a bit about it. I know that these are long-term work units that perform best when undisturbed, and that I will need 32-bit libraries installed in order to contribute.
Here are my questions:
1.) Are these work units single-threaded, multi-threaded, or a mixture of the two?
2.) At peak RAM usage, how much RAM will each task require?
3.) Can I run them in an unprivileged container, or will they need to do things that require escalated permissions?
4.) Are there any known issues with storing and processing CPDN work units on a btrfs filesystem?
5.) Is there anything that I should know about contributing to CPDN but didn't ask?
Thank you for taking the time to read this. I look forward to getting started in the coming days.
EDIT: Whoops, I clicked OK too many times and double posted. My apologies.

Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 Message 61661 - Posted: 11 Dec 2019, 13:07:59 UTC
1. The apps are single-threaded.
2. The amount of RAM needed varies widely, depending on the program and the area of the planet being investigated. It has been found experimentally that the hadam4h (N216) models also like lots of L3 cache, about 4 Megs per model.
3. No special permissions are needed.
4. Each model has a lot of data files open, and these get saved periodically. The more models that run simultaneously, the longer this takes.
5. The credit system only runs once per week.

lazlo_vii Joined: 11 Dec 19 Posts: 108 Credit: 3,012,142 RAC: 0 Message 61662 - Posted: 11 Dec 2019, 14:08:54 UTC - in response to Message 61661. Last modified: 11 Dec 2019, 14:14:07 UTC
Les, thank you for your reply.
1.) Single-threaded is good, because it will let me pin the container to logical CPUs within a specific block of cores.
2.) Not knowing the RAM requirements makes it a little harder to plan which other BOINC projects to run. In your opinion, given the current models being run, would reserving 2GB of RAM per task be too much, too little, or just right? I want to make sure the container has what it needs without over-provisioning it. I have 32MB of L3 cache on each system, so I am not too worried about cache misses there: if I run 4 apps at a time at about 4MB each, I will only use about half of it.
3.) That is simple. I like simple.
4.) I will keep that in mind when backing up the system.
5.) Points are a fun way to track progress and set goals. Since this is a project about climate change, weekly updates are on a relatively short time scale.
6.) I forgot to ask in my original post: Ubuntu 18.04 uses BOINC client version 7.9.3 by default. Will that be OK, or will I need a newer version?

lazlo_vii Joined: 11 Dec 19 Posts: 108 Credit: 3,012,142 RAC: 0 Message 61663 - Posted: 11 Dec 2019, 16:19:53 UTC
To test this out, I set up a container and assigned it 1 CPU and 8GB of RAM. I'll wait to see if it gets work and then analyze it from there.

Dave Jackson Volunteer moderator Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 Message 61665 - Posted: 11 Dec 2019, 16:54:15 UTC
"2.) Not knowing the RAM requirements makes it a little harder to plan which other BOINC projects I will choose to run. In your opinion, given the current models being run, would reserving 2GB of RAM per task be too much, too little, or just right?"
2GB should be fine, as the hungriest of the current batches takes about 1.4GB of RAM. When OpenIFS models appear again, it may be a different story. In testing, some of these took over 5GB of RAM, which stopped them from even downloading to my desktop, which only has 4GB. Four would run at once on my 4-core 8GB laptop, but they slowed down a lot because of swapping to disk. Throughput was still greater than running just two or three, however. Running two seemed to be OK, as both only rarely reached peak RAM usage at the same time. (Usage varied from well under 1GB up to about 5.3GB, if I remember aright.)
"6.) I forgot to ask in my original post: Ubuntu 18.04 uses BOINC client version 7.9.3 by default. Will that be OK or will I need a newer version?"
7.9.3 will work fine, though if you want the latest and greatest, this will get it for you:
sudo add-apt-repository ppa:costamagnagianfranco/boinc
sudo apt-get update
sudo apt install boinc
I built my own, but unless your knowledge of which libs are in which packages is a lot better than mine, it is a steep learning curve. It took me many tries just to get ./configure to complete.
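Dave's laptop figures suggest a worst-case planning rule. If every model might hit its peak at the same moment, the number of models that fits without swapping is the usable RAM divided by the per-model peak. Les's figure of about 4 Megs of L3 per hadam4h (N216) model gives a second ceiling, and the Python sketch below takes the smaller of the two. It is only an illustration. The function name `max_concurrent_models` and the `reserve_gb` headroom for the OS and other projects are hypothetical, and the 4MB cache figure was stated for N216 models, not for the N144 ones. As Dave notes, peaks rarely coincide, so this bound is conservative: his laptop ran two OpenIFS models acceptably even though the worst case says one.

```python
import math


def max_concurrent_models(ram_gb, peak_per_model_gb, reserve_gb=0.0,
                          l3_mb=None, l3_per_model_mb=None):
    """Worst-case number of models that fit if all of them hit peak RAM at once.

    reserve_gb: hypothetical headroom kept back for the OS and other BOINC work.
    l3_mb / l3_per_model_mb: optional cache ceiling (e.g. ~4MB per N216 model).
    """
    # RAM ceiling: whole models that fit in the RAM left after the reserve.
    limit = math.floor((ram_gb - reserve_gb) / peak_per_model_gb)
    if l3_mb is not None and l3_per_model_mb is not None:
        # Cache ceiling: whole models whose working sets fit in L3.
        limit = min(limit, math.floor(l3_mb / l3_per_model_mb))
    return max(limit, 0)


# Dave's 8GB laptop with OpenIFS models peaking around 5.3GB each.
print(max_concurrent_models(8, 5.3))  # -> 1
# A 32GB host budgeting 2GB per task, keeping 8GB back, with 32MB of L3.
print(max_concurrent_models(32, 2, reserve_gb=8, l3_mb=32, l3_per_model_mb=4))  # -> 8
```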

Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 Message 61668 - Posted: 11 Dec 2019, 19:11:33 UTC
Daytime again.
Most people here just get a computer, attach it to the project, and run the work. So if you have a "setup" of some sort, the best way to see if it works is to try it.
My current models are using about 640 Megs, but some a little while back were using around 3.5 Gigs each.
One more thing: the 1-year "deadline" is just an artificial limit to keep BOINC from hogging work for this project and not running the much shorter work from other projects. Here, results are needed ASAP. What that means depends on the length of the runs for a given batch of work, which is shown in their names.

lazlo_vii Joined: 11 Dec 19 Posts: 108 Credit: 3,012,142 RAC: 0 Message 61670 - Posted: 11 Dec 2019, 20:09:52 UTC
Les, Dave, thank you both for your advice. I just want to get work done as efficiently as my PCs will allow. My test container has received 12 models (HadAM4 N144). Once I saw the expected deadline of the first model, I added a second CPU to the container so it can get through these first 12 as soon as possible. They are running on my desktop because the server is under a heavier load with LHC tasks. After reading some of the threads in the Number Crunching forum, I have decided that this project really belongs on my server instead of my desktop. The server can go for a month or two without a reboot, whereas my desktop gets rebooted about once or twice a week.
I have two questions about the tasks I am running. Here is the command-line output:
boinccmd --get_simple_gui_info
======== Projects ========
1) -----------
   name: climateprediction.net
   master URL: https://climateprediction.net/
   user_name: lazlo_vii
   team_name:
   resource share: 100.000000
   user_total_credit: 0.000000
   user_expavg_credit: 0.000000
   host_total_credit: 0.000000
   host_expavg_credit: 0.000000
   nrpc_failures: 0
   master_fetch_failures: 0
   master fetch pending: no
   scheduler RPC pending: no
   trickle upload pending: no
   attached via Account Manager: no
   ended: no
   suspended via GUI: no
   don't request more work: no
   disk usage: 0.000000
   last RPC: Wed Dec 11 18:26:51 2019
   project files downloaded: 0.000000
   jobs succeeded: 0
   jobs failed: 0
   elapsed time: 14901.176119
   cross-project ID: 60ba6baf0397ba7eadf412325f852b6d
======== Tasks ========
1) -----------
   name: hadam4_a1zz_209910_6_856_011964306_0
   WU name: hadam4_a1zz_209910_6_856_011964306
   project URL: https://climateprediction.net/
   received: Wed Dec 11 17:16:04 2019
   report deadline: Sun Nov 22 22:36:04 2020
   ready to report: no
   state: downloaded
   scheduler state: scheduled
   active_task_state: EXECUTING
   app version num: 809
   resources: 1 CPU
   CPU time at last checkpoint: 8794.040000
   current CPU time: 9489.430000
   estimated CPU time remaining: 1895339.593568
   fraction done: 0.047268
   swap size: 650 MB
   working set size: 628 MB
   suspended via GUI: no
2) -----------
   name: hadam4_a1zt_209910_6_856_011964300_1
   WU name: hadam4_a1zt_209910_6_856_011964300
   project URL: https://climateprediction.net/
   received: Wed Dec 11 17:16:04 2019
   report deadline: Sun Nov 22 22:36:04 2020
   ready to report: no
   state: downloaded
   scheduler state: scheduled
   active_task_state: EXECUTING
   app version num: 809
   resources: 1 CPU
   CPU time at last checkpoint: 4749.770000
   current CPU time: 5350.870000
   estimated CPU time remaining: 1940143.307859
   fraction done: 0.024747
   swap size: 650 MB
   working set size: 628 MB
   suspended via GUI: no
First, the checkpoints seem to come about 1000 seconds apart. Is it safe to trust those numbers? Second, for these two models, does "fraction done" mean 4% and 2%, or 0.04% and 0.02%?
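As far as I know, BOINC reports "fraction done" as a value from 0 to 1, and the Manager's Progress column shows the same value as a percentage. On that reading, 0.047268 is about 4.7%, not 0.047%. The Python sketch below parses a trimmed excerpt of the output above and subtracts "CPU time at last checkpoint" from "current CPU time". A single snapshot only gives the CPU time elapsed since the last checkpoint, not the checkpoint interval itself. To measure the interval, you would have to sample the output repeatedly and watch when the checkpoint value changes. The helper names `parse_tasks` and `summarize` are hypothetical, and the parser assumes the "key: value" layout shown above.

```python
# Excerpt of the `boinccmd --get_simple_gui_info` output posted above,
# trimmed to the fields this sketch reads.
SAMPLE = """\
======== Tasks ========
1) -----------
   name: hadam4_a1zz_209910_6_856_011964306_0
   CPU time at last checkpoint: 8794.040000
   current CPU time: 9489.430000
   fraction done: 0.047268
2) -----------
   name: hadam4_a1zt_209910_6_856_011964300_1
   CPU time at last checkpoint: 4749.770000
   current CPU time: 5350.870000
   fraction done: 0.024747
"""


def parse_tasks(text):
    """Collect each task block under '======== Tasks ========' into a dict."""
    tasks, current, in_tasks = [], None, False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("========"):
            # Section header: only the Tasks section is of interest here.
            in_tasks = "Tasks" in stripped
            current = None
        elif in_tasks and stripped.endswith("-----------"):
            # A line like "1) -----------" starts a new task block.
            current = {}
            tasks.append(current)
        elif current is not None and ":" in stripped:
            # Split on the first colon only, so values containing ':' survive.
            key, _, value = stripped.partition(":")
            current[key.strip()] = value.strip()
    return tasks


def summarize(task):
    fraction = float(task["fraction done"])  # assumed to run from 0.0 to 1.0
    cpu_now = float(task["current CPU time"])
    cpu_ckpt = float(task["CPU time at last checkpoint"])
    return {
        "name": task["name"],
        "percent_done": fraction * 100.0,
        # CPU seconds that would be redone if the task restarted right now.
        "cpu_since_checkpoint_s": cpu_now - cpu_ckpt,
    }


for t in parse_tasks(SAMPLE):
    s = summarize(t)
    print(f"{s['name']}: {s['percent_done']:.2f}% done, "
          f"{s['cpu_since_checkpoint_s']:.0f} s since last checkpoint")
```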

Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 Message 61671 - Posted: 11 Dec 2019, 20:31:25 UTC
I can't answer either of those. I just let the models run until they finish, then do whatever needs to be done. If I wanted to know the time between checkpoints, I'd get it from "Properties" for a model. For fraction done, I look at the "Progress" and "Elapsed" columns in the Manager and do some mental arithmetic, then get out a calculator when that doesn't make sense. :) It's the "Remaining (estimated)" column that can't be trusted.
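The "mental arithmetic" Les describes can be written down. Dividing the CPU time used so far by the fraction done gives an extrapolated total, and that total can be set against BOINC's "estimated CPU time remaining", which Les says not to trust. The Python sketch below assumes progress is roughly linear, which is a simplification. The `extrapolate` helper is hypothetical. The input numbers are copied from the boinccmd output earlier in the thread.

```python
SECONDS_PER_DAY = 86_400


def extrapolate(cpu_time_s, fraction_done):
    """Linear extrapolation; assumes progress so far has been steady."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    total = cpu_time_s / fraction_done
    return total, total - cpu_time_s


# (current CPU time, fraction done, BOINC's "estimated CPU time remaining")
TASKS = {
    "hadam4_a1zz_209910_6_856_011964306_0": (9489.43, 0.047268, 1895339.593568),
    "hadam4_a1zt_209910_6_856_011964300_1": (5350.87, 0.024747, 1940143.307859),
}

for name, (cpu, frac, boinc_remaining) in TASKS.items():
    total, remaining = extrapolate(cpu, frac)
    print(f"{name}: ~{total / SECONDS_PER_DAY:.1f} days total, "
          f"~{remaining / SECONDS_PER_DAY:.1f} days left "
          f"(BOINC estimate: {boinc_remaining / SECONDS_PER_DAY:.1f} days left)")
```

For these two tasks, the sketch gives roughly 2.3 and 2.5 days of CPU time in total. That is in line with the two and a half days reported at the end of the thread, and far below the roughly 22 days BOINC was estimating.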

lazlo_vii Joined: 11 Dec 19 Posts: 108 Credit: 3,012,142 RAC: 0 Message 61686 - Posted: 14 Dec 2019, 11:11:56 UTC Last modified: 14 Dec 2019, 11:14:58 UTC
My server is no longer running LHC@Home. I am just a few dozen WUs from my goal, and I will let my desktop handle those. On my server I have set up two LXD Ubuntu containers. Each is happily crunching away at 3 CPDN tasks, and the trickles are being uploaded. The two tasks I had started earlier were in a container on my desktop, so I moved that container to my server as soon as I could. Both tasks have finished, each with a running time of about two and a half days at about 8.5 sec/TS (seconds per timestep). The new tasks seem to be running at about 8 to 8.5 sec/TS. It looks like I am good to go from here. Thanks again for your time, and keep up the good work!
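Since running time is roughly the number of timesteps multiplied by sec/TS, the figures in this post can be used to project run times for the new tasks. The Python sketch below first infers a rough timestep count from the first two tasks (about 2.5 days at about 8.5 sec/TS). It then projects the run time at the new speeds. The timestep count comes from these rounded numbers, not from the model's configuration, so treat it as approximate. The function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def implied_timesteps(run_days, sec_per_ts):
    """Rough timestep count implied by a finished run: time / (time per step)."""
    return run_days * SECONDS_PER_DAY / sec_per_ts


def projected_run_days(timesteps, sec_per_ts):
    """Run time in days for a given number of timesteps at a given speed."""
    return timesteps * sec_per_ts / SECONDS_PER_DAY


steps = implied_timesteps(2.5, 8.5)  # figures reported for the first two tasks
for speed in (8.0, 8.5):
    print(f"{speed} sec/TS -> about {projected_run_days(steps, speed):.2f} days "
          f"for ~{steps:,.0f} timesteps")
```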