If you have used VirtualBox for BOINC and have had issues, please can you share these?
Author | Message |
---|---|
Send message Joined: 14 Sep 08 Posts: 124 Credit: 40,331,747 RAC: 57,250 |
– Disk footprint: The VM images take a decent amount of space. AFAIK, the image download happens only once and there is only one copy of the image. In my experience with LHC, the base VM image is not copied to each slot. The disk requirement is still larger because of the snapshot image the task generates while running. I believe the snapshot is an incremental difference from the base image and seems to be triggered by checkpoints, but I haven't looked in more detail. This one-time image download is likely negligible given OpenIFS's upload requirement. – Network transfers control taken away from the boinc client: "All vboxwrapper based applications which I have encountered so far perform network transfers from within the VM, completely outside of the control of the boinc client." Very good point. It ignores the proxy configuration on my boinc client, which I use to limit all boinc traffic. This could be pretty problematic for OpenIFS if it uploads from within the VM and the upload server changes frequently. Given that the VM can share a directory with boinc, though, I feel this can be done properly by having the boinc client do the upload. Perhaps it's just LHC, which depends on a distributed filesystem inside the VM, that needs network access from within the VM. Thankfully it doesn't do any uploads, but others might have hard requirements for a proxy and won't be happy with the VM ignoring it. |
Send message Joined: 29 Oct 17 Posts: 1030 Credit: 16,107,573 RAC: 15,433 |
– Network transfers control taken away from the boinc client: "All vboxwrapper based applications which I have encountered so far perform network transfers from within the VM, completely outside of the control of the boinc client." That's not the way I would expect to develop it. I would aim to have the vbox app treated as much like a non-vbox app as possible. So input AND output files would go in/out via the shared folder. As long as that is set up correctly, I don't see why the client shouldn't handle network in the normal way. Actually, I'm surprised that other vbox apps are allowed to even access the network. OpenIFS and its wrapper code only talk directly to the client and hand off uploads to it. Maybe there's some history there as people worked through the best way to create vbox apps. There are advantages coming late to the party sometimes. |
Send message Joined: 14 Sep 08 Posts: 124 Credit: 40,331,747 RAC: 57,250 |
"That's not the way I would expect to develop it. I would aim to have the vbox app treated as much like a non-vbox app as possible. So input AND output files would go in/out via the shared folder. As long as that is set up correctly, I don't see why the client shouldn't handle network in the normal way. Actually, I'm surprised that other vbox apps are allowed to even access the network. OpenIFS and its wrapper code only talk directly to the client and hand off uploads to it. Maybe there's some history there as people worked through the best way to create vbox apps. There are advantages coming late to the party sometimes." Perfect. Vbox or native, LHC apps require the host to be always online due to the use of distributed cvmfs. In my experience, that's certainly different from what most BOINC projects do. |
Send message Joined: 27 Mar 21 Posts: 79 Credit: 78,302,757 RAC: 1,077 |
There are two types of network access by existing vboxwrapper based applications: – LHC@home's applications perform massive I/O through their cluster filesystem, cvmfs. That's common to their virtualized and native applications. It would require a drastic change of the client-server architecture of LHC@home to move this network I/O into BOINC, hence it will obviously never happen. – Cosmology@home's and (I think) Rosetta@home's virtualized applications only use network access in order to look up (and if applicable, side-load) some sort of updates. This I/O is very lightweight in comparison to LHC@home's. But: 1.) Same as LHC@home's, it circumvents BOINC's mechanisms and policies. 2.) It causes a period during startup of the application during which the host CPUs are idling. 3.) IME, it's a fragile process which occasionally causes these applications to get stuck in this stage, resulting in never-ending tasks without CPU usage. |
Send message Joined: 1 Jan 07 Posts: 1051 Credit: 36,341,855 RAC: 2,973 |
LHC were very proud of what they'd managed to achieve in integrating BOINC and CERN's disparate requirements through VMs. I remember listening to Ben Segal's account of what was then a work-in-progress at the 2010 BOINC workshop in London. Ben's presentation slides are still available online, and give a flavour of the constraints they were working under. Perhaps the key slide is "Summary of the basic approach". Access the slides via https://boinc.berkeley.edu/trac/wiki/WorkShop10#Schedule But even as he was talking, it was clear that certain problems hadn't been overcome. In particular, that "host <-> guest-VM communication/control layer" couldn't signal back to BOINC that the VM was idle and its compute resources could be released for another project to use. I think the fault there possibly lay in BOINC: it didn't then, and probably still can't now, dynamically adjust for hosts with variable resource availability. |
Send message Joined: 5 Aug 04 Posts: 178 Credit: 17,308,699 RAC: 19,069 |
Okay, let me tell you about my experience with vBox here. First, please keep in mind that I'm totally a Windows guy; I never had anything to do with Unix / Linux. When LHC@Home first started with vBox and Theory, I started using vBox and, let's say, it worked, with no big problems. When Atlas started running under vBox, I immediately started to run and support it. The mess was that more and more problems came up, so I wrote a checklist for users on how to set up a working Atlas@Home system. Here you can take a look at Version 3 (!) of this checklist: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161&postid=29359#29359 In this phase, the most common problems were missing settings in the BIOS (VT-x and others), not enough RAM in the box, and the computer getting sluggish if you ran too many vBox tasks, even without using all cores. For many, many years I ran this setup on Windows 10 and it was okay. It worked in an acceptable manner. But with newer releases of vBox the whole system really became unmanageable. I got more and more tasks with the postponed status. We could never figure out the real reason for this, but the postponed tasks are dead, wasted crunching time. So I, the Windows-only guy who had never done anything with Linux, set up one VM (still with vBox) with Ubuntu 20.04 (with a lot of help from colleagues) that runs Atlas native. This worked like a charm, and meanwhile I have one VMware VM (Ubuntu 22.04) on every Windows PC. This VM uses as many cores and BOINC projects as I want and all works fine. The hosts are not sluggish and I have no problems. I'm running hundreds of Atlas native tasks without any problem. The big problem with the way LHC@Home has implemented the vBox structure is that they run more than one VM at the same time. This costs a lot of memory and CPU cycles, and when the box is under enough CPU stress, small timeouts happen that make the VM unmanageable => postponed.
I have run Rosetta and other projects that use vBox VMs, but none of them was really flawless. I have lost 1/3 of the VMs to postponed. So I won't run any project that forces me to run several vBox VMs. Perhaps it is possible to build a setup for Windows that needs only one VM (like I do now), inside which you run several tasks as if it were a real Linux system. I run 3x 4-core Atlas native tasks in most of my VMs. Supporting BOINC, a great concept ! |
Send message Joined: 12 Apr 21 Posts: 307 Credit: 14,300,326 RAC: 4,834 |
Yeti, just like the avatar, is a legend when it comes to VBox and BOINC, just check out those troubleshooting guides! :-) Yeti, I know you have extensive experience with VBox, but have you ever thought about checking out WSL2? It's part of Windows and is a very lightweight way to virtualize Linux on Windows. I use it extensively for BOINC projects that work better on Linux or require Linux. It recently became Generally Available on Windows (10 & 11) and Microsoft streamlined the installation process. Even systemd is available on it now, which I'm just starting to explore. A graphical interface is also available, but I've never tried it as I control both Windows and Linux BOINC clients via the Windows BOINC manager. I use it for LHC too, to run native ATLAS & Theory with a Squid proxy. It has its quirks on LHC. To run Theory native you need to enable vsyscall emulation in the .wslconfig file. It's easy to do, and having that enabled doesn't break anything else that's running concurrently. ATLAS can only run single-threaded on WSL2, still not sure why, so you may not like that. I'm hoping that with systemd ATLAS will run multithreaded and vsyscall emulation won't be needed, but I haven't tested it yet. |
Send message Joined: 5 Aug 04 Posts: 178 Credit: 17,308,699 RAC: 19,069 |
Andrey, for my servers I switched from an early Hyper-V to VMware, and until now my servers have all been running as guests under VMware. So I'm used to VMware and experienced with it, and I switched from vBox to VMware Workstation, which gives me the possibility to move a VM from a client to a server and back. So I never tried anything with WSL (1/2) and, to tell the truth, I don't want to learn it. This would cost me a lot of time again. Supporting BOINC, a great concept ! |
Send message Joined: 29 Oct 17 Posts: 1030 Credit: 16,107,573 RAC: 15,433 |
"Yeti, just like the avatar, is a legend, when it comes to VBox and BOINC, just check out those troubleshooting guides! :-)" What troubleshooting guides? Can someone direct me? Might be useful for development purposes. |
Send message Joined: 4 Dec 15 Posts: 52 Credit: 2,432,228 RAC: 2,118 |
"Yeti, just like the avatar, is a legend, when it comes to VBox and BOINC, just check out those troubleshooting guides! :-)" "What troubleshooting guides? Can someone direct me? Might be useful for development purposes." The checklist mentioned in this posting. - - - - - - - - - - Greetings, Jens |
Send message Joined: 4 Oct 15 Posts: 34 Credit: 9,075,151 RAC: 374 |
For me, VBox works on every project. But there is one problem, which is not BOINC-specific but VBox-related. If I want to do VBox BOINC WUs, I have to start the client with admin rights. Also, if I want to start a regular virtual machine, I have to do it with admin rights. If I don't, I get the following VBox error: Critical error: COM object for VirtualBox couldn't be created. Error code: E_ACCESSDENIED (0x80070005) Component: VirtualBoxClientWrap. But with admin rights, everything with VBox works. At Cosmology, they use an old version of the vbox wrapper, but by manually changing it to a newer one, I can get an error rate of <1%, instead of, if I recall correctly, about 60-70%. So if the project admins keep it up to date, I do not have problems, at least none where BOINC is the reason. Greets Felix |
Send message Joined: 16 Aug 16 Posts: 73 Credit: 53,298,265 RAC: 14,315 |
Thank you [SG]Felix. I think the more people that come forward with issues, and/or possible solutions or observations, in the end all of this information will be useful. And hopefully you will also get a solution, which can be documented, to help others at a later date. |
Send message Joined: 27 Mar 21 Posts: 79 Credit: 78,302,757 RAC: 1,077 |
SG Felix wrote: "If I want to do VBox Boinc WUs, i have to start the Client with Admin rights. Also, if I want to start a regular Virtual Machine, I have to do it with Admin rights. If i don't, i get the following VBox error:" Check with "id" for your own user ID and with "id boinc" for the boinc user ID whether or not they are members of the vboxusers group. If they are not, add them to the group with "sudo usermod -a -G vboxusers $USER" and "sudo usermod -a -G vboxusers boinc". To test if this solved your problem for your own user: 1. either simply open a new terminal with a login shell, or log out and back in entirely; 2. then try starting a VM without elevated privileges from the new login. To test if this solved your problem for boinc: 1. shut down and restart the client; 2. try starting a vbox based task with the client running normally under the "boinc" user ID. |
Send message Joined: 4 Oct 15 Posts: 34 Credit: 9,075,151 RAC: 374 |
Thanks xii5ku, I should have mentioned, windows 10 is my main System, on which VBox runs :) So no sudo usermod :) Greets Felix |
Send message Joined: 27 Mar 21 Posts: 79 Credit: 78,302,757 RAC: 1,077 |
SG Felix wrote: "windows 10 is my main System, on which VBox runs :)" Hm, not sure then. (The last time I used VBox on Windows myself was a while ago, on Win 7 Pro.) According to a superficial web search, uninstalling and reinstalling VBox, running the installer as admin while doing so, might help. Or perhaps overwriting the contents of C:\Users\%USERNAME%\.VirtualBox\VirtualBox.xml with those of VirtualBox.xml-prev. Or resetting the access permissions of the .VirtualBox folder and everything in it. |
Send message Joined: 22 Aug 05 Posts: 2 Credit: 1,791,764 RAC: 1,303 |
Issues with LHC tasks for very many days, including on a previous installation, but now resolved. Don't know if this is helpful but, since you ask the question: 2023-03-24: new motherboard (and hence new UEFI BIOS). LHC tasks, e.g. ATLAS & Theory Simulation 300.07, persistently terminated after 17 to 20 seconds. Yeti's checklist was followed: BIOS amended to permit hardware virtualisation; Leomoon CPU-V confirmed hardware virtualisation was supported and enabled. LHC tasks persistently terminated as before... Hyper-V not enabled, Docker not installed. System: Ryzen 7 3700X; 32GB RAM; 70GB free disc space + 43GB reported "available to BOINC"; Windows 10 Pro v 22H2; BOINC Manager v. 7.16.11, wxWidgets version 3.0.1; VirtualBox 7.0.6; VirtualBox Extension Pack 7.0.6 installed 2023-04-13. (Note BOINC program and data are running on different drives.) No apparent anti-virus conflicts advised... but LHC tasks persistently terminated. I was unsure how to set the ports options advised in the checklist, so I took the 'nuclear' option of a simple uninstall/reinstall of VB and BOINC. LHC Theory Simulation 300.07 tasks, confirmed to be using VB, are STILL RUNNING now after many minutes (currently on VirtualBox v 6.1.12, no Extension Pack downloaded), though one task stated 'Ready to report' after just 15 mins, while others continued to run for various times, including 1hr 29 mins. One now has an estimated remaining time of 9 days, but the basic termination fault seems to have been remedied by the reinstall. |
Send message Joined: 22 Aug 05 Posts: 2 Credit: 1,791,764 RAC: 1,303 |
"Issues with LHC tasks for very many days including a previous installation, but now resolved. Don't know if helpful but, since you ask the question -" Update: After the PC had been turned off for 2 weeks and following a Windows update, the same fault recurred, that is, LHC computing ceased after a few seconds. Surprisingly, Leomoon CPU-V reported that hardware virtualisation was now NEITHER supported NOR enabled. BIOS settings showed that AMD-V was, in fact, still enabled. For reasons I cannot remember, I changed Core Isolation's Memory Integrity setting (under Windows Security > Device Security) from ON to OFF, which required a restart. After that, Leomoon CPU-V confirmed hardware virtualisation was supported and enabled. LHC now runs. Experimentally, I turned Memory Integrity back from OFF to ON. LHC continues to run, so far for about 30 mins. |
Send message Joined: 15 May 09 Posts: 4504 Credit: 18,450,832 RAC: 1,108 |
Stupidly, I didn't make a note of it. I recently installed tiny10 (a minimal Windows 10 version) under VB. I already had Ubuntu running as a guest under an Ubuntu host. The install kept failing till I found the right thing to change in the BIOS (on my motherboard it had a different name from that suggested when I looked up the issue in a web search). Anyway, tiny10 can run on as little as 2GB RAM. I don't starve it, as I want to run tasks in the Windows version of BOINC! What I do find is that, compared to running tasks from the same batch using WINE, there is something like a 20% performance hit. My next job will be to get a new task on it, then copy the relevant folder from one installation to the other and check whether or not there is any difference between the results of the task under WINE and in the VB. I do know that tasks that crash under a native Windows installation with the sig11 fault more often than not succeed when WINE is used. |
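
For the Linux-side check suggested earlier in the thread (running "id" and "id boinc", then adding any missing account to the vboxusers group), a minimal Python sketch might look like the following. It is an illustration only, not part of BOINC or VirtualBox: the function names `groups_for`, `missing_from_group` and `usermod_commands` are made up for this example, it relies on the Unix-only standard-library modules `pwd` and `grp`, and it does not apply to the Windows host discussed above.

```python
import grp
import pwd


def groups_for(user):
    """Return the names of all groups `user` belongs to (Unix only)."""
    primary_gid = pwd.getpwnam(user).pw_gid  # raises KeyError for unknown users
    names = {g.gr_name for g in grp.getgrall() if user in g.gr_mem}
    names.add(grp.getgrgid(primary_gid).gr_name)
    return names


def missing_from_group(users, group="vboxusers", lookup=groups_for):
    """Return the users that are unknown or not members of `group`.

    `lookup` is injectable so the logic can be tried without real accounts.
    """
    missing = []
    for user in users:
        try:
            member = group in lookup(user)
        except KeyError:
            member = False  # account does not exist on this host
        if not member:
            missing.append(user)
    return missing


def usermod_commands(users, group="vboxusers"):
    """Build the usermod commands quoted in the thread; nothing is executed."""
    return [f"sudo usermod -a -G {group} {user}" for user in users]
```

Something like `usermod_commands(missing_from_group(["boinc"]))` would then list the command to run by hand. The sketch deliberately only prints commands rather than changing anything, and an account that does not exist at all is also reported as missing, so the output should be read before it is used. As noted in the thread, a new login (or a client restart for the boinc account) is still needed before a group change takes effect.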
©2024 cpdn.org