climateprediction.net
The uploads are stuck

Message boards : Number crunching : The uploads are stuck

wujj123456

Joined: 14 Sep 08
Posts: 124
Credit: 40,331,747
RAC: 57,250
Message 67851 - Posted: 18 Jan 2023, 16:46:52 UTC - in response to Message 67837.  
Last modified: 18 Jan 2023, 16:52:25 UTC

1) 100GB UI Limit
2) 2x core_count limit (can't download new work with too many uploads pending)
3) Invalidated tasks by switching out HDD
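The limit in 2) can be pictured with a minimal Python sketch. It is not BOINC's code: it assumes, as described in this thread, that the client stops asking a project for new work once more than 2x core_count of that project's results are waiting to upload. `PendingResult` and `projects_blocked_from_work_fetch` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PendingResult:
    """Hypothetical stand-in for a task in the client's queue."""
    project: str
    uploading: bool  # True while the result's output files still wait to upload


def projects_blocked_from_work_fetch(results, ncpus):
    """Return the projects that would get no new work under the 2x core_count rule.

    Assumption (from the thread, not BOINC source): a project is blocked once
    its number of still-uploading results exceeds 2 * ncpus.
    """
    limit = 2 * ncpus
    uploading = Counter(r.project for r in results if r.uploading)
    return {project for project, count in uploading.items() if count > limit}


# A 16-core host: 32 pending uploads is still within the limit, 33 is over it.
queue = [PendingResult("cpdn", uploading=True) for _ in range(33)]
print(projects_blocked_from_work_fetch(queue[:32], ncpus=16))  # set()
print(projects_blocked_from_work_fetch(queue, ncpus=16))       # {'cpdn'}
```

With the upload server down, a many-core host crosses this threshold sooner or later and then sits idle for that project, even with disk space to spare.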

I also hit the first two but not the last one, though I know how swapping out disks behaves: I upgrade pretty often, and being lazy I usually try booting the same disk first. None of these are obvious to people not familiar with the BOINC client, and none of them come with any meaningful diagnostic information. The log messages for 1) are even misleading.

I do want to note one complication: while all these problems are triggered by the CPDN upload server being down, none of them are within the CPDN team's control or code. They are all BOINC client behavior, and it's a bit unfair to expect the CPDN team (or any project team) to explain every subtlety of the BOINC client when they themselves may not be experts in it. On the other hand, for a volunteer who lost weeks of work trying to contribute, it's totally frustrating.

I feel we should take the opportunity to raise these issues (probably again) on BOINC's forum or GitHub. At least 1) and 2) could use much better logging or a notice in the UI to inform users. In addition, IMO, 1) is really a bug that needs to be fixed. I have no idea how to solve or even communicate 3), though. The BOINC client is supposed to require minimal user maintenance, so the current behavior is desirable for most people, who casually wipe and reinstall or upgrade their disk and are fine dropping all WUs on the floor.

Edit: Looks like the fix for 1) has been merged. I guess we just need to wait for clients to be upgraded.
ID: 67851
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1118
Credit: 17,163,134
RAC: 2,081
Message 67852 - Posted: 18 Jan 2023, 17:12:15 UTC - in response to Message 67850.  

This is how we explained the 100GB limit and its solution on Twitter. Hopefully Jean-David Beyer finds this slightly more informative.


I do not think it explains anything to me.
My setup is:
Use no more than 420GB
Leave at least 1GB free
Use no more than 8% of the total.


I see nothing about the 100GB limit everyone seems to be talking about.
ID: 67852
ncoded.com

Joined: 16 Aug 16
Posts: 73
Credit: 53,298,265
RAC: 14,315
Message 67853 - Posted: 18 Jan 2023, 17:15:54 UTC
Last modified: 18 Jan 2023, 17:20:17 UTC

Try this.

Leave one of these options empty:

Use no more than 420GB
Leave at least 1GB free
Use no more than 8% of the total.


Then get BOINC to use 100GB of storage on a single project.

If you do that, you will get the 'lack of storage' notification error.

That is the 100GB Limit.
ID: 67853
wujj123456

Joined: 14 Sep 08
Posts: 124
Credit: 40,331,747
RAC: 57,250
Message 67854 - Posted: 18 Jan 2023, 17:26:54 UTC - in response to Message 67831.  

Funny...

I have six hosts with CPDN work to upload.
All are uploading at different times. They ask the server every 5 minutes to upload.
Only transient HTTP errors.

Going back to the topic: I am seeing changing behavior too. Before 2023-01-16 11:00 there was minimal uploading for me. Then it started ramping up and saturated my upload link from 2023-01-17 0:00 until it dropped off at 16:00 the same day. Ever since, uploads have been using around 1/3 of my link. (Timestamps are UTC, accurate to the hour.)

From this comment, it's clear the team is still tweaking the connections to improve uploads while trying not to run out of disk space on the upload server. It seems the current bottleneck is moving data off the upload server. Hopefully that clears soon so more upload connections can be enabled. I remember from other comments that the usual limit is 300, so we still have a long way to go before full recovery.
ID: 67854
ncoded.com

Joined: 16 Aug 16
Posts: 73
Credit: 53,298,265
RAC: 14,315
Message 67855 - Posted: 18 Jan 2023, 17:29:35 UTC
Last modified: 18 Jan 2023, 17:38:03 UTC

Thank you wujj123456, it's interesting hearing about which issues you faced.

But you are right, it's better to get the thread back on track for those having upload issues.

Perhaps a new support/info thread could be set up, and the last posts by Glen, Richard, Dave, xii5ku, and wujj123456 on this subject could be moved into it?
ID: 67855
wujj123456

Joined: 14 Sep 08
Posts: 124
Credit: 40,331,747
RAC: 57,250
Message 67856 - Posted: 18 Jan 2023, 17:50:14 UTC - in response to Message 67855.  
Last modified: 18 Jan 2023, 17:50:47 UTC

If an FAQ thread opens up for the specific disk usage issues, we should definitely move there. Otherwise, I at least don't mind continuing the discussion here, since all these problems are triggered by uploads being stuck and this will likely be the thread people check. I learnt about the 100G limit entirely from the discussion here.

PS: For context, it is morning here, so I am just catching up and replying to comments I find relevant. I'm not intentionally interrupting your discussion or trying to move it elsewhere.
ID: 67856
SolarSyonyk

Joined: 7 Sep 16
Posts: 262
Credit: 34,672,453
RAC: 14,037
Message 67863 - Posted: 18 Jan 2023, 20:57:07 UTC - in response to Message 67834.  

The truth is I posted that I had just bought a new HD and was about to swap it out due to so many uploads. Not one mod or team member mentioned the problems that would happen if I did this.


Probably because nobody had tried it before. Come to think of it, I've run into the same general issue with 32-bit macOS VMs (I'd generated VMs with the same default MAC address on different hosts, which didn't cause any networking problems but did cause BOINC collisions if they were started too quickly, and they'd stomp on each other's tasks). But that's down in some pretty weird weeds, and when I add drives I typically just clone the install to the larger drive. Still, I hadn't seen the thread in time, and I doubt I'd have realized it applied to a new OS install anyway. Computers are stupidly complex, and even the people who build them can't reason about all the interactions between the various moving parts.

I'm upload limited anyway if I have all my CPUs crunching (I can easily produce more results in a good sunny week than I can upload), so... I just do less. Yeah, it sucks to drop a bunch of work on the floor. It happens. I've dropped more OpenIFS units than I'd like while tuning memory limits and such (I have a bunch of 16GB machines, which wasn't a problem until recently), and I certainly gave the OOM killer a workout more than a few times. I'm also tracking down some disk IO issues; I don't think I have discard enabled on some of my SSDs, and it's becoming a problem. Such is life.

I'll contribute what I can and not worry about the rest. I mostly heat my office with compute; I was heating with WCG, and sometimes a resistive heater when I didn't have WUs. Meh.
ID: 67863
WB8ILI

Joined: 1 Sep 04
Posts: 161
Credit: 81,501,411
RAC: 866
Message 67864 - Posted: 18 Jan 2023, 21:07:00 UTC

The 100 GB limit is this:

If you DO NOT check the "Use no more than XXXX GB" box, the default value is 100 GB.

In other words, checking the box with a value of 100 GB is the same as not checking it at all.
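To see why an unchecked box can bite, here is a rough Python model of how the three disk settings might combine. It is a sketch under assumptions, not BOINC's implementation: it takes the effective budget as the smallest of the percentage limit, the "leave at least X GB free" limit (counting BOINC's own files as space it may reuse), and the absolute GB limit, and, following the posts above, treats an unchecked "Use no more than" box as 100 GB on affected clients. `allowed_disk_gb` and all disk sizes below are hypothetical.

```python
DEFAULT_MAX_USED_GB = 100.0  # default the thread describes for an unchecked box


def allowed_disk_gb(total_gb, free_gb, boinc_used_gb,
                    max_used_pct, min_free_gb, max_used_gb=None):
    """Effective disk budget for BOINC in GB (hypothetical model).

    max_used_gb=None stands for the "Use no more than ... GB" box being
    unchecked; this sketch assumes an affected client then uses 100 GB.
    """
    # "Use no more than X% of total"
    limit_pct = total_gb * max_used_pct / 100.0
    # "Leave at least X GB free": BOINC's own files count as usable space.
    limit_free = boinc_used_gb + free_gb - min_free_gb
    # "Use no more than X GB", falling back to the assumed default.
    limit_abs = DEFAULT_MAX_USED_GB if max_used_gb is None else max_used_gb
    # The tightest of the three limits wins; never report a negative budget.
    return float(max(0.0, min(limit_pct, limit_free, limit_abs)))


# Hypothetical 2 TB disk, 1.5 TB free, BOINC already using 90 GB, 50% allowed.
print(allowed_disk_gb(2000, 1500, 90, max_used_pct=50, min_free_gb=1))  # 100.0
print(allowed_disk_gb(2000, 1500, 90, max_used_pct=50, min_free_gb=1,
                      max_used_gb=420))                                   # 420.0
# Jean-David's settings (420 GB, 1 GB free, 8%) on a hypothetical 4 TB disk.
print(allowed_disk_gb(4000, 3000, 50, max_used_pct=8, min_free_gb=1,
                      max_used_gb=420))                                   # 320.0
```

In this model, a host with the box checked (as in Jean-David's setup) never sees the 100 GB default; whichever of the three limits is smallest is the one that binds.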
ID: 67864
ncoded.com

Joined: 16 Aug 16
Posts: 73
Credit: 53,298,265
RAC: 14,315
Message 67866 - Posted: 18 Jan 2023, 22:07:58 UTC - in response to Message 67864.  
Last modified: 18 Jan 2023, 22:13:30 UTC

WB8ILI, I do understand what you are saying.

The edit box will use a default value of 100GB unless you tick the checkbox, even if you have entered a value.

The problem is that you are now saying something slightly different from the advice we had previously been given.

So how are we, or any non-technical person, meant to decide who is correct?

Is it you? Or is it the BOINC developer who said our tweet was accurate?

Or are both correct and we got the wires mixed up?

All I can say is that if there were a sticky post called "The 100GB Limit", it would give a definite answer to this.

You have done some serious crunching for CPDN, so I have tried to write a half-decent reply.
ID: 67866
ncoded.com

Joined: 16 Aug 16
Posts: 73
Credit: 53,298,265
RAC: 14,315
Message 67867 - Posted: 18 Jan 2023, 22:12:19 UTC
Last modified: 18 Jan 2023, 22:24:12 UTC

SolarSyonyk, the hard disk issue may not be widely known.

But every competitive cruncher who uses 'bunkering' knows all about using multiple disks.

And indeed someone said this was the security mechanism to stop that.

Yet again, who knows which is the correct answer?

If there were a sticky thread called "Instant Invalidated Tasks", that would give a definite answer.
ID: 67867
ncoded.com

Joined: 16 Aug 16
Posts: 73
Credit: 53,298,265
RAC: 14,315
Message 67868 - Posted: 18 Jan 2023, 22:16:47 UTC
Last modified: 18 Jan 2023, 22:25:49 UTC

Just to be clear, "the 100GB Limit" and "Instant Invalidated Tasks" are just made-up titles. It doesn't matter what title you actually use, just that the issue is solvable or explainable in a sticky post, and that the post is very easy to find.

Anyway.. "The uploads are stuck"..
ID: 67868
Yeti

Joined: 5 Aug 04
Posts: 178
Credit: 17,308,699
RAC: 19,069
Message 67870 - Posted: 18 Jan 2023, 22:28:52 UTC - in response to Message 67867.  

If there were a sticky thread called "Instant Invalidated Tasks" then that would give a definite answer.

Nope, I would never search for such a post; I would search for "cancelled tasks", and that is the start of the dilemma.

About 10 days ago I (re)started crunching CPDN after a pause of several years, and I searched for all the important details I needed to know to do it right. I found an FAQ, but it was so basic about BOINC that it didn't help.

I think the information has to be divided into two or even more sections:

A) Running BOINC in general: all the knowledge that is not project-specific.
B) Running this particular project

Your problem with the disk running full is a general topic, so A).

How to change a disk is a general topic, so A).

How much memory OpenIFS needs, or whether special settings such as "keep tasks in memory" are needed, is project-specific, so B).

And things that work on Windows don't work on Linux, and things that work with Ubuntu 20.x don't work with Ubuntu 22.x; all this makes it more and more complicated.

For example, your disk swap under Windows would have been very easy: stop BOINC, copy the whole BOINC directory to the new location, (re)install BOINC if needed and point it at the "new" location as the data directory, and that's it. You could continue from where you left off and nothing would change. Under Linux this won't work.
Supporting BOINC, a great concept !
ID: 67870
ncoded.com

Joined: 16 Aug 16
Posts: 73
Credit: 53,298,265
RAC: 14,315
Message 67871 - Posted: 18 Jan 2023, 22:43:47 UTC
Last modified: 18 Jan 2023, 23:11:44 UTC

I agree, and that's partly why I made the point about titles. It's definitely an important subject, especially if you want your content to be found both inside and outside BOINC.

But for now we are just talking about which "issues" should even go into a sticky post, if any.

I agree that in some ways having two sections, such as A and B, is useful. But on the other hand, many users cannot tell whether something is a BOINC or a CPDN issue, which is perhaps an argument for not splitting them up.
ID: 67871
wujj123456

Joined: 14 Sep 08
Posts: 124
Credit: 40,331,747
RAC: 57,250
Message 67873 - Posted: 18 Jan 2023, 22:58:09 UTC - in response to Message 67870.  

For example: Your disc-swap under windows would have been very ease: Stop BOINC, copy whole BOINC-DIR to new location, (re-) install BOINC if needed and tell the "new" location being the Data-Section and that's it. You could continue at your last point and nothing will change. Under Linux this won't work

It's actually even simpler on Linux. Stop the client, mv the whole directory to the new location, create a symlink with the previous directory's name pointing to the new location, and then start the client. The client will keep operating on the old directory name, except that it's now just a link to the new directory. (Of course, you can also go the other route and change the BOINC client config to use the new directory name, similar to the Windows setup you described, but that involves editing config.)
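The mv-plus-symlink steps can be scripted. Below is a minimal Python sketch, assuming the client has already been stopped (how to stop it depends on your distribution and is left out) and that you run it with enough privileges to move the directory. `migrate_boinc_dir` and the example paths are hypothetical; check where your own install keeps its data directory.

```python
import os
import shutil
from pathlib import Path


def migrate_boinc_dir(old_dir, new_dir):
    """Move a BOINC data directory and leave a symlink at the old path.

    Assumes the BOINC client is already stopped; this function neither
    stops nor restarts it.
    """
    old, new = Path(old_dir), Path(new_dir)
    if old.is_symlink():
        raise ValueError(f"{old} is already a symlink; nothing to migrate")
    if not old.is_dir():
        raise FileNotFoundError(f"{old} is not a directory")
    if new.exists():
        raise FileExistsError(f"{new} already exists; refusing to overwrite")
    new.parent.mkdir(parents=True, exist_ok=True)
    # Like `mv`: a rename on one filesystem, copy-then-delete across disks.
    shutil.move(str(old), str(new))
    # Point the old name at the new absolute location, so the client's
    # configured data directory keeps working unchanged.
    target = new.resolve()
    os.symlink(target, old, target_is_directory=True)
    return target


# Hypothetical paths; run only while the client is stopped, then restart it.
# migrate_boinc_dir("/var/lib/boinc-client", "/mnt/newdisk/boinc-client")
```

One difference from running plain mv as root: when the move crosses filesystems, shutil copies the files without preserving their owner, so afterwards check that they still belong to the account the client runs as.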

The story is still the same, though. One has to be aware of how the BOINC data directory and client identifier assignment work. The data has to be migrated properly before the next time the BOINC instance contacts the server. That's not intuitive, and it's what caused the loss of work here.
ID: 67873
Yeti

Joined: 5 Aug 04
Posts: 178
Credit: 17,308,699
RAC: 19,069
Message 67874 - Posted: 18 Jan 2023, 23:13:18 UTC - in response to Message 67871.  

But in otherways many users cannot tell if something is BOINC or CPDN. Which is perhaps an argument for not splitting them up.

But if you don't split it up, you force the admins of each project to build up this knowledge. I don't think that is a good way.

It would be better to have a central general BOINC site where you can find tutorials and descriptions for common, general situations.

That way, project admins could focus on project-relevant information, and I'm sure they would do so with joy.

As has been said somewhere in these discussions, the best knowledge about general BOINC matters comes from crunchers like me who take part in races and work with hundreds of instances. We have all had a lot of these problems and have spent much, much time finding out how things work. So even races have their good sides.
Supporting BOINC, a great concept !
ID: 67874
Alan K

Joined: 22 Feb 06
Posts: 489
Credit: 30,625,891
RAC: 3,476
Message 67876 - Posted: 18 Jan 2023, 23:25:01 UTC - in response to Message 67850.  

"If you don't install the 32-bit libraries your tasks will eventually keep crashing, and hence your device will get jailed. "

Unfortunately I don't think that is the case - though I stand to be corrected.
ID: 67876
ncoded.com

Joined: 16 Aug 16
Posts: 73
Credit: 53,298,265
RAC: 14,315
Message 67877 - Posted: 18 Jan 2023, 23:41:54 UTC
Last modified: 19 Jan 2023, 0:25:07 UTC

Yeti, if BOINC could generate the content itself and auto-populate it into the forums across projects, that would be great.

That would leave each project to only concentrate on its own stuff.

I also agree that competitive crunchers bring far more to the BOINC table than many people realise. Yes, to some it does seem like they are gaming the system. But unless the system is clear about what is and is not acceptable, I don't think one can say that.

Most techniques, be it running custom software, running update scripts, running multiple disks or instances, or even recompiling and spoofing, can have valid reasons for being used.

Unfortunately I don't think that is the case - though I stand to be corrected.


Alan K, from what we have seen, not every task will crash, but many will. Eventually you will fail too many, and that will put your host in jail.
ID: 67877
SolarSyonyk

Joined: 7 Sep 16
Posts: 262
Credit: 34,672,453
RAC: 14,037
Message 67880 - Posted: 19 Jan 2023, 0:53:25 UTC - in response to Message 67867.  

But every competitive cruncher who uses 'bunkering' knows all about using multiple disks.


I'm sorry, the... what?

I got nothing. I crunch on random hardware I've picked up over the years to heat my office, and to blow off solar surplus during times of plenty (with zero crunching during times of photon drought: I have 5kW of panels hung on my office and can't generate 200W on some winter days). I'm on a couple of crappy internet connections (one terrestrial WISP, one Starlink, which doesn't upload worth a damn). I've no idea what "bunkering" BOINC results is, how to do it, or why someone would want to do it.

I've run into "multiple machines get collapsed together and screw each other up" when I was running "Mac" VMs for 32-bit CPDN tasks, and that was it. Sorry, but if my disks are full (which can happen here; I don't put big SSDs in my compute rigs), I just stop crunching on the problem project until the problem goes away. I'm still trying to get out from under the upload server outage, and I can out-compute my upload, even across both links.
ID: 67880
ncoded.com

Joined: 16 Aug 16
Posts: 73
Credit: 53,298,265
RAC: 14,315
Message 67881 - Posted: 19 Jan 2023, 0:59:56 UTC
Last modified: 19 Jan 2023, 1:05:44 UTC

Bunkering is a technique where you get as many tasks as possible. More than one host is normally allowed.
ID: 67881
Landjunge

Joined: 17 Aug 07
Posts: 8
Credit: 36,824,190
RAC: 12,336
Message 67882 - Posted: 19 Jan 2023, 5:47:36 UTC

My uploads are building up again =(
ID: 67882

©2024 cpdn.org