Posts by Eirik Redd

1) Message boards : Cafe CPDN : Off topic posts. (Message 69154)
Posted 7 Jul 2023 by Eirik Redd
Post:
And so it goes. [lit ref non -]
2) Message boards : Number crunching : Big credit jump! (Message 68778)
Posted 20 May 2023 by Eirik Redd
Post:
I saw the totally insane, wrong RAC a week or so ago when the hardware move and software update happened. I deleted the tail of the local statistics file, but I want to allow new tasks soon. It's a minor annoyance.
Allowing new work from CPDN is my major thing this week.
3) Message boards : Number crunching : The uploads are stuck (Message 68102)
Posted 29 Jan 2023 by Eirik Redd
Post:
This reminds me of some other project's trick. They grant additional credits if tasks are returned within X days, to disincentivize excessive hoarding.


Back when my machine could always get work from CPDN and WCG (and Seti@home), I had at least 0.35 days of work and about 0.65 days additional work in my preferences. Now that Seti@home is gone, WCG is sort-of back up, and CPDN is erratic in work availability, my machine is set to at least 0.50 days of work and 1.5 days of additional work, and that 1.5 days is not really enough. When the upload server went down, I mostly let my machine keep crunching and I got around 20 completed tasks before they started uploading again. I do not think of that as hoarding. IIRC, some of the more recent "classical" CPDN work took around 8 days to complete a task, and in the distant past (and on slower machines) tasks could take several months.

But I do not think, with the new Oifs tasks, there is much point grabbing a month's supply because they would time-out before I could get around to them. I usually leave my machine up 24/7.


Yup, I'm an old-timer here. Times change: the new models have mucho memory needs, but we don't have to do interim backups any more (yet). Two decades ago models took months to complete, and we didn't want to waste a quarter of a workunit after a couple of weeks.
Naah, we'll adapt to the new.
And I've noticed, and bought on the price decline, some ECC UDIMMS for my current hosts, to good effect.
4) Message boards : Number crunching : Hardware for new models. (Message 67923)
Posted 20 Jan 2023 by Eirik Redd
Post:
Thanks for the quick summary.
1. Where I live, less than 3km from the two Uni campuses (that had multi-terabit bandwidth 12 years ago when I quit there) in this metro area, Gbit Fiber is "not available" to my home.
I might be able to double my upload speed on ADSL at about the same price I'm paying now for 568 kbytes/sec upload. Paying for coax cable as well would bring my total upload bandwidth to 2 Mbyte/s, or 20 Mbit/s, for an extra 80 USD/month.
2. Disk space is totally cheap here now. When the upload failure on CPDN happened, all I needed to do was re-assign some spare less-than-terabyte free space. No problem, no cost.
3. RAM capacity. I use AMD Ryzen 7 and 9. They use DDR4 ECC UDIMMs @ 3200. The price of those has dropped fast now that DDR5 is the newest thing. 64GB of 3200 UDIMMs (the least possible RAM for the current oifs models on CPUs with at least 8 cores, using only 5 out of 8) is now way less than 300 USD at NEMIX.com. I'll buy another 64G soon, before they run out of stock. Memory speed has had little effect on oifs jobs so far in my experience. Lack of memory space can waste workunits, and has wasted some for me, like when I allowed 5 oifs to run on an AMD 5900X with only 32G ECC UDIMMs and no swap space. The dreaded OOM-killer. Not good.

So, for me here, Disk and RAM getting cheaper, internet bandwidth upload unavailable, CPUs relatively cheap. I'm gonna ask my kid who lives on the edge of a barrio, who I think said that they have gigabit fiber up and down.
Only in the USA :(
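
The RAM point above (too many oifs tasks on too little memory, then the OOM-killer) can be sketched as a quick sizing check. This is only a rough illustration: the function name, the per-task peak figure and the amount reserved for the OS are all assumptions you would have to measure on your own host, not numbers published by the project.

```python
import math


def max_concurrent_tasks(total_ram_gb, per_task_peak_gb, reserved_gb=4.0):
    """Estimate how many tasks fit in RAM without relying on swap.

    All three inputs are assumptions supplied by the user:
    total_ram_gb      installed RAM
    per_task_peak_gb  peak memory of one running task (measure it yourself)
    reserved_gb       RAM kept free for the OS and everything else
    """
    if per_task_peak_gb <= 0:
        raise ValueError("per_task_peak_gb must be positive")
    usable_gb = total_ram_gb - reserved_gb
    if usable_gb <= 0:
        return 0
    return math.floor(usable_gb / per_task_peak_gb)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(max_concurrent_tasks(total_ram_gb=32, per_task_peak_gb=8))
```

If the result is lower than the number of tasks BOINC is allowed to start, lowering the concurrent task limit is cheaper than losing workunits.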

xii5ku wrote:
Hardware requirements for current "OpenIFS 43r3 Perturbed Surface" work:

The following items need to be taken into account, in descending order of concern:
1. Upload bandwidth of your internet link.
2. Disk space.
3. RAM capacity.
99. CPU. This one doesn't really matter, except of course that CPU core count has got an influence on how many tasks you may want to run in parallel, and that core count × core speed influences how many tasks you can complete per day at most. One or both of these factors (concurrent tasks, average rate of task completions) influence the sizing of items 1…3 in this list.
Note, I bolded one word in the first sentence after the fact.

This priority order has been criticised here. I admit that my perspective is somewhat biased, as I own several computers with relatively high core count and high computational throughput and am used to being able to fully utilize them. (Although that's not always trivial to accomplish, because many BOINC projects are focused on low core count/ low throughput hosts.)

However, given how the current "OpenIFS 43r3 Perturbed Surface" campaign is going so far, my priority list is – empirically; refer to thread 9167, thread 9178 – indeed quite generally applicable.

5) Message boards : Number crunching : OpenIFS Discussion (Message 67098)
Posted 28 Dec 2022 by Eirik Redd
Post:
... I was sooo happy the CPDN had an abundance of jobs and joined the party - only to then find out I can't get rid of my results.
Two machines crunching, two harddrives slowly filling up.

Thanks, that's funny. :-) Initially it's "Where's the work?!", now it's "How do I get rid of the results?!"

Me 2. Locally here it's even funnier, because a severe cold snap hit just when my fastest, hottest CPUs ran out of work. Had to burn a lot of methane to keep my house warm. Murphy's law.
And at the winter holidays, when tech support for the infrastructure of low-budget volunteer projects is minimal or overpriced.
It's funny, ironic, anti-serendipitous, and another example of the famous Murphy's Law.
Keep on crunching, people. Patience pays. Thanks to all.
E
6) Message boards : Number crunching : The uploads are stuck (Message 67096)
Posted 28 Dec 2022 by Eirik Redd
Post:
I know and I feel you. The non-math projects have been dwindling over the years. WCG used to cover a whole lot more, but these days it is just two medical projects with ARP occasionally trickling in. The migration off IBM certainly didn't go well. The projects I added in recent years (asteroid, universe, LHC) were all added because at some point all the projects I contributed to ran out of work. Among the long list of math projects, I have yet to find anything I can remotely relate to. In addition, for winter, I'd rather run my computers than turn on the heater.

Still though, BOINC or any projects are generally not run as a high availability service. That requires a level of funding and expertise that are generally not available to researchers and that's also a very different focus compared to science research. Sure we contribute compute power at our own cost, but I personally don't consider that enough to justify expecting people to troubleshoot during holidays.


Totally agree.
Especially about the heating value of desktop and workstation computers, with the recent cold snap here in North America (47 N latitude here).
The less gas I need to burn to keep my home above 18C, the better. My local electric supply is mostly old-time safe fission nuke and reasonably cheap.

keep on crunching.

E
7) Message boards : Number crunching : OpenIFS Discussion (Message 66639)
Posted 29 Nov 2022 by Eirik Redd
Post:
Sadness. My ADSL that's been adequate isn't adequate anymore for the many uploads per model - that's about a GiB and a half per work-unit. Throttling downloads until my very Asymmetric ISP upload bottleneck gets replaced with Gbit (likely soon).
Models run in about 11 hours on my slowest and fastest multicore machines, but as was disclosed way in advance, they need at least 5GB per running model; if they get less, they slow waaay down.
I've ordered an AMD 5800X3D to see if the bigger L3 cache helps with this kind of work.
Thanks to all for supporting this work with your time and compute capacity.
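
To put the upload bottleneck in numbers, here is a small back-of-the-envelope sketch. The 1.5 GiB per work-unit comes from the post above; the 568 kbytes/sec rate used in the example is quoted in a different, later post by the same author and may not match the line speed at this date. The function names are made up for illustration, and "kbytes" is assumed to mean 1000 bytes.

```python
def upload_hours_per_task(result_size_gib, upload_kbytes_per_s):
    """Hours needed to upload one task's results at a sustained rate.

    Assumes 1 GiB = 1024**3 bytes and 1 kbyte = 1000 bytes, and ignores
    protocol overhead, retries and competing traffic.
    """
    if upload_kbytes_per_s <= 0:
        raise ValueError("upload rate must be positive")
    size_bytes = result_size_gib * 1024 ** 3
    seconds = size_bytes / (upload_kbytes_per_s * 1000)
    return seconds / 3600


def max_tasks_per_day_by_upload(result_size_gib, upload_kbytes_per_s):
    """Upper bound on tasks per day if the uplink is the only limit."""
    return 24 / upload_hours_per_task(result_size_gib, upload_kbytes_per_s)


if __name__ == "__main__":
    hours = upload_hours_per_task(1.5, 568)
    per_day = max_tasks_per_day_by_upload(1.5, 568)
    print(f"{hours:.2f} h per task, at most {per_day:.1f} tasks/day")
```

Comparing that ceiling with the number of tasks a host finishes per day shows whether results will pile up on disk faster than the link can drain them.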
8) Message boards : Number crunching : OpenIFS Discussion (Message 66636)
Posted 29 Nov 2022 by Eirik Redd
Post:
I've seen at least one of those 'double free or corruption' but only on an old i7-7700 with non-ecc memory.
9) Message boards : Number crunching : New work Discussion (Message 64992)
Posted 23 Jan 2022 by Eirik Redd
Post:
The one computer that I tried to run these on failed all 8 tasks with segmentation violations. Looking at those work units, a couple of those work units had all 3 tasks fail with segmentation violation errors, a few had all fail for various errors, and a couple work units had one of the tasks progress and produce trickles, including producing trickles on another Linux PC. Quite odd.

Odd indeed. I'm clueless. SIGSEGV?? And some work, a bit at least, on Macs?
I have no further useful input.
Hope someone somewhere can figure this problem out.
10) Message boards : Number crunching : New work Discussion (Message 64975)
Posted 17 Jan 2022 by Eirik Redd
Post:
I just now manually terminated all the batch 926 "shorts" waiting to run,
because they've all been dying with SIGSEGV.
If cpdn runs out of work, I can run other projects, and maybe spend a few hours on hardware and software updates.

keep on crunching

e
11) Message boards : Number crunching : Can't attach to project. project is temporarily unavailable But . . (Message 64607)
Posted 10 Oct 2021 by Eirik Redd
Post:
Which machine are you talking about? You have eight Linux computers attached to the project, and one Windows computer.

The Windows computer will be unable to contact the project at this time, because of a security certificate expiry in the ca-bundle.crt file. Wait for a new Windows client to be released - I have no idea of the timescale for that at the moment - or explore the workrounds being discussed on the BOINC message board.


This is and was the computer named Ilex on Debian oldstable which is visible on the CPDN website. I don't worry about the inability to attach to CPDN. There's other projects out there. WCG for ilex.
Maybe sometime next week I'll finish a build of a Ryzen 5800X and will try to attach to CPDN.
If that works -- cool.
If not? No sleep lost. No work on CPDN anyhow. No worries.
Would be interested to hear if anyone has attached to the "no work now" project CPDN in the last 2 weeks?
12) Message boards : Number crunching : Site problems (Message 64584)
Posted 4 Oct 2021 by Eirik Redd
Post:
Site been Pownd?
Sorry to think so, maybe?
Can't attach to project.
All on this thread speculate ssl problems.
Tried those fixes
No help.
WTF?
Also -- checking local wu records vs website records -- big mismatch for wu's last few months
Or have I just lost my mind?
HELP!


Or maybe it's just the website blundering, changing what I thought were the work-unit stats that I've been recording all year into different stats now?

Yo no se.

Shirimasen
13) Message boards : Number crunching : Site problems (Message 64583)
Posted 4 Oct 2021 by Eirik Redd
Post:
Site been Pownd?
Sorry to think so, maybe?
Can't attach to project.
All on this thread speculate ssl problems.
Tried those fixes
No help.
WTF?
Also -- checking local wu records vs website records -- big mismatch for wu's last few months
Or have I just lost my mind?
HELP!
14) Message boards : Number crunching : Can't attach to project. project is temporarily unavailable But . . (Message 64571)
Posted 3 Oct 2021 by Eirik Redd
Post:
But the server status page shows none down. This page seems to be working fine. I've looked at a few of my computers and see no trickle-up errors.
WCG wu's run, download, upload all OK. Attached to Rosetta OK, downloading a wu from there.
Puzzlement. Zero dark 30 here.
Will investigate further after sleep
Any suggestions welcome.

E
15) Message boards : Number crunching : WU Marked As Abandoned, Still Running On Computer (Message 64415)
Posted 28 Aug 2021 by Eirik Redd
Post:
WU hadam4_a16s_201310_6_914_012099105 is showing as abandoned on my task list, but it is still running on the VM (Ubuntu, computer ID is 1493840). Should I really abort it on that VM, or can the status be changed to "In Progress"?

Steve


No way, no easy way, no way that will not violate the auditability of the work.
If /when/ there's a foxtrot software fail, no researcher will waste time trying to resurrect the lost wu. No way.
Try fix software edge case, maybe. Try re-submit particular work-unit maybe. Report as unspecified failure in multilevel software stack, sure.

Kill it if you can. Otherwise wait until the wu times out next year.
Keep on crunching.

e
16) Message boards : Number crunching : Project Outage (Message 64365)
Posted 16 Aug 2021 by Eirik Redd
Post:
I find that my trickles from Sunday 15 August and now (here at UTC-5) Monday are all still on my boxen, but with the suffix '.sent'.
Sent, but not acknowledged. And somehow hidden, had to go root to see em.
In other words, trickles are being bounced, and possibly diverted. No worries from here.
Hope this helps.
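
For anyone who wants to look for the same thing on their own machine, a minimal sketch follows. It simply lists files whose names end in '.sent' under a directory you pass in. The function name is hypothetical, the location of the BOINC data directory is an assumption that varies by install, and the script needs enough privileges to read that directory.

```python
from pathlib import Path


def find_sent_trickles(data_dir):
    """Return all regular files ending in '.sent' under data_dir, sorted.

    data_dir is whatever your BOINC data directory is; this function does
    not know or guess it.
    """
    root = Path(data_dir)
    return sorted(p for p in root.rglob("*.sent") if p.is_file())


if __name__ == "__main__":
    import sys

    # Pass the data directory on the command line, for example the one your
    # distribution's BOINC package uses (an assumption you must check).
    for path in find_sent_trickles(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```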
17) Message boards : Number crunching : Dr Lisa Su says up to 192MB L3 on newer Ryzen -- hope it's true (Message 64021)
Posted 2 Jun 2021 by Eirik Redd
Post:
There's been some discussion here about the high demand for L3 cache with many recent climate models.
If this leaked unverified "news" turns out true --
No links here, totally unconfirmed.
But good news, if/when it happens.

e
18) Questions and Answers : Unix/Linux : Run Linux work units with Windows 10 WSL (Message 63959)
Posted 8 May 2021 by Eirik Redd
Post:
WSL seems to "get better all the time". I like the way this is going.
19) Questions and Answers : Unix/Linux : Shutting down for re-boot. (Message 63951)
Posted 7 May 2021 by Eirik Redd
Post:
Thanks for adding that. I have in fact had "Leave non-GPU tasks in memory while suspended" enabled on my boxes for many years. The measures I have outlined are in addition to that.

Not sure whether something has changed in the tasks or in more recent incarnations of BOINC, but of late, even when I have had restarts due to power failure (electrician turning mains off without warning), I haven't lost tasks to it. Something has improved but I don't know what. (I would guess the change happened over the last 9 months to a year.)


My experience is similar. I do check "leave non-GPU tasks in memory" and I always suspend tasks before a reboot when the reboot is optional (like for a kernel upgrade or such).
Yup, Something has improved but I don't know what.
e
20) Message boards : Number crunching : Little work, yet the most "important" thing in the world? (Message 63950)
Posted 7 May 2021 by Eirik Redd
Post:
The 32bit libraries are not a problem unique to CPDN. It is an issue for a number of projects under Linux. With CPDN the issue is something like a million lines of compiled Fortran. Sorting out a 64bit version of programs from the Met Office, without the source code or a license that allows mucking about with it even if we had the source, is a far from trivial task, even if the project had programmers with the necessary Fortran knowledge.

Sorting it out for other projects some of which may have written their own code would I assume be easier, even if they haven't seen the need to do it yet.

At some point there will probably be OpenIFS tasks for Linux and Mac which are 64bit and should, "just work." The ones which crashed during testing were problems with ancillary files as I understand it. That and file size limits being set too low in some cases. Then of course there is just the sheer complexity of climate modelling that means mistakes are more likely. Most get caught by the testing branch but not all.


I've learned a quick method, which not everyone will like, that has helped me with 2-3 recent linux installs. It gets all the 32-bit libs needed in a few seconds or minutes or even hours, depending on your internet download speed.

No finding the executables, no ldd-ing to find the libraries, no looking up past threads here.
It's now part of my install procedure for linux users who want to do CPDN. But requires a bit of trust in winehq.org developers I don't know well.

Just go to winehq.org and follow the instructions for downloading to your flavor of linux. There's a "dpkg --add-architecture i386" step (for debian and ubuntu), a "download key && add key" step, and then the apt install.

What reminded me that I've used this successfully for a few installs recently was the recent post here about wsl on Windows. Winehq is kind of an inverse to that (not precisely) .

Anyhow, works for me. It uses some disk, but I have lotsa old slow disks (and I almost never use wine anymore).

HTH -- hope this workaround helps.

er
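
For readers who would still like to confirm that a given executable really is 32-bit (and so needs the i386 libraries at all), here is a small sketch that reads the ELF header. It relies only on the standard ELF layout: the file starts with the magic bytes 0x7f 'E' 'L' 'F', and the fifth byte is 1 for 32-bit and 2 for 64-bit. The function name is made up for illustration, and which file to inspect is left to you.

```python
def elf_class(path):
    """Return 32 or 64 for an ELF file, or None if it is not a valid ELF.

    Only the first five bytes are read: the 4-byte ELF magic followed by
    the EI_CLASS byte (1 = 32-bit, 2 = 64-bit).
    """
    with open(path, "rb") as f:
        ident = f.read(5)
    if len(ident) < 5 or ident[:4] != b"\x7fELF":
        return None
    return {1: 32, 2: 64}.get(ident[4])


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, elf_class(name))
```

A result of 32 means the 32-bit compatibility libraries are needed to run that file on a 64-bit Linux install.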



©2024 climateprediction.net