New work Discussion

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4532
Credit: 18,835,737
RAC: 21,348
Message 64041 - Posted: 8 Jun 2021, 19:46:54 UTC - in response to Message 64038.  

Quote from Message 64038:
"I found my laptops had switched on their turbo-props. The Task Manager was going haywire and the temperatures were on cloud nine. When I checked the Task Manager, I had fifteen WUs running on a twelve-thread machine. Switched off VirtualBox: still twelve tasks? Opened up BOINC Manager and found twenty-three Windows WUs. Managed to mark the project 'No further tasks' just in time. Well, I suspended the Windows tasks because the task in my VB is at 92% and has already errored out several times.
Stuck. My VBs have tasks. I would like to know what others are going to do. I cannot be the only one."

Almost certainly not the only one. Lots of Windows machines have been waiting for work for a long time. This afternoon, 6,360 tasks were released. (They have now all gone.) The large number of tasks you downloaded is due to your BOINC settings. The problem arises mainly because this project currently has only sporadic work for Windows.

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,087
RAC: 2,202
Message 64123 - Posted: 3 Jul 2021, 19:14:18 UTC

I know there are no OpenIFS work units for us to process right now, but once they become available, will I automatically get some, or will I need to find out they are available and ask for them? I think I have enough RAM and processing power to run at least one of them at a time. I have both the standard 64-bit libraries and enough of the 32-bit compatibility libraries to run my other ClimatePrediction work.
CPU type 	GenuineIntel
Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors 	16

Operating System 	Linux Red Hat Enterprise Linux
Red Hat Enterprise Linux 8.4 (Ootpa) [4.18.0-305.7.1.el8_4.x86_64|libc 2.28 (GNU libc)]
BOINC version 	7.16.11
Memory 	  62.4 GB
Cache 	16896 KB
Swap space 	15.62 GB
Total disk space 	117.21 GB
Free Disk Space 	93.45 GB
Measured floating point speed 	6.58 billion ops/sec
Measured integer speed 	31.66 billion ops/sec

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4532
Credit: 18,835,737
RAC: 21,348
Message 64125 - Posted: 3 Jul 2021, 19:51:10 UTC - in response to Message 64123.  

Jean-David Beyer wrote:
"I know there are no OpenIFS work units for us to process right now, but once they become available, will I automatically get some, or will I need to find out they are available and ask for them? I think I have enough RAM and processing power to run at least one of them at a time. I have both the standard 64-bit libraries and enough of the 32-bit compatibility libraries to run my other ClimatePrediction work."

With 64 GB of RAM before some is nicked for video etc., you would have no problems running a few of them at once. From memory, I think the testing ones wouldn't get sent out to a machine with less than 5 or 6 GB of RAM.

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,087
RAC: 2,202
Message 64126 - Posted: 3 Jul 2021, 20:59:45 UTC - in response to Message 64125.  

Right now, I do not seem to be using any RAM to speak of. I would run more CPDN once they start downloading again.

But will I need to do anything once those new tasks become available, or will they just start coming?

top - 16:55:34 up 3 days,  8:48,  1 user,  load average: 8.68, 8.52, 8.49
Tasks: 448 total,   9 running, 439 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.2 us,  0.1 sy, 49.5 ni, 50.1 id,  0.0 wa,  0.1 hi,  0.0 si,  0.0 st
MiB Mem :  63902.3 total,   4001.0 free,   6087.4 used,  53813.9 buff/cache
MiB Swap:  15992.0 total,  15972.5 free,     19.5 used.  57006.1 avail Mem 

    PID    PPID USER      PR  NI S    RES    SHR  %MEM  %CPU  P     TIME+ COMMAND                      
 140656  140645 boinc     39  19 R   1.3g  19764   2.1  99.6  3   1318:44 /var/lib/boinc/projects/cli+ 
 334619  140341 boinc     39  19 R 946728  87800   1.4  99.8  2 184:49.64 ../../projects/boinc.bakerl+ 
 327310  140341 boinc     39  19 R 567364  76884   0.9  94.6  1 269:07.99 ../../projects/boinc.bakerl+ 
 346508  140341 boinc     39  19 R 350140  70540   0.5  99.5  4  38:59.93 ../../projects/boinc.bakerl+ 
 347621  140341 boinc     39  19 R 321620  55712   0.5  99.5  6  23:06.11 ../../projects/www.worldcom+ 
 343562  140341 boinc     39  19 R 153360   2676   0.2  99.7  5  73:17.28 ../../projects/www.worldcom+ 
 348671  140341 boinc     39  19 R 141852  49808   0.2  99.6 15   7:49.47 ../../projects/www.worldcom+ 
 349089  140341 boinc     39  19 R 101544   2668   0.2  99.6  0   6:25.71 ../../projects/www.worldcom+ 
 140341       1 boinc     30  10 S  34452  17716   0.1   0.0 14  21227:08 /usr/bin/boinc               
 140645  140341 boinc     39  19 S  18176  16892   0.0   0.0 13   1:50.86 ../../projects/climatepredi+ 


Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64127 - Posted: 3 Jul 2021, 21:48:27 UTC

We have no information about that.
It depends on how the researchers / project people decide to do things.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4532
Credit: 18,835,737
RAC: 21,348
Message 64620 - Posted: 13 Oct 2021, 14:54:53 UTC

Just had two testing tasks for Windows, which may herald a new batch, but don't hold your breath: they were going to take 32 days and they both crashed. I am currently waiting for someone else on the testing site to demonstrate whether the problem lies with my BOINC running under WINE in VB or with the tasks themselves.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4532
Credit: 18,835,737
RAC: 21,348
Message 64621 - Posted: 14 Oct 2021, 5:09:55 UTC - in response to Message 64620.  

The issue that caused the first batch to crash within 2 minutes (at the point where it switches from the global to the regional model on the first model day) has been fixed. Another tester is estimating about 20 days.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4532
Credit: 18,835,737
RAC: 21,348
Message 64699 - Posted: 26 Oct 2021, 6:40:36 UTC
Last modified: 26 Oct 2021, 8:13:00 UTC

The first of my #920s crashed at the end with "file size limit exceeded". I am going to fiddle and increase the limit so that the second will succeed. I have alerted the project.

Edit:
I have edited <max_nbytes>150000000.000000</max_nbytes> to
<max_nbytes>600000000.000000</max_nbytes> for 4.zip on my second task. I have also turned off suspended internet access so when it gets there I can check the file size to see if the first was a one off.

Richard Haselgrove
Joined: 1 Jan 07
Posts: 1059
Credit: 36,657,707
RAC: 14,406
Message 64700 - Posted: 26 Oct 2021, 8:42:57 UTC - in response to Message 64699.  

I have four 920s 'in flight' - the first couple just passed 70%. I can check the allowances now, and increase any that look low.

Are you sure that it was the _4 zip that went over? Usually, all the zips are about the same size - but only the ones still active when the task finishes trip the size check. Perhaps I can trap the _3 zip at 75% and take a look.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4532
Credit: 18,835,737
RAC: 21,348
Message 64701 - Posted: 26 Oct 2021, 8:47:26 UTC - in response to Message 64700.  
Last modified: 26 Oct 2021, 8:49:45 UTC

Richard Haselgrove wrote:
"Are you sure that it was the _4 zip that went over? Usually, all the zips are about the same size - but only the ones still active when the task finishes trip the size check. Perhaps I can trap the _3 zip at 75% and take a look."

Mon 25 Oct 2021 22:30:46 BST | climateprediction.net | Output file hadam4h_h02w_200802_4_920_012115322_0_r75796790_4.zip for task hadam4h_h02w_200802_4_920_012115322_0 exceeds size limit.

Seems indicative.

I see that four successes are now showing for this batch so mine may have been an outlier. I had wondered if it was why no successes were showing up but I guess I was early enough in getting some that no one else was fast enough to finish before mine failed.

Richard Haselgrove
Joined: 1 Jan 07
Posts: 1059
Credit: 36,657,707
RAC: 14,406
Message 64702 - Posted: 26 Oct 2021, 9:27:31 UTC - in response to Message 64701.  

OK, I see the lie of the land - six output files, four zips, an out, and a restart. All given a limit of 150 MB (decimal), 143 MB (binary). Hopefully this afternoon...

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4532
Credit: 18,835,737
RAC: 21,348
Message 64703 - Posted: 26 Oct 2021, 9:56:58 UTC - in response to Message 64702.  

Richard Haselgrove wrote:
"OK, I see the lie of the land - six output files, four zips, an out, and a restart. All given a limit of 150 MB (decimal), 143 MB (binary). Hopefully this afternoon..."

I think I will do a search and replace on the limits, even though most seem OK.
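
For anyone wanting to try the same search and replace, here is a minimal Python sketch. It assumes the limits appear as `<max_nbytes>` elements in a plain-text XML file (I believe that is `client_state.xml` in the BOINC data directory, but the thread does not name the file, so treat that as an assumption). The function name `raise_max_nbytes` is made up for illustration. Stop the BOINC client and keep a backup before editing anything by hand.

```python
import re

# Matches one <max_nbytes> element and captures its numeric value.
MAX_NBYTES = re.compile(r"<max_nbytes>\s*([0-9.]+)\s*</max_nbytes>")


def raise_max_nbytes(xml_text: str, new_limit: float) -> str:
    """Return xml_text with every <max_nbytes> below new_limit raised to it."""

    def bump(match: re.Match) -> str:
        current = float(match.group(1))
        if current >= new_limit:
            return match.group(0)  # already large enough, leave untouched
        # Keep the six-decimal style seen in the original entries.
        return f"<max_nbytes>{new_limit:.6f}</max_nbytes>"

    return MAX_NBYTES.sub(bump, xml_text)


if __name__ == "__main__":
    sample = "<max_nbytes>150000000.000000</max_nbytes>"
    print(raise_max_nbytes(sample, 600_000_000))
```

The sketch only ever raises a limit, so entries that are already larger than the new value are left alone.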

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2184
Credit: 64,822,615
RAC: 5,275
Message 64704 - Posted: 26 Oct 2021, 13:18:36 UTC - in response to Message 64703.  

The four finished ones were mine. Looking back at the message log, I didn't see any error messages during upload or completion. Maybe I got lucky?

Richard Haselgrove
Joined: 1 Jan 07
Posts: 1059
Credit: 36,657,707
RAC: 14,406
Message 64705 - Posted: 26 Oct 2021, 16:56:22 UTC

OK, here they come - and we seem to be in that horrible corridor of uncertainty.

hadam4h_h15a_201102_4_920_012116704_0_r395361359_3.zip first:
BOINC Manager (transfers tab) says that it is 147.75 MB, which at first sight would be OK.
Linux says that it's 154.9 MB, and the file size property is said to be 154,924,888 bytes. That's not OK - if BOINC was checking these intermediate file sizes, that would be rejected.

hadam4h_h1i6_201202_4_920_012117168_0_r150029775_3.zip is a little smaller - 147.12 MB (BOINC), 154.3 MB (Linux file manager), 154,271,614 bytes (file size property).

So the project needs to be careful in internal communications: is a megabyte 1,000,000 bytes (as hard disk manufacturers would have you believe), or 1,048,576 bytes (1,024 x 1,024 bytes), as RAM manufacturers would have you believe?
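
To make the two definitions concrete, here is a small Python sketch that converts the byte counts quoted above into both units and compares them with the 150,000,000-byte limit mentioned earlier in the thread. It assumes the size check is made against the raw byte count; the helper name `describe` is just for illustration.

```python
MB = 1_000_000             # decimal megabyte
MIB = 1_048_576            # binary megabyte (1,024 x 1,024 bytes)
LIMIT_BYTES = 150_000_000  # the <max_nbytes> value quoted earlier in the thread


def describe(size_bytes: int) -> dict:
    """Express a file size in both units and flag whether it exceeds the limit."""
    return {
        "decimal_mb": round(size_bytes / MB, 2),
        "binary_mb": round(size_bytes / MIB, 2),
        "over_limit": size_bytes > LIMIT_BYTES,
    }


if __name__ == "__main__":
    for size in (154_924_888, 154_271_614, LIMIT_BYTES):
        print(size, describe(size))
```

Both zips come out at about 147 MB in binary units, which looks safe against "150 MB", yet both are over a limit of 150,000,000 bytes.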

I'm on a fast line in the UK, so it took just over three minutes to upload both these files, and a good few others from other projects - I suspect it would have been even quicker to upload a single file on its own. Last time we had this conversation, didn't we conclude that if you could sneak the last .zip through before the task finished, it would be OK - but on Dave's bored band, or on a congested cable from overseas, the 'end of task' check might cut it off before it had finished?

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2184
Credit: 64,822,615
RAC: 5,275
Message 64706 - Posted: 26 Oct 2021, 17:06:17 UTC - in response to Message 64705.  

Richard Haselgrove wrote:
"I'm on a fast line in the UK, so it took just over three minutes to upload both these files, and a good few others from other projects - I suspect it would have been even quicker to upload a single file on its own. Last time we had this conversation, didn't we conclude that if you could sneak the last .zip through before the task finished, it would be OK - but on Dave's bored band, or on a congested cable from overseas, the 'end of task' check might cut it off before it had finished?"

I believe that is correct. We were concerned about slow uploads, about people who suspended BOINC comms until the end with several large uploads queued, and about problems if an upload server was down for quite a while.

Aurum
Joined: 15 Jul 17
Posts: 99
Credit: 18,701,746
RAC: 318
Message 64707 - Posted: 26 Oct 2021, 18:22:01 UTC

I have a few 920s running. Should I abort them and lose a week's work now or let them fail at the end and lose a month's work?
What is this catch and set a new limit you guys are talking about? Is that something we civilians can do?

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,087
RAC: 2,202
Message 64708 - Posted: 26 Oct 2021, 18:54:59 UTC - in response to Message 64705.  
Last modified: 26 Oct 2021, 19:01:39 UTC

Richard Haselgrove wrote:
"I'm on a fast line in the UK, so it took just over three minutes to upload both these files, and a good few others from other projects - I suspect it would have been even quicker to upload a single file on its own. Last time we had this conversation, didn't we conclude that if you could sneak the last .zip through before the task finished, it would be OK - but on Dave's bored band, or on a congested cable from overseas, the 'end of task' check might cut it off before it had finished?"

I am on a fast Internet connection here in the USA, and my most recent uploads seem to take 20 seconds or a little more.

Mon 25 Oct 2021 02:20:15 PM EDT | climateprediction.net | Started upload of hadam4h_h1bc_200602_4_920_012116922_0_r1467636988_3.zip
Mon 25 Oct 2021 02:20:35 PM EDT | climateprediction.net | Finished upload of hadam4h_h1bc_200602_4_920_012116922_0_r1467636988_3.zip
Mon 25 Oct 2021 04:06:11 PM EDT | climateprediction.net | Started upload of hadam4h_h0h8_201002_4_920_012115838_0_r905931088_3.zip
Mon 25 Oct 2021 04:06:30 PM EDT | climateprediction.net | Finished upload of hadam4h_h0h8_201002_4_920_012115838_0_r905931088_3.zip
Mon 25 Oct 2021 04:54:12 PM EDT | climateprediction.net | Started upload of hadam4h_h14m_201002_4_920_012116680_0_r1942181916_3.zip
Mon 25 Oct 2021 04:54:37 PM EDT | climateprediction.net | Finished upload of hadam4h_h14m_201002_4_920_012116680_0_r1942181916_3.zip
Mon 25 Oct 2021 07:13:33 PM EDT | climateprediction.net | Started upload of hadam4h_h0c7_200602_4_920_012115657_0_r77250837_3.zip
Mon 25 Oct 2021 07:13:53 PM EDT | climateprediction.net | Finished upload of hadam4h_h0c7_200602_4_920_012115657_0_r77250837_3.zip


I do not seem to be using a lot of the available disk space.

$ df
Filesystem            1K-blocks      Used  Available   Use% Mounted on

/dev/sdb3             122908728  21114208  95528048    19%  /var/lib/boinc

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64709 - Posted: 26 Oct 2021, 19:07:36 UTC - in response to Message 64707.  

Aurum

The trick with these is to stagger the completion times.
Suspend all but one, give it an hour's head start, resume the next one and wait another hour, and so on.
That way all of the files won't get bunched up waiting for a turn to upload.

And make sure that nothing else wants to use your net connection at an upload time.

Aurum
Joined: 15 Jul 17
Posts: 99
Credit: 18,701,746
RAC: 318
Message 64710 - Posted: 26 Oct 2021, 20:04:47 UTC - in response to Message 64709.  
Last modified: 26 Oct 2021, 20:12:01 UTC

Les Bayliss wrote:
"The trick with these is to stagger the completion times.
Suspend all but one, give it an hour's head start, resume the next one and wait another hour, and so on.
That way all of the files won't get bunched up waiting for a turn to upload."

I already decided that I'm only going to run one CP WU per computer. So I've already got that covered.

Les Bayliss wrote:
"And make sure that nothing else wants to use your net connection at an upload time."

Now I'm confused. I thought the error under discussion is:
Output file hadam4h_h02w_200802_4_920_012115322_0_r75796790_4.zip for task hadam4h_h02w_200802_4_920_012115322_0 exceeds size limit.
Now, instead of exceeding a file size, you're talking about how many files are being uploaded at the same time. I'm now running 3,201 WUs of various projects, so that will be next to impossible.
One of these options in one's cc_config file may be useful:
<max_file_xfers>32</max_file_xfers>
<max_file_xfers_per_project>32</max_file_xfers_per_project>
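
As a rough sketch of where those two lines would sit, the Python below builds a minimal `cc_config.xml` with them placed under `<options>` inside `<cc_config>`, which is how I understand the BOINC client configuration file to be laid out. The function name `build_cc_config` is invented for this example, and the value 32 is simply the one quoted above, not a recommendation. Note that these options only change how many transfers run at once; they do not alter any file size limit.

```python
import xml.etree.ElementTree as ET


def build_cc_config(max_xfers: int, max_xfers_per_project: int) -> str:
    """Build a minimal cc_config.xml document holding the two transfer options."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "max_file_xfers").text = str(max_xfers)
    ET.SubElement(options, "max_file_xfers_per_project").text = str(
        max_xfers_per_project
    )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the document; merge it by hand into any existing cc_config.xml
    # rather than overwriting settings that are already there.
    print(build_cc_config(32, 32))
```

After changing the file, the client has to re-read its config files (or be restarted) before the new values take effect.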

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64711 - Posted: 26 Oct 2021, 20:26:27 UTC - in response to Message 64710.  

When a task finishes, it produces a large zip file, an "out" file, and a "restart" file (which contains the data needed to start the next task in the series, if the researcher is going to continue with that task).
Together these add up to slightly more data than the plain zips produced along the way, and this can tip things over the limit.

But these are created at a slight time interval, which should be long enough for the zip, created first, to get out of the way before the others show up.

********************

And one wu per computer can still mean that they all finish at the same time.

©2024 cpdn.org