Bunch of new work units crashing, Windows 10

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1066
Credit: 16,546,621
RAC: 2,321
Message 64163 - Posted: 14 Jul 2021, 0:29:14 UTC

I just got a batch of work units for my Windows 10 machine. All but one crashed. It is not clear to me whether the problem is my machine or the work units. Here is one of them:
Task 22117442
Name 	wah2_sas50_s2b6_209912_13_917_012110704_0
Workunit 	12110704
Created 	13 Jul 2021, 9:20:23 UTC
Sent 	13 Jul 2021, 18:13:39 UTC
Report deadline 	25 Jun 2022, 23:33:39 UTC
Received 	13 Jul 2021, 23:09:41 UTC
Server state 	Over
Outcome 	Computation error
Client state 	Compute error
Exit status 	-1073741819 (0xC0000005) STATUS_ACCESS_VIOLATION
Computer ID 	1512658
Run time 	4 hours 54 min 23 sec
CPU time 	4 hours 50 min 57 sec
Validate state 	Invalid
Credit 	0.00
Device peak FLOPS 	3.91 GFLOPS
Application version 	Weather At Home 2 (wah2) v8.24 windows_intelx86
Peak working set size 	230.44 MB
Peak swap size 	193.43 MB
Peak disk usage 	0.01 MB
Stderr 	

<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 3221225477 (0xc0000005)</message>
<stderr_txt>


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00463007 read attempt to address 0x013463AC

Engaging BOINC Windows Runtime Debugger...

There is more if you need it.
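
For reference, the two exit codes shown in this report are the same 32-bit value. The task page prints it as a signed integer (-1073741819) and the stderr message prints it as an unsigned one (3221225477); both are 0xC0000005, which the task page labels STATUS_ACCESS_VIOLATION. A minimal Python sketch of the conversion follows. The function names are illustrative and are not part of BOINC.

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a (possibly negative) exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


def to_signed32(code: int) -> int:
    """Reinterpret an exit code as a signed 32-bit (two's complement) value."""
    code &= 0xFFFFFFFF
    # Values with the top bit set represent negative numbers.
    return code - 0x100000000 if code >= 0x80000000 else code


if __name__ == "__main__":
    signed_code = -1073741819  # "Exit status" as shown on the task page
    print(hex(to_unsigned32(signed_code)), to_unsigned32(signed_code))
    print(to_signed32(3221225477))  # value from the stderr <message> line
```

Seeing the same number in both forms makes it easier to search for the code, since different pages quote one form or the other.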

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2169
Credit: 64,555,907
RAC: 5,858
Message 64166 - Posted: 14 Jul 2021, 4:28:20 UTC - in response to Message 64163.  

Hard to say. There aren't a lot of initial failures from these batches, but it might not be evident in the stats for another day if there is a problem. You can check the other tasks in the work units your tasks came from and see if they trickle, which would be past where yours crashed.

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1066
Credit: 16,546,621
RAC: 2,321
Message 64167 - Posted: 14 Jul 2021, 9:18:20 UTC - in response to Message 64166.  

You can check the other tasks in the work units your tasks came from and see if they trickle, which would be past where yours crashed.


Of the batch I got, one is still running and it got two trickles so far.
Of the rest, all of mine failed. Of the failed ones, the work units that the others got (and of my failed ones, one other user got it also) are all still in progress, but none have trickled yet.

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1066
Credit: 16,546,621
RAC: 2,321
Message 64168 - Posted: 14 Jul 2021, 10:21:46 UTC - in response to Message 64167.  

P.S.: I got a batch of six similar work units in June and they all completed successfully.

Albert H.
Joined: 18 Feb 06
Posts: 72
Credit: 55,087,566
RAC: 29,270
Message 64169 - Posted: 14 Jul 2021, 19:38:13 UTC

All the new workunits I got are running fine :)

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64170 - Posted: 14 Jul 2021, 21:23:20 UTC - in response to Message 64169.  

That's good to hear Albert.
Thanks.

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1066
Credit: 16,546,621
RAC: 2,321
Message 64171 - Posted: 14 Jul 2021, 21:36:18 UTC - in response to Message 64169.  

All the new workunits I got are running fine :)


The one I had running has now uploaded 4 trickles after running 1 day 3 hours.
A new one has run 1 hour and 25 minutes, but no trickles yet.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64172 - Posted: 15 Jul 2021, 0:35:31 UTC - in response to Message 64171.  

I found this strange "cure" for that error number:

FIX: File System Error (-1073741819) on Windows 10

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1066
Credit: 16,546,621
RAC: 2,321
Message 64173 - Posted: 15 Jul 2021, 4:10:45 UTC - in response to Message 64172.  

If you recently upgraded from Windows 7 to Windows 10 and cannot install any new programs or run the previous ones as an administrator, you are not alone.


I did not upgrade from Windows 7 to Windows 10. This is a new machine, bought last December. It came with Windows 10 factory installed. It has been running since last December. I have installed BOINC, TaxAct, Firefox, Garmin Express on it since then without any trouble. And the Boinc Client has installed ClimatePrediction, WCG, Universe@Home, and Rosetta@home tasks with no trouble. It has run more than six ClimatePrediction programs without failure.

When I get here:
Inside the Sound Control Panel, go to the Sounds tab.
Click on the drop-down menu under Sound Scheme and choose Windows Default.

Windows Default is already selected.
Then I get here:
Select Open and inside the window, drag the slider down to Never notify.

There is no Open in there. There is Open Program, but it does not get me to the slider.

Other things did not work, but I did do this:
4. Run an SFC Scan.

Windows comes with several built-in troubleshooting utilities that can help resolve the issues in no time.

One such handy tool is SFC (System File Checker) that scans through the system’s protected files, identifies the issues, and attempts to resolve them automatically.

Since the error code -1073741819 is a file system error, running a scan via System File Checker can help you resolve it.

Here is how you can run an SFC scan on Windows 10:

Type cmd in the search bar and click on Run as administrator to launch elevated Command Prompt.
Inside the Command Prompt window, type the command mentioned below and hit Enter to execute it.

sfc /scannow

Wait for the scan to complete, and once done, restart your PC.

When I restarted the machine, one of the two ClimatePrediction programs (the long-running one) crashed.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64174 - Posted: 15 Jul 2021, 5:30:28 UTC - in response to Message 64173.  

Sorry for asking the obvious, but did you:

Suspend each climate model
Suspend BOINC
Exit from BOINC

And THEN restart the computer?

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1066
Credit: 16,546,621
RAC: 2,321
Message 64175 - Posted: 15 Jul 2021, 11:48:20 UTC - in response to Message 64174.  

Sorry for asking the obvious, but did you:

Suspend each climate model
Suspend BOINC
Exit from BOINC

And THEN restart the computer?


Sorry for answering: No. I did shut down the boinc manager, but since I do not know how the boinc client comes up, I do not know how to shut it down.

By then, I was getting really frustrated with all the stuff in that page of how to recover. Little agreed with what I saw. And I do not understand Windows very well, so I just run it pretty much as it came out of the box when I bought the machine from Dell. I do let the updates go through.

Alan K
Joined: 22 Feb 06
Posts: 486
Credit: 29,638,939
RAC: 3,372
Message 64177 - Posted: 15 Jul 2021, 22:15:30 UTC - in response to Message 64175.  
Last modified: 15 Jul 2021, 22:19:38 UTC

"Sorry for answering: No. I did shut down the boinc manager, but since I do not know how the boinc client comes up, I do not know how to shut it down."

The manager controls the client so when you exit the manager the client stops running.

From the BOINC manager help on the FILE menu:-
Exit BOINC: Exit the BOINC manager and all running BOINC applications. No further work will take place until you run the BOINC manager again.

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2169
Credit: 64,555,907
RAC: 5,858
Message 64178 - Posted: 15 Jul 2021, 22:32:52 UTC - in response to Message 64177.  

"Sorry for answering: No. I did shut down the boinc manager, but since I do not know how the boinc client comes up, I do not know how to shut it down."

The manager controls the client so when you exit the manager the client stops running.

From the BOINC manager help on the FILE menu:-
Exit BOINC: Exit the BOINC manager and all running BOINC applications. No further work will take place until you run the BOINC manager again.


If you run the install of boinc from many Linux package managers, it will install a version of boinc manager that does not shut down the client when you close the manager. For example, on my CentOS 7 (RHEL 7) PC, closing boinc manager does nothing to the client. One must do a

service boinc-client stop

from a terminal window to stop boinc, and you may need to be root, or at least a user in the boinc user group to do so.

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1066
Credit: 16,546,621
RAC: 2,321
Message 64180 - Posted: 15 Jul 2021, 23:17:16 UTC - in response to Message 64178.  

If you run the install of boinc from many Linux package managers, it will install a version of boinc manager that does not shut down the client when you close the manager. For example, on my CentOS 7 (RHEL 7) PC, closing boinc manager does nothing to the client. One must do a

service boinc-client stop

from a terminal window to stop boinc, and you may need to be root, or at least a user in the boinc user group to do so.


Yes, I know how to do all that in Linux. I am running Red Hat Enterprise Linux release 8.4 (Ootpa) on my Linux machine. The problem is on my Windows 10 machine. I know little about Windows 10.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64181 - Posted: 16 Jul 2021, 0:30:39 UTC - in response to Message 64180.  

The problem is on my Windows 10 machine. I know little about Windows 10


You do all that from the BOINC Manager Menu, as per my previous post:


Suspend each climate model
Suspend BOINC
Exit from BOINC

And THEN restart the computer
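
The same sequence can also be sketched from the command line, assuming the `boinccmd` tool that ships with BOINC is installed and on the PATH (nobody in this thread used it; the Manager menu route above is what was recommended). The Python helper below only builds the commands. The function names, the project URL argument, and the task list are placeholders to be filled in from what your own client shows.

```python
import subprocess


def build_shutdown_commands(project_url, task_names, boinccmd="boinccmd"):
    """Return the boinccmd invocations, in order, for a clean stop.

    project_url and task_names are placeholders: use the project URL and
    task names exactly as your own BOINC client reports them.
    """
    # 1. Suspend each climate model individually.
    commands = [
        [boinccmd, "--task", project_url, name, "suspend"]
        for name in task_names
    ]
    # 2. Suspend BOINC as a whole.
    commands.append([boinccmd, "--set_run_mode", "never"])
    # 3. Tell the client to exit.
    commands.append([boinccmd, "--quit"])
    return commands


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in build_shutdown_commands("PROJECT_URL", ["TASK_NAME"]):
        print(" ".join(argv))
```

Note that a run mode of `never` set without a duration stays in force, so after the reboot it has to be switched back (for example with `boinccmd --set_run_mode auto`, or from the Manager), and the suspended tasks resumed.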

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2169
Credit: 64,555,907
RAC: 5,858
Message 64182 - Posted: 16 Jul 2021, 5:39:53 UTC - in response to Message 64180.  


Yes, I know how to do all that in Linux. I am running Red Hat Enterprise Linux release 8.4 (Ootpa) on my Linux machine. The problem is on my Windows 10 machine. I know little about Windows 10.

Yep. Sorry, I was multi-tasking and got mixed up.

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1066
Credit: 16,546,621
RAC: 2,321
Message 64183 - Posted: 16 Jul 2021, 8:12:00 UTC - in response to Message 64182.  

Yep. Sorry, I was multi-tasking and got mixed up.


I make mistakes too. Even when not multi-tasking.
As you can see by my earlier mistake in this very thread.

So I forgive you for being human, if you will forgive me.
Or even if you do not.

KAMasud
Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 64195 - Posted: 19 Jul 2021, 11:17:22 UTC

Do as Les is telling you to. These WU's are more sensitive than babies. Does anyone ever wonder why they are so sensitive? You could play football with the previous WU's. No, this is not the fault of the WU but it is BOINC's fault. What changes have been made with Boinc Client or Boinc Manager? I am somehow managing by running only two at a time. Faster clock speeds, faster turnarounds. Then I do a system reboot and allow two more WU's. The other Gen10 Laptop I have marked "no further tasks" is a chronic culprit, but it is crunching Linux WU's in VB.

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1066
Credit: 16,546,621
RAC: 2,321
Message 64196 - Posted: 19 Jul 2021, 12:04:48 UTC - in response to Message 64195.  

These WU's are more sensitive than babies. Does anyone ever wonder why they are so sensitive? You could play football with the previous WU's. No, this is not the fault of the WU but it is BOINC's fault. What changes have been made with Boinc Client or Boinc Manager?


Yes: I certainly do wonder why these work units are so sensitive.

I do not see why these work units are so sensitive, and I do not necessarily think it is because it is Boinc's fault. If it were, would not work units from other projects suffer the same way? And in all my years of experience, they never do. The main difference I notice is that the Boinc client starts a CPDN process such as
./../projects/climateprediction.net/hadam4_8.52_i686-pc-linux-g...
just like any other Boinc task. This process does not do much, but one thing it does is start another process such as
/var/lib/boinc/projects/climateprediction.net/hadam4_um_8.09_i68...
and it is this second process that does most (if not all) of the work. My guess is that this second, worker, process is not known to the Boinc client, so the client does not terminate it in the right way (if at all), although it probably terminates the first process, the one it started, correctly. The worker process then gets terminated later than the first, as the OS shuts down. On restart, the client starts the original first process again, but that first process makes bad assumptions about the old second process (e.g., that it is already running, but I do not know this), so things go wrong at this point.
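
The general mechanism in this guess (a launcher that starts a worker, where stopping only the launcher leaves the worker running) can be shown with a small Python sketch. It is POSIX-only, the "wrapper" and "worker" here are throwaway stand-ins that just sleep, and it says nothing about how BOINC or the CPDN applications actually handle shutdown.

```python
import os
import signal
import subprocess
import sys
import time

# Stand-in for the worker process: it just sleeps.
WORKER = "import time; time.sleep(30)"

# Stand-in for the wrapper: starts the worker, reports its PID, then idles.
WRAPPER = (
    "import subprocess, sys, time\n"
    f"p = subprocess.Popen([sys.executable, '-c', {WORKER!r}])\n"
    "print(p.pid, flush=True)\n"
    "time.sleep(30)\n"
)


def pid_alive(pid):
    """POSIX check: signal 0 tests for existence without sending a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def demo():
    """Terminate only the wrapper; report whether the worker survived."""
    wrapper = subprocess.Popen(
        [sys.executable, "-c", WRAPPER], stdout=subprocess.PIPE, text=True
    )
    worker_pid = int(wrapper.stdout.readline())
    wrapper.terminate()  # stops the wrapper only, not its child
    wrapper.wait()
    wrapper.stdout.close()
    time.sleep(0.2)
    survived = pid_alive(worker_pid)
    if survived:
        os.kill(worker_pid, signal.SIGTERM)  # clean up the orphaned worker
    return survived


if __name__ == "__main__":
    print("worker outlived wrapper:", demo())
```

If whatever starts the wrapper only tracks the wrapper's PID, the worker is left for the OS to kill at shutdown, which is the situation the guess describes.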

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1066
Credit: 16,546,621
RAC: 2,321
Message 64197 - Posted: 19 Jul 2021, 12:07:06 UTC - in response to Message 64196.  

P.S.: The two processes in my previous post are not for the same work unit, but you should get the right idea...