climateprediction.net (CPDN) home page
Thread 'Computing errors'

Davide Cioni

Joined: 17 Jul 17
Posts: 2
Credit: 8,598
RAC: 0
Message 59794 - Posted: 12 Mar 2019, 14:21:52 UTC

Hi, some computing errors are occurring in some of my tasks and I can't figure out why.

These are the 4 tasks that errored out after just 5 minutes of computation:
https://www.cpdn.org/cpdnboinc/result.php?resultid=21540120
https://www.cpdn.org/cpdnboinc/result.php?resultid=21540394
https://www.cpdn.org/cpdnboinc/result.php?resultid=21538760
https://www.cpdn.org/cpdnboinc/result.php?resultid=21553817
(this last one has for some reason not been returned to the servers yet, as of the time I'm writing this)

Only thing they have in common is the region (South America). How can I prevent these errors?
ID: 59794
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 59795 - Posted: 12 Mar 2019, 14:49:09 UTC - in response to Message 59794.  

"Only thing they have in common is the region (South America). How can I prevent these errors?"


Hi Davide,
18% of this batch has failed with this error so far. None have been running long enough to complete. One of the other moderators has raised this with the project, so it is being looked into. Looking at your computer spec, there is no obvious reason why it should be contributing to the failures. I think it is just bad luck that you started at a time when a few batches have been prone to this error.



"A signal 11 error, commonly known as a segmentation fault, means that the program accessed a memory location that was not assigned to it. A signal 11 error may be due to a bug in one of the installed programs or to faulty hardware."
(Taken from the new work thread in the number crunching section.) All I can suggest is to hang in there, as some tasks in this batch, along with the other problem ones, have got past the stage at which a lot are crashing.
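
For anyone wanting to check what a signal number in an error message refers to, here is a minimal sketch using Python's standard `signal` module. The helper name `signal_name` is made up for illustration, and signal numbering depends on the platform, so treat the result as a hint rather than a diagnosis.

```python
import signal


def signal_name(number: int) -> str:
    """Return the symbolic name for a signal number on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number is not a signal known to this platform.
        return f"unknown signal {number}"


print(signal_name(11))  # SIGSEGV on Linux
```

The name only confirms that the task crashed with a segmentation fault. It does not say whether the cause was the model, other software, or hardware.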
ID: 59795
Davide Cioni

Joined: 17 Jul 17
Posts: 2
Credit: 8,598
RAC: 0
Message 59798 - Posted: 12 Mar 2019, 16:07:11 UTC - in response to Message 59795.  

Ok, thank you for the answer!

Anyways, I thought that in a WU name the batch number was the third number from the end.

wah2_sam25_m013_199512_60_795_011770745_0 <- for example.

But since these numbers are different in the names of the WUs I linked before, I'm probably wrong.

So, how do I tell what batch a WU belongs to, given the name?
ID: 59798
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 59800 - Posted: 12 Mar 2019, 18:30:47 UTC

You're correct - that is the batch number.
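
As a rough illustration of reading the batch from a name, the sketch below splits the WU name on underscores and takes the third field from the end. The function name `batch_number` is hypothetical, and the only layout assumed is the one visible in the example name quoted in this thread.

```python
def batch_number(wu_name: str) -> int:
    """Return the batch number: the third underscore-separated field from the end."""
    fields = wu_name.split("_")
    if len(fields) < 3:
        raise ValueError(f"unexpected work unit name: {wu_name!r}")
    return int(fields[-3])


# The example name quoted earlier in this thread.
print(batch_number("wah2_sam25_m013_199512_60_795_011770745_0"))  # 795
```

Counting from the end rather than the start means the result does not depend on how many fields come before the batch number.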

There are 3 batches for South America at present: 797, 798, and 799.

797 seems to be the main problem, at present.

Being discussed here.
ID: 59800
