Premature finish of hadam3p tasks

Dave Roberts

Joined: 15 Jan 11
Posts: 175
Credit: 6,242,691
RAC: 699
Message 43866 - Posted: 24 Feb 2012, 14:33:16 UTC

Hi
Computer Id :- 1142892

My last 7 hadam3p tasks have all ended with 'error whilst computing' virtually immediately after starting.
Is there a general problem or should I simply try starting new tasks?
(Or is the problem at my end?)

Regards
David
ID: 43866 · Report as offensive     Reply Quote
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4532
Credit: 18,763,629
RAC: 18,764
Message 43867 - Posted: 24 Feb 2012, 15:40:44 UTC - in response to Message 43866.  

I see that on your other computer, which is also a Mac, there is no problem, and that up till 27th January you were completing these models OK. This seems to rule out most of the ideas I might have had from perusing these fora. Has anything changed on the machine in question? If it is a general problem, I guess you will see it when the other machine next finishes a task. All I can give you is that if it is a general problem, it probably only affects Macs, as my Linux machine doesn't have any problem and has just started running two new hadam3p tasks. Perhaps anyone running Windows could confirm that those machines are not affected. My guess is that they won't be, or with the number of Windows machines out there someone would have posted by now. If it is a general Mac problem, I guess we will know within 24 hours.

Dave
ID: 43867 · Report as offensive     Reply Quote
Iain Inglis
Volunteer moderator

Joined: 16 Jan 10
Posts: 1084
Credit: 7,748,193
RAC: 3,811
Message 43868 - Posted: 24 Feb 2012, 16:18:17 UTC
Last modified: 24 Feb 2012, 18:08:41 UTC

Did you upgrade or reinstall BOINC?

There is a bug in BOINC/CPDN for Mac that means an upgrade changes the file permissions, so that every subsequent model for any application type that has already been run will crash. The solution is to reset the project, which clears out all the downloaded applications so that they can be re-downloaded with the correct permissions.

A fix has been developed for the CPDN side of things but has not quite been released yet.

[Edit: Doesn't look like it: comparing a success and a failure (eventually) has 6.10.58 for both. Virus checker?]
ID: 43868 · Report as offensive     Reply Quote
Dave Roberts

Joined: 15 Jan 11
Posts: 175
Credit: 6,242,691
RAC: 699
Message 43869 - Posted: 24 Feb 2012, 21:03:34 UTC

Haven't upgraded BOINC - both my Macs use 6.10.58.
Also no virus checker on either machine.

David
ID: 43869 · Report as offensive     Reply Quote
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 43870 - Posted: 24 Feb 2012, 21:29:39 UTC - in response to Message 43869.  

There's a sticky post at the top of the Macintosh section of this board about this problem, but the increased security (sandboxing) that caused it was only supposed to occur with BOINC 6.12.*
It might help to try a Project Reset.
If it doesn't, then you'll have to stop trying to run the Regional models until a new version of them for the Mac is released.


ID: 43870 · Report as offensive     Reply Quote
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4532
Credit: 18,763,629
RAC: 18,764
Message 43871 - Posted: 24 Feb 2012, 22:28:04 UTC - in response to Message 43870.  

I know I don't use a Mac, but I'm not sure I understand this one, as the machine had been running hadam3p models fine up till 27th January. But then I don't understand a lot of the foibles of my own Linux box either, so maybe that doesn't mean anything. (-:
ID: 43871 · Report as offensive     Reply Quote
Dave Roberts

Joined: 15 Jan 11
Posts: 175
Credit: 6,242,691
RAC: 699
Message 43872 - Posted: 25 Feb 2012, 12:50:48 UTC

I can't think of anything at my end to account for this - no changes of any kind to this Mac. I was running Malaria Control stuff for a few weeks until CPDN got more stuff available.

However, I didn't get an error status in the BOINC Manager messages. As seen in the following, the task just stopped.

climateprediction.net Fri Feb 24 11:09:44 2012 Starting hadam3p_saf_1qr6_1977_1_006991306_
climateprediction.net Fri Feb 24 11:13:05 2012 Starting task hadam3p_saf_1qr6_1977_1_006991306_2 using hadam3p_saf version 609
climateprediction.net Fri Feb 24 11:13:05 201 Computation for task hadam3p_saf_1qr6_1977_1_006991306_2 finished
climateprediction.net Fri Feb 24 11:13:05 2012 Output file hadam3p_saf_1qr6_1977_1_006991306_2_1.zip for task hadam3p_saf_1qr6_1977_1_006991306_2 absent.
.
.
.
climateprediction.net Fri Feb 24 11:13:05 2012 Output file hadam3p_saf_1qr6_1977_1_006991306_2_12.zip for task hadam3p_saf_1qr6_1977_1_006991306_2 absent
climateprediction.net Fri Feb 24 11:13:05 2012 Output file hadam3p_saf_1qr6_1977_1_006991306_2_13.zip for task hadam3p_saf_1qr6_1977_1_006991306_2 absent

The error message was in the task information in my account data.
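
The tell-tale pattern in these messages is a task that starts and finishes within the same second. A minimal Python sketch of spotting that automatically is below. It assumes the messages have been saved as text with each message on one line in the form "project, timestamp, text", and `find_short_runs` and `max_seconds` are hypothetical names for illustration, not part of BOINC.

```python
import re
from datetime import datetime

# Matches lines such as (assumed one-message-per-line layout):
# "climateprediction.net Fri Feb 24 11:13:05 2012 Starting task NAME using APP version 609"
LINE_RE = re.compile(
    r"^\S+ (?P<when>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) "
    r"(?P<event>Starting task|Computation for task) (?P<task>\S+)"
)


def find_short_runs(lines, max_seconds=60):
    """Return (task, elapsed_seconds) for tasks that finished suspiciously fast."""
    started = {}
    short = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ignore messages that are neither a start nor a finish
        stamp = " ".join(m["when"].split())  # collapse padding before the day
        when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
        task = m["task"]
        if m["event"] == "Starting task":
            started[task] = when
        elif task in started:
            elapsed = (when - started.pop(task)).total_seconds()
            if elapsed <= max_seconds:
                short.append((task, elapsed))
    return short


if __name__ == "__main__":
    sample = [
        "climateprediction.net Fri Feb 24 11:13:05 2012 Starting task "
        "hadam3p_saf_1qr6_1977_1_006991306_2 using hadam3p_saf version 609",
        "climateprediction.net Fri Feb 24 11:13:05 2012 Computation for task "
        "hadam3p_saf_1qr6_1977_1_006991306_2 finished",
    ]
    for task, elapsed in find_short_runs(sample):
        print(f"{task} finished after {elapsed:.0f}s")
```

A 60-second threshold is an arbitrary choice here; a healthy hadam3p task runs far longer than that, so any small value would serve.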

I might be wrong but I think that this problem is occurring with other people as well -
see Workunits 7967420/ 7967421/ 7083006 as examples.

Regards
David
ID: 43872 · Report as offensive     Reply Quote
Iain Inglis
Volunteer moderator

Joined: 16 Jan 10
Posts: 1084
Credit: 7,748,193
RAC: 3,811
Message 43873 - Posted: 25 Feb 2012, 15:10:42 UTC

A more detailed error message is in the stderr section of the model results page (if you're prepared to wait - there's an absurd delay at the moment):

<stderr_txt>
execl(/Library/Application Support/BOINC Data/projects/climateprediction.net/hadam3p_pnw_um_6.09_i686-apple-darwin, 177615) failed!
execl(/Library/Application Support/BOINC Data/projects/climateprediction.net/hadrm3p_pnw_um_6.09_i686-apple-darwin, 177615) failed!
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=64089, selfPID=64085, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 0
Called boinc_finish

</stderr_txt>

This shows that the two model processes (the global hadam3p and the regional hadrm3p executables named in the execl lines) cannot even start properly, which is what happens with the Mac permissions bug described earlier. So Les's advice is good advice: try a project reset - you've got nothing to lose as nothing is running.
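
For anyone who wants to look before resetting, here is a rough Python sketch of a permissions check. It assumes the model executables sit in the project directory shown in the stderr output and follow the `*_um_*` naming seen there. `find_non_executable` is a hypothetical helper, not part of BOINC or CPDN, and it only inspects the execute bits, so it will not catch an ownership problem introduced by sandboxing.

```python
import stat
from pathlib import Path

# Assumed location, copied from the stderr output; adjust for your install.
PROJECT_DIR = Path(
    "/Library/Application Support/BOINC Data/projects/climateprediction.net"
)


def find_non_executable(project_dir, pattern="*_um_*"):
    """Return files matching `pattern` that have no execute permission bit set."""
    exec_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    missing = []
    for path in sorted(Path(project_dir).glob(pattern)):
        if path.is_file() and not (path.stat().st_mode & exec_bits):
            missing.append(path)
    return missing


if __name__ == "__main__":
    if PROJECT_DIR.is_dir():
        for path in find_non_executable(PROJECT_DIR):
            print(f"not executable: {path.name}")
    else:
        print(f"project directory not found: {PROJECT_DIR}")
```

If this lists the model executables, a project reset is still the remedy described in this thread; the script only reports and changes nothing.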

As far as the other crashes are concerned, the machines in 7967420 are also Macs - and their error messages are the same as yours. If you look at the machines then you will notice that their 'average credit' is zero - i.e. they've been serially trashing as many models as they can download for some time. This is a situation where (much-maligned) credits ought to be useful - it does surprise me that so many Mac users haven't noticed over a long period of time that their machines are producing absolutely nothing. The bug isn't the fault of the users and volunteer computing should require as little effort as possible, but it's never been like that in my experience - checking on models is needed and an occasional visit to the message boards wouldn't hurt either.

PS I should say that I have a Mac and it runs fine, having done the required reset after each upgrade/re-install ...
ID: 43873 · Report as offensive     Reply Quote
Dave Roberts

Joined: 15 Jan 11
Posts: 175
Credit: 6,242,691
RAC: 699
Message 43874 - Posted: 25 Feb 2012, 20:19:10 UTC

Thanks guys, I've done as you suggested and reset the project and the latest downloads are running OK.
It just seemed bizarre that with no changes to my system, a bug that surfaced with a new version of BOINC should suddenly appear to be present in an older version!! Especially since my other Mac, which has the same configuration, trundles along quite happily.
As you say Iain, it's really surprising that the other failures haven't been spotted before.
ID: 43874 · Report as offensive     Reply Quote
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4532
Credit: 18,763,629
RAC: 18,764
Message 43875 - Posted: 25 Feb 2012, 20:24:40 UTC - in response to Message 43874.  

Surprising to me too. I had followed some of the posts re this problem but had not heard of it appearing out of the blue on a machine that had had no changes made to it. Some change in the hadam3p tasks? But again, why one machine and not the other? I suspect the answer will end up being, "42."
ID: 43875 · Report as offensive     Reply Quote
