climateprediction.net home page
Resend lost tasks?

Tony DeBari
Joined: 26 Aug 04
Posts: 6
Credit: 1,410,763
RAC: 0
Message 41656 - Posted: 21 Feb 2011, 21:27:48 UTC
Last modified: 21 Feb 2011, 21:33:35 UTC

I have a workunit allocated to one of my hosts (the only active one currently) that is showing as In Progress in the tasks list but never physically made it onto the host.

WU 7195105

Over at SETI@home we refer to these as "ghost" workunits, and there exists a facility within BOINC to attempt to resend them on the next work fetch. Is this capability enabled here at CPDN? I realize the workunit will eventually expire and be sent to another host to crunch, but that won't happen until a year from now. If the task cannot be resent, I'll suspend work fetch until my cache is empty, then detach/re-attach so that the ghost workunit will be marked as Abandoned and resent immediately.
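
As a rough illustration of the idea behind resending lost tasks: the server's list of in-progress tasks for a host is compared with the tasks the client actually reports holding, and anything missing on the client side is a "ghost" that could be resent. The sketch below is in Python; the function name and task names are hypothetical, it is not BOINC's scheduler code, and it says nothing about how CPDN's server is configured.

```python
def find_ghost_tasks(server_in_progress, client_reported):
    """Return tasks the server lists as in progress but the client does not hold."""
    reported = set(client_reported)
    # Keep the server's ordering so the result is deterministic.
    return [name for name in server_in_progress if name not in reported]


# Hypothetical task names, for illustration only.
server_side = ["task_a_0", "task_b_1", "task_c_0"]
client_side = ["task_a_0", "task_c_0"]

ghosts = find_ghost_tasks(server_side, client_side)
print(ghosts)  # ['task_b_1']
```

A server with resending enabled would send the tasks in `ghosts` again on the next work fetch; with it disabled, they simply stay listed as In Progress until they expire.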


[Edit] Apologies to all. I just saw a similar thread a little farther down the list. However, the question remains: is there a project-specific reason why resending lost tasks is turned off?


-- Tony D.
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,753,991
RAC: 4,095
Message 41657 - Posted: 22 Feb 2011, 0:33:44 UTC
Last modified: 22 Feb 2011, 0:34:31 UTC

CPDN tends not to use the BOINC facilities for active scheduling - or if they do then the mechanisms are pretty opaque to users (e.g. resubmission of WU batches by project physicists). There's no BOINC validation, for example, presumably due to the complexity and length of the models.

Aborting models early in CPDN can be hazardous because the WU parameters are often wrongly set, so an early abort ends up sterilising the entire WU (spurious "don't need"). That WU does have a max-error of two, so someone's been reading the BOINC manual.

As it happens, I think there is an argument for introducing validation on short models, not for scientific reasons or for credit allocation, but for job control: WU coverage could be sped up significantly if job issues were stopped once a validated pair had been returned. They're rather short-staffed at the moment, so that won't be high on their list ...
Tony DeBari
Joined: 26 Aug 04
Posts: 6
Credit: 1,410,763
RAC: 0
Message 41658 - Posted: 22 Feb 2011, 21:54:15 UTC - in response to Message 41657.  

Thank you, Iain, for the explanation. I personally try never to abort WUs unless they give clear evidence that they have stopped processing, and I have learned from experience with other BOINC projects to err on the side of caution. However, since the WU in question is a phantom and will never be crunched by my host, abandoning it will at least allow it to be reissued to another host in a timely manner, and hopefully get it crunched and returned successfully.


Regards,

-- Tony D.
old_user541139
Joined: 10 Oct 08
Posts: 3
Credit: 62,715
RAC: 0
Message 41737 - Posted: 8 Mar 2011, 14:55:56 UTC

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,753,991
RAC: 4,095
Message 41739 - Posted: 8 Mar 2011, 15:30:36 UTC - in response to Message 41737.  

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=7172497 is not showing on my client machine.

It may be a 'phantom'. If the server is struggling to cope with demand, then jobs are sometimes allocated server-side but never reach the client. These jobs appear on the Web pages but not in the client; there is no way of resending the job, so the entry just has to be ignored.

Do, however, check the 'Show active tasks/Show all tasks' button in the Tasks tab of BOINC Manager. When that button was first introduced, people sometimes thought tasks had gone missing when they had simply been hidden as inactive.
