Les Bayliss Forum moderator Send message Joined: Sep 5 04 Posts: 4743 Credit: 7,747,983 RAC: 769
My guess is that you're interrupting the 3-phase slab models at the end of a phase, before the next phase has started. They don't like this!
There's LOTS of post-processing at the end of each phase, which involves extracting data, consolidating it, and then zipping it for upload. Interrupt this and the files are history.
If a model has reached the end of a phase, wait until after the first trickle in the next phase before interrupting.
geophi Forum moderator Send message Joined: Aug 7 04 Posts: 1403 Credit: 21,037,529 RAC: 1,394
It looks like you've had 7 errors right at the end of phase 1. As Les said, something appears to be happening to interrupt post-processing at that critical end-of-phase time. It seems unlikely that you would be manually interrupting each model at the time of failure, since the failures occurred at 7 different times.
If I recall correctly, some executable other than the hadsm3 um process is called at post-processing. Perhaps Vista, an antivirus, or an anti-malware application has locked this file, which is only needed at that time? Ian/Thyme might have a better idea.
Sphagc,
Would you post a link to the workunit that you're talking about? And also, which computer is this in your list of computers?
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/hosts_user.php?userid=392646
Computer which is showing problem:
Computer ID: 996941
Name: Cozzie-VistaX64 (location: home)
Average credit: 4,198.64; total credit: 88,812
CPU: GenuineIntel Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz [Intel64 Family 6 Model 15 Stepping 11]
OS: Microsoft Windows Vista Ultimate x64 Edition, Service Pack 2, (06.00.6002.00)
Last contact: 18 Sep 2009 12:41:42 UTC
Machine is left running 24/7 and I only reboot after Microsoft Updates (making sure I close down BOINC before shutdown).
Tasks with Errors.
Task ID | Work unit ID | Sent | Reported | Server state | Outcome | Client state | CPU time (sec) | Claimed credit | Granted credit
9938307 | 6649750 | 9 Sep 2009 20:00:32 UTC | 15 Sep 2009 17:51:07 UTC | Over | Client error | Compute error | 364,086.40 | 2,282.60 | 2,282.60
9894197 | 6645339 | 2 Sep 2009 19:15:06 UTC | 7 Sep 2009 19:51:26 UTC | Over | Client error | Compute error | 417,807.50 | 2,282.60 | 2,282.60
9891457 | 6645065 | 11 Sep 2009 15:06:47 UTC | 16 Sep 2009 10:13:09 UTC | Over | Client error | Compute error | 368,213.10 | 2,282.60 | 2,282.60
9826402 | 6638561 | 6 Sep 2009 16:39:28 UTC | 11 Sep 2009 15:06:47 UTC | Over | Client error | Compute error | 400,210.60 | 2,282.60 | 2,282.60
9811750 | 6637096 | 12 Sep 2009 20:35:18 UTC | 17 Sep 2009 19:32:18 UTC | Over | Client error | Compute error | 392,664.40 | 2,282.60 | 2,282.60
9752529 | 6631174 | 7 Sep 2009 19:53:02 UTC | 12 Sep 2009 20:35:18 UTC | Over | Client error | Compute error | 393,902.40 | 2,282.60 | 2,282.60
9618960 | 6597657 | 9 Sep 2009 17:06:57 UTC | 15 Sep 2009 19:57:11 UTC | Over | Client error | Compute error | 378,376.20 | 2,282.60 | 2,282.60
NB: Everything else seems to be working fine with the shorter HADSM3Ps. I am doing nothing different with them, and I have not had problems with the longer ones before.
geophi Forum moderator
@sphagc
Are there any differences in setup between that PC and your other Windows PCs that are successfully running hadsm3 type models? Different antivirus? Different antimalware program? Different firewalls?
Seems like a file permissions problem. Reset security on all files in your BOINC data/projects directory. It could also be Vista security... The climate applications need to be able to spawn themselves and their post-processing items. Without this execute permission, the task will fail. I know there's a Windows Defender or Vista Security something-or-other, or perhaps virus protection, that might be preventing this.
Other than that, I'm afraid I can't be much help with Vista...
Check that projects/climateprediction.net in your BOINC data directory contains the file hadsm3_se_6.07_windows_intelx86.zip (1,958,740 bytes) and that it has been unzipped to hadsm3_se_6.07_windows_intelx86.exe (2,212,352 bytes, modification time 12:11:16 on 21 August 2008).
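The file check described above can be scripted. What follows is only a minimal Python sketch, not a CPDN or BOINC tool: the function name `check_project_files` is made up for illustration, the sizes are the ones quoted in this post, and the path to the BOINC data directory is an assumption you would need to fill in for your own machine.

```python
from pathlib import Path

# Expected sizes in bytes, as quoted in the post above.
EXPECTED_SIZES = {
    "hadsm3_se_6.07_windows_intelx86.zip": 1_958_740,
    "hadsm3_se_6.07_windows_intelx86.exe": 2_212_352,
}


def check_project_files(project_dir, expected=EXPECTED_SIZES):
    """Return {filename: status}; status is 'ok', 'missing' or 'size N != M'."""
    results = {}
    for name, size in expected.items():
        path = Path(project_dir) / name
        if not path.is_file():
            results[name] = "missing"
            continue
        actual = path.stat().st_size
        results[name] = "ok" if actual == size else f"size {actual} != {size}"
    return results


# Example call (the directory below is a placeholder, not a known default):
# print(check_project_files(r"<BOINC data dir>\projects\climateprediction.net"))
```

A matching size only shows that the file is present and complete; it says nothing about whether Windows or a security suite will allow it to be launched.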
____________ "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
Thyme Lawn Forum moderator Send message Joined: Aug 5 04 Posts: 1195 Credit: 10,381,411 RAC: 266
Could not launch smallexecs process. Last Error=5
A further thought about that message. Error number 5 is "Access denied", so the cause could be file permissions or locking.
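One rough way to see whether the operating system refuses access to a file is to try opening it and look at the error that comes back. The Python sketch below is an illustration under assumptions: `probe_access` is a made-up helper name, and being able to read a file is not the same as being allowed to launch it as a process, so a "readable" result does not rule out the smallexecs failure.

```python
def probe_access(path):
    """Try to open `path` for reading and describe what the OS reports."""
    try:
        with open(path, "rb"):
            return "readable"
    except FileNotFoundError:
        return "missing"
    except PermissionError as exc:
        # On Windows, exc.winerror is set; 5 is ERROR_ACCESS_DENIED.
        # The attribute is absent on other platforms, hence getattr.
        code = getattr(exc, "winerror", None)
        return f"access denied (errno={exc.errno}, winerror={code})"
    except OSError as exc:
        return f"other error: {exc.strerror}"
```

If a file that is normally readable reports "access denied" only around the end of a phase, that would be consistent with another program holding a lock on it at that moment.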
Thanks for all the replies. I have checked, and both the exe and zip files are present, with all permissions set correctly as far as I can see.
The two quad-core systems are both running Vista X64 Ultimate with Spyware Doctor for malware detection, but the problem system has Kaspersky Internet Security 2009 running, while the other has Kaspersky Anti-Virus 6 for Workstations. File permissions etc. have been set identically, unless the Security Suite has something extra I have missed, although previous HADSM models have caused no problems.
Anyway everyone, thanks for the messages. I will keep an eye on the systems and report back if I spot any further problems.
Well... Wish I could figure out why, but I've had far too many compute errors running cpdn tasks and far too much frustration like this one: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=6693008 where I've burned hundreds of thousands of compute seconds only to have it punt and get but a fraction of the credit. And judging from the above result, I'm not the only one experiencing this type of failure. Perhaps my computer isn't up to the demand, but I don't believe that explains it. I've run Aqua Multithread for hundreds of hours without error, and I've got Folding running on both GPUs daily with nary a problem. All while getting my normal work done. And other BOINC projects crunch along happily side by side with cpdn while it "face-plants" yet again. Ah well... I gave it a go. That should count for something, I guess...
My Primegrid WUs around the time were unaffected, which rules out processor problems, and the NFS WU in memory survived, which rules out a lack of available memory (since NFS is very sensitive to memory issues). None of the other 15 projects showed any issues whatsoever, just the CPDN WU. It had jumped to 100% sometime while I was gone, but was still 'Waiting to Run'. I caught it before it restarted and changed the 'waiting' to 'computer error'. The graphics listed it as being at only 71% (despite the 100% given in the BOINC manager) and the temps had gone blue.
____________ ~It only takes one bottle cap moving at 23,000 mph to ruin your whole day~
Les Bayliss Forum moderator
If the temperatures were blue, then either the model hadn't run long enough to generate the data needed by the graphics package to show the correct colours (blue is the default colour immediately on starting, before sufficient data has been crunched), or it had turned into an 'iceworld'.
Iceworld description here, discussion here, and appeal for data here. The latter only applies to people who take regular backups and are prepared to do some extra work.