Automatic Backup any good?
Author | Message |
---|---|
Joined: 8 Aug 04 Posts: 69 Credit: 1,561,341 RAC: 0 |
After 13 days of crunching, my laptop lost power and did not have time to save its work. :( When BOINC was restarted, my experiment was immediately reported as failed, making that WU unrecoverable. The CPDN server says the result is completed, so I do not think it is a good idea to restore a backup from yesterday and waste another 48 hours before the trickle goes in and gets rejected. So, to avoid trashing further WUs, I have detached that computer. Now, with WU runtimes ranging from 3 weeks to maybe 6 or 8 months, the likelihood of a crash is huge. I haven't checked the statistics, but the ratio of successfully completed runs to crashed ones must be depressing. Having completed several THC experiments, I sometimes had to restore a backup after a crash. There the old client told me about the crash but asked me whether it should be reported to CPDN. By shutting down and restoring the backup from the previous day, I was able to continue crunching. This option is not available in BOINC, which trashes even more runs. How about letting the client make a backup automatically every 24 hours? If the client discovers that the data files have been corrupted, it should try restoring from the backup, and if this fails, ask the user for permission to report the crash to CPDN. This way the user could restore an earlier backup, if he had made one, and keep the experiment alive. I know there is one solution available now: suspend network activity, thus preventing the client from reporting anything until you are sure it is OK. But that is error-prone (Error 40, the one behind the keyboard). So how about implementing the suggested auto-backup in the next client? ChrisD |
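The recovery order proposed in the post above (check the work files, fall back to the newest usable backup, and only then ask the user before anything is reported) could be sketched roughly as follows. This is a minimal illustration in Python, not CPDN or BOINC code: `is_intact`, `recover`, the backup folder layout, and the `ask_user` callback are all hypothetical, and the "integrity check" is only a stand-in that looks for non-empty files.

```python
import shutil
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical list of files that must survive; the thread does not give the real set.
REQUIRED = ("restart.day", "restart.month", "restart.year")


def is_intact(folder: Path, required: Iterable[str] = REQUIRED) -> bool:
    """Stand-in integrity check: every required file exists and is non-empty."""
    return all(
        (folder / name).is_file() and (folder / name).stat().st_size > 0
        for name in required
    )


def recover(work_dir: Path, backup_root: Path, ask_user: Callable[[], bool]) -> str:
    """Return 'ok', 'restored', 'report' or 'hold', following the proposed order."""
    if is_intact(work_dir):
        return "ok"
    # Try backups newest first; sub-folder names are assumed to sort by date.
    backups = sorted(
        (p for p in backup_root.iterdir() if p.is_dir()), reverse=True
    ) if backup_root.is_dir() else []
    for backup in backups:
        if is_intact(backup):
            shutil.rmtree(work_dir, ignore_errors=True)
            shutil.copytree(backup, work_dir)
            return "restored"
    # No usable backup left: report the crash only with the user's permission.
    return "report" if ask_user() else "hold"
```

Returning "hold" instead of reporting is what would give the user the chance to restore a further backup of their own, which is the point of the suggestion.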
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Oxford Uni has nothing to do with developing BOINC. You'll have to discuss this on the BOINC forums here: http://boinc.berkeley.edu/dev/ |
Joined: 8 Aug 04 Posts: 69 Credit: 1,561,341 RAC: 0 |
"Oxford Uni has nothing to do with developing BOINC." Please pardon me, but I may not have explained my problem correctly. As I understand it, BOINC manages the flow of data between my machine and the servers running the various projects. Each project supplies an application that does the actual math for that project. As I see it, the CPDN application, when restarted after the crash, could not find its data files and therefore aborted the WU. This has nothing to do with BOINC. When the CPDN application gives up, BOINC faithfully and without delay sends the message issued by the CPDN app, reporting the crash, and asks for replacement work. What I was asking for is a more robust CPDN client: one that does not give up because the work area is corrupt, but tries to revert to a known good state. I know the basis is already there: the files restart.day, restart.month and restart.year hold restart points, but they are altered repeatedly by the client, and if the crash happens while one of these files is being accessed, no salvage is possible unless the files are backed up somewhere safe. Listening to my computer, data are updated several times an hour. At each update, a disk error will crash the WU. Backing the files up once every 24 hours would reduce the risk by a large factor. The CPDN server does not state the number of crashed experiments, only the successfully completed ones. However, looking at the total model years computed, there are enough model years for more than double that number. Of course there will still be crashes, but judging by the posts, a lot of users mourn the loss of good computing time and the science that goes down the drain with it. If a more fail-safe CPDN client were made, more experiments just might make it to the finish line, thus contributing to science. Thank you for your time. ChrisD |
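The idea of copying the restart files somewhere safe once every 24 hours might look something like the sketch below. It is only an illustration under assumptions: the three file names come from the post, but `take_snapshot`, `snapshot_due`, and the timestamp-named snapshot folders are hypothetical, and a copy taken while the model is writing could itself catch a half-written file, so the client would need to be paused first.

```python
import shutil
import time
from pathlib import Path
from typing import Optional

RESTART_FILES = ("restart.day", "restart.month", "restart.year")  # names from the post
INTERVAL_SECONDS = 24 * 60 * 60  # one snapshot per day, as suggested


def snapshot_due(snapshot_root: Path, now: float) -> bool:
    """True if no snapshot exists yet or the newest one is at least 24 hours old."""
    stamps = [int(p.name) for p in snapshot_root.iterdir()
              if p.is_dir() and p.name.isdigit()]
    return not stamps or now - max(stamps) >= INTERVAL_SECONDS


def take_snapshot(work_dir: Path, snapshot_root: Path,
                  now: Optional[float] = None) -> Optional[Path]:
    """Copy the restart files into a new timestamped folder; None if not yet due."""
    now = time.time() if now is None else now
    snapshot_root.mkdir(parents=True, exist_ok=True)
    if not snapshot_due(snapshot_root, now):
        return None
    target = snapshot_root / str(int(now))
    staging = snapshot_root / (target.name + ".tmp")
    shutil.rmtree(staging, ignore_errors=True)  # clear leftovers from an aborted run
    staging.mkdir()
    for name in RESTART_FILES:
        src = work_dir / name
        if src.is_file():
            shutil.copy2(src, staging / name)
    staging.rename(target)  # only a completed copy gets its final name
    return target
```

Copying into a staging folder and renaming it at the end means a power cut during the backup leaves the older snapshots untouched.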
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
One of the problems with this project is that it uses the code developed by the Met Office for their forecasting. This is 64-bit Fortran, is over 50 megs in size, and contains over a million lines of code. It took 2 or 3 years for the Oxford programmers to get it to run, in the pre-BOINC days. It's somewhat amazing that it DOES run on desktops, let alone laptops, in spite of the wide variety of hardware/software combinations on which it is tried. And the PC programs against which it has to compete. Backups are really a computer owner's responsibility, and not just for this project's programs and data. And suspending and then exiting BOINC is a must BEFORE making a backup, so that the many files involved don't get out of sync. Someday automatic backups may get built in, but it's not likely to happen real soon. The next phase, experiment 2, is due to be released in a few weeks, and there are hopes that the final phase may be in 64-bit code. But it isn't even funded yet. And remember, this project has a limited lifetime, possibly another 3 years, so waiting for things to get better may not be an option. :) |
Joined: 13 Sep 04 Posts: 228 Credit: 354,979 RAC: 0 |
I had lots of failed WUs in the past, so I finally started making weekly manual backups. An amazing thing happened after this: no more crashes! It appears as if backups prevent WUs from crashing... |
Joined: 23 Oct 05 Posts: 22 Credit: 526,746 RAC: 0 |
I use this script to make automatic backups. It runs on a Swedish WinXP, so you'll have to modify it for your language (the path to the BOINC folder, and the title of the ntbackup window, which is "Säkerhetskopiering" in Swedish). Save it as a *.vbs file and just double-click to run. I also schedule it to run once a day. Don't forget to remove old backups, since this script doesn't overwrite them.

    Set WshShell = WScript.CreateObject("WScript.Shell")
    WshShell.LogEvent 4, "Starting backup of BOINC folder"

    REM Exit BOINC (give up if the BOINC Manager window cannot be found)
    ret = WshShell.AppActivate("BOINC Manager")
    If ret = False Then
        WshShell.LogEvent 1, "Could not find BOINC to close it!"
        WScript.Quit -1
    End If
    WScript.Sleep 1000
    WshShell.SendKeys "{F10}{k}{p}~"
    WScript.Sleep 1000
    WshShell.SendKeys "{F10}{a}{a}~"
    WScript.Sleep 10000
    ret = WshShell.AppActivate("BOINC Manager")
    If ret = True Then
        WshShell.LogEvent 1, "BOINC is still running after attempt to close it!"
        WScript.Quit -1
    End If

    REM Backup BOINC (the file name contains the date and time, with ":" replaced)
    MyTime = Time
    MyTime = Replace(MyTime, ":", "_")
    BackupName = "C:\BOINC_Backups\BOINC_Backup_" & Date & "_" & MyTime
    BackupCommand = "ntbackup backup c:\program\BOINC /J ""BOINC Backup"" /F """ + BackupName + """"
    REM MsgBox BackupCommand
    WshShell.Run BackupCommand, 1, False
    WScript.Sleep 180000

    REM Poll every 30 seconds while the ntbackup window exists; give up at the 30th check
    ret = WshShell.AppActivate("Säkerhetskopiering")
    i = 0
    While ret = True
        i = i + 1
        If i = 30 Then
            WshShell.LogEvent 1, "Ntbackup still running. Not restarting BOINC!"
            WScript.Quit -1
        End If
        WScript.Sleep 30000
        ret = WshShell.AppActivate("Säkerhetskopiering")
    Wend

    REM Restart BOINC
    REM set WshShell = WScript.CreateObject("WScript.Shell")
    WScript.Sleep 2000
    WshShell.Run "boincmgr.exe", 1, False
    WScript.Sleep 10000
    ret = WshShell.AppActivate("BOINC Manager")
    If ret = False Then
        WshShell.LogEvent 1, "Could not restart BOINC!"
        WScript.Quit -1
    End If
    WScript.Sleep 5000

    REM run always
    WshShell.SendKeys "{F10}{k}{k}~"
    WScript.Sleep 5000

    REM Turn off network access
    REM WshShell.SendKeys "{F10}{k}{f}~"
    REM WScript.Sleep 1000
    WshShell.LogEvent 0, "Successfully completed backup of BOINC folder"
|
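Since the script above never overwrites earlier backups, the clean-up it asks for could be automated along these lines. This is a hedged sketch in Python, not part of the posted script: the `prune_backups` helper and the choice to keep seven backups are assumptions, and only the `BOINC_Backup_` file-name prefix is taken from the script.

```python
from pathlib import Path
from typing import List


def prune_backups(backup_dir: Path, keep: int = 7,
                  prefix: str = "BOINC_Backup_") -> List[Path]:
    """Delete all but the `keep` most recently modified backups; return the deleted paths."""
    if keep < 0:
        raise ValueError("keep must not be negative")
    # Only files whose names start with the backup prefix are considered.
    backups = sorted(
        (p for p in backup_dir.glob(prefix + "*") if p.is_file()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    removed = backups[keep:]
    for old in removed:
        old.unlink()
    return removed
```

Called as `prune_backups(Path(r"C:\BOINC_Backups"))` after each daily run, it would leave the seven newest backup files in place and ignore anything else in the folder.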
Joined: 7 Aug 04 Posts: 2184 Credit: 64,822,615 RAC: 5,275 |
Thanks staffann. I might have to give that a try. |
©2024 cpdn.org