Daily Model Crashes
Author | Message |
---|---|
Joined: 4 Apr 05 Posts: 6 Credit: 84,868 RAC: 0 |
I started with version 4.19 and ran for a year without much trouble. Some models did not finish, but I gathered that was one of the possible outcomes. This year I upgraded to 5.2.13 and began getting the 'no finished file' error every few hours. I ignored these errors as suggested, but stopped getting results. I laid off for a few months waiting for a new version, as I read in a bug report that this bug was fixed for a future release. Recently I upgraded to 5.4.9 and picked up annoying network traffic alarms/notices from ZoneAlarm, in addition to the regular 'no finished file' errors and models crashing during the day. I installed BOINC as a service to get rid of the ZoneAlarm flashing, but I still have not been able to get more than a few hours without a crash. I don't get MS send/don't send dialogs, don't use Norton, don't play games, and unload BOINC when editing video. I can run PassMark BurnIn indefinitely without errors. Although this system is very mildly overclocked, the CPU fan was upgraded, is clean and working, and temperatures are well within range; they rise a consistent 5 °C when a model is running. The network has had no problems coinciding with the crashes. This is Windows 2000 with no time sync, and RAM is not a problem. Here are some log file excerpts; they do not look different from the others that have been posted. Any (other) suggestions?
2006-04-15 10:41:12 [---] Starting BOINC client version 5.2.13 for windows_intelx86
2006-04-15 10:41:12 [---] libcurl/7.14.0 OpenSSL/0.9.8 zlib/1.2.3
2006-04-15 10:41:12 [---] Data directory: C:\Program Files\BOINC
2006-04-15 10:41:12 [---] Processor: 1 AuthenticAMD AMD Athlon(tm) XP 2600+
2006-04-15 10:41:12 [---] Memory: 1023.48 MB physical, 2.40 GB virtual
2006-04-15 10:41:12 [---] Disk: 24.41 GB total, 10.21 GB free
2006-04-15 10:41:12 [climateprediction.net] Computer ID: 147625; location: home; project prefs: default
2006-04-15 10:41:12 [---] General prefs: from climateprediction.net (last modified 2005-04-05 15:26:30)
2006-04-15 10:41:12 [---] General prefs: no separate prefs for home; using your defaults
2006-04-15 10:41:13 [---] Remote control not allowed; using loopback address
2006-04-15 10:41:14 [climateprediction.net] Resuming computation for result hadcm3lb_51r6_05023534_0 using hadcm3lb version 508
2006-04-15 10:41:45 [climateprediction.net] Sending scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
2006-04-15 10:41:45 [climateprediction.net] Reason: To send trickle-up message
2006-04-15 10:41:45 [climateprediction.net] Note: not requesting new work or reporting results
2006-04-15 10:41:50 [climateprediction.net] Scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded
2006-04-15 13:38:43 [climateprediction.net] Result hadcm3lb_51r6_05023534_0 exited with zero status but no 'finished' file
2006-04-15 13:38:43 [climateprediction.net] If this happens repeatedly you may need to reset the project.
2006-04-15 13:38:43 [---] request_reschedule_cpus: process exited
2006-04-15 13:38:43 [climateprediction.net] Restarting result hadcm3lb_51r6_05023534_0 using hadcm3lb version 508
2006-04-15 13:38:46 [climateprediction.net] Sending scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
2006-04-15 13:38:46 [climateprediction.net] Reason: To send trickle-up message
2006-04-15 13:38:46 [climateprediction.net] Note: not requesting new work or reporting results
2006-04-15 13:38:50 [climateprediction.net] Scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded
2006-04-15 13:38:50 [climateprediction.net] General preferences have been updated
2006-04-15 13:38:50 [---] General prefs: from climateprediction.net (last modified 2006-04-15 10:39:27)
2006-04-15 13:38:50 [---] General prefs: no separate prefs for home; using your defaults
2006-04-15 13:39:42 [climateprediction.net] Result hadcm3lb_51r6_05023534_0 exited with zero status but no 'finished' file
2006-04-15 13:39:42 [climateprediction.net] If this happens repeatedly you may need to reset the project.
2006-04-15 13:39:42 [---] request_reschedule_cpus: process exited
2006-04-15 13:39:42 [climateprediction.net] Restarting result hadcm3lb_51r6_05023534_0 using hadcm3lb version 508
2006-04-15 16:38:27 [climateprediction.net] Result hadcm3lb_51r6_05023534_0 exited with zero status but no 'finished' file
2006-04-15 16:38:27 [climateprediction.net] If this happens repeatedly you may need to reset the project.
2006-04-15 16:38:27 [---] request_reschedule_cpus: process exited
2006-04-15 16:38:27 [climateprediction.net] Restarting result hadcm3lb_51r6_05023534_0 using hadcm3lb version 508
2006-04-15 16:38:35 [climateprediction.net] Sending scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
2006-04-15 16:38:35 [climateprediction.net] Reason: To send trickle-up message
2006-04-15 16:38:35 [climateprediction.net] Note: not requesting new work or reporting results
2006-04-15 16:38:39 [climateprediction.net] Scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded
2006-04-15 17:47:11 [---] request_reschedule_cpus: process exited
2006-04-15 17:47:11 [climateprediction.net] Computation for result hadcm3lb_51r6_05023534_0 finished
2006-04-15 17:47:12 [climateprediction.net] Unrecoverable error for result hadcm3lb_51r6_05023534_0 (
<file_xfer_error> <file_name>hadcm3lb_51r6_05023534_0_2.zip</file_name> <error_code>-161</error_code> <error_message></error_message> </file_xfer_error>
<file_xfer_error> <file_name>hadcm3lb_51r6_05023534_0_3.zip</file_name> <error_code>-161</error_code> <error_message></error_message> </file_xfer_error>
<file_xfer_error> <file_name>hadcm3lb_51r6_05023534_0_4.zip</file_name> <error_code>-161</error_code> <error_message></error_message> </file_xfer_error>
<file_xfer_error> <file_name>hadcm3lb_51r6_05023534_0_5.zip</file_name> <error_code>-161</error_code> <error_message></error_message> </file_xfer_error>
<file_xfer_error> <file_name>hadcm3lb_51r6_05023534_0_6.zip</file_name> <error_code>-161</error_code> <error_message></error_message> </file_xfer_error>
<file_xfer_error> <file_name>hadcm3lb_51r6_05023534_0_7.zip</file_name> <error_code>-161</error_code> <error_message></error_message> </file_xfer_error>
<file_xfer_error> <file_name>hadcm3lb_51r6_05023534_0_8.zip</file_name> <error_code>-161</error_code> <error_message></error_message> </file_xfer_error>
<file_xfer_error> <file_name>hadcm3lb_51r6_05023534_0_9.zip</file_name> <error_code>-161</error_code> <error_message></error_message> </file_xfer_error>
<file_xfer_error> <file_name>hadcm3lb_51r6_05023534_0_10.zip</file_name> <error_code>-161</error_code> <error_message></error_message> </file_xfer_error>
<file_xfer_error> <file_name>hadcm3lb_51r6_05023534_0_11.zip</file_name> <error_code>-161</error_code> <error_message></error_message> </file_xfer_error>
<file_xfer_error> <file_name>hadcm3lb_51r6_05023534_0_12.zip</file_name> <error_code>-161</error_code> <error_message></error_message> </file_xfer_error>
<file_xfer_error> <file_name>hadcm3lb_51r6_05023534_0_13.zip</file_name> <error_code>-161</error_code> <error_message></error_message> </file_xfer_error>
<file_xfer_error> <file_name>hadcm3lb_51r6_05023534_0_14.zip</file_name> <error_code>-161</error_code> <error_message></error_message> </file_xfer_error>
<file_xfer_error> <file_name>hadcm3lb_51r6_05023534_0_15.zip</file_name> <error_code>-161</error_code> <error_message></error_message> </file_xfer_error>
<file_xfer_error> <file_name>hadcm3lb_51r6_05023534_0_16.zip</file_name> <error_code>-161</error_code> <error_message></error_message>

This is a lifecycle of an attempted project:

2006-07-15 21:00:00 [---] Resuming network activity
2006-07-15 21:00:01 [climateprediction.net] Sending scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
2006-07-15 21:00:01 [climateprediction.net] Reason: To fetch work
2006-07-15 21:00:01 [climateprediction.net] Requesting 432000 seconds of new work, and reporting 1 completed tasks
2006-07-15 21:00:06 [climateprediction.net] Scheduler request succeeded
2006-07-15 21:00:08 [climateprediction.net] Started download of file hadcm3lbm_bbma_25297541.zip
2006-07-15 21:00:08 [climateprediction.net] Started download of file solar_v14.gz
2006-07-15 21:00:10 [climateprediction.net] Finished download of file hadcm3lbm_bbma_25297541.zip
2006-07-15 21:00:10 [climateprediction.net] Throughput 37309 bytes/sec
2006-07-15 21:00:10 [climateprediction.net] Finished download of file solar_v14.gz
2006-07-15 21:00:10 [climateprediction.net] Throughput 2424 bytes/sec
2006-07-15 21:00:10 [climateprediction.net] Started download of file hdjrhdck_0208_nickfluxcorr.anc.gz
2006-07-15 21:00:10 [climateprediction.net] Started download of file 1040_flux_corr.anc.gz
2006-07-15 21:00:15 [climateprediction.net] Finished download of file hdjrhdck_0208_nickfluxcorr.anc.gz
2006-07-15 21:00:15 [climateprediction.net] Throughput 132938 bytes/sec
2006-07-15 21:00:15 [climateprediction.net] Finished download of file 1040_flux_corr.anc.gz
2006-07-15 21:00:15 [climateprediction.net] Throughput 123696 bytes/sec
2006-07-15 21:00:15 [climateprediction.net] Started download of file volc_v30.gz
2006-07-15 21:00:15 [climateprediction.net] Started download of file 1040_ocean.year.gz
2006-07-15 21:00:17 [climateprediction.net] Finished download of file volc_v30.gz
2006-07-15 21:00:17 [climateprediction.net] Throughput 23899 bytes/sec
2006-07-15 21:00:26 [climateprediction.net] Finished download of file 1040_ocean.year.gz
2006-07-15 21:00:26 [climateprediction.net] Throughput 137787 bytes/sec
2006-07-15 21:00:27 [---] Rescheduling CPU: files downloaded
2006-07-15 21:00:27 [climateprediction.net] Starting task hadcm3lbm_bbma_25297541_0 using hadcm3lb version 508
2006-07-15 21:13:54 [climateprediction.net] Task hadcm3lbm_bbma_25297541_0 exited with zero status but no 'finished' file
2006-07-15 21:13:54 [climateprediction.net] If this happens repeatedly you may need to reset the project.
2006-07-15 21:13:54 [---] Rescheduling CPU: application exited
2006-07-15 21:13:54 [climateprediction.net] Restarting task hadcm3lbm_bbma_25297541_0 using hadcm3lb version 508
2006-07-15 23:12:40 [climateprediction.net] Task hadcm3lbm_bbma_25297541_0 exited with zero status but no 'finished' file
2006-07-15 23:12:40 [climateprediction.net] If this happens repeatedly you may need to reset the project.
2006-07-15 23:12:40 [---] Rescheduling CPU: application exited
2006-07-15 23:12:40 [climateprediction.net] Restarting task hadcm3lbm_bbma_25297541_0 using hadcm3lb version 508
2006-07-16 01:11:17 [climateprediction.net] Task hadcm3lbm_bbma_25297541_0 exited with zero status but no 'finished' file
2006-07-16 01:11:17 [climateprediction.net] If this happens repeatedly you may need to reset the project.
2006-07-16 01:11:17 [---] Rescheduling CPU: application exited
2006-07-16 01:11:17 [climateprediction.net] Restarting task hadcm3lbm_bbma_25297541_0 using hadcm3lb version 508
2006-07-16 03:10:00 [climateprediction.net] Task hadcm3lbm_bbma_25297541_0 exited with zero status but no 'finished' file
2006-07-16 03:10:00 [climateprediction.net] If this happens repeatedly you may need to reset the project.
2006-07-16 03:10:00 [---] Rescheduling CPU: application exited
2006-07-16 03:10:00 [climateprediction.net] Restarting task hadcm3lbm_bbma_25297541_0 using hadcm3lb version 508
2006-07-16 04:00:00 [---] Suspending network activity - time of day
2006-07-16 05:08:43 [climateprediction.net] Task hadcm3lbm_bbma_25297541_0 exited with zero status but no 'finished' file
2006-07-16 05:08:43 [climateprediction.net] If this happens repeatedly you may need to reset the project.
2006-07-16 05:08:43 [---] Rescheduling CPU: application exited
2006-07-16 05:08:43 [climateprediction.net] Restarting task hadcm3lbm_bbma_25297541_0 using hadcm3lb version 508
2006-07-16 07:07:25 [climateprediction.net] Task hadcm3lbm_bbma_25297541_0 exited with zero status but no 'finished' file
2006-07-16 07:07:25 [climateprediction.net] If this happens repeatedly you may need to reset the project.
2006-07-16 07:07:25 [---] Rescheduling CPU: application exited
2006-07-16 07:07:25 [climateprediction.net] Restarting task hadcm3lbm_bbma_25297541_0 using hadcm3lb version 508
2006-07-16 09:06:02 [climateprediction.net] Task hadcm3lbm_bbma_25297541_0 exited with zero status but no 'finished' file
2006-07-16 09:06:02 [climateprediction.net] If this happens repeatedly you may need to reset the project.
2006-07-16 09:06:02 [---] Rescheduling CPU: application exited
2006-07-16 09:06:02 [climateprediction.net] Restarting task hadcm3lbm_bbma_25297541_0 using hadcm3lb version 508
2006-07-16 09:32:58 [climateprediction.net] Task hadcm3lbm_bbma_25297541_0 exited with zero status but no 'finished' file
2006-07-16 09:32:58 [climateprediction.net] If this happens repeatedly you may need to reset the project.
2006-07-16 09:32:58 [---] Rescheduling CPU: application exited
2006-07-16 09:32:58 [climateprediction.net] Restarting task hadcm3lbm_bbma_25297541_0 using hadcm3lb version 508
2006-07-16 11:04:38 [climateprediction.net] Task hadcm3lbm_bbma_25297541_0 exited with zero status but no 'finished' file
2006-07-16 11:04:38 [climateprediction.net] If this happens repeatedly you may need to reset the project.
2006-07-16 11:04:38 [---] Rescheduling CPU: application exited
2006-07-16 11:04:38 [climateprediction.net] Restarting task hadcm3lbm_bbma_25297541_0 using hadcm3lb version 508
2006-07-16 13:03:15 [climateprediction.net] Task hadcm3lbm_bbma_25297541_0 exited with zero status but no 'finished' file
2006-07-16 13:03:15 [climateprediction.net] If this happens repeatedly you may need to reset the project.
2006-07-16 13:03:15 [---] Rescheduling CPU: application exited
2006-07-16 13:03:15 [climateprediction.net] Restarting task hadcm3lbm_bbma_25297541_0 using hadcm3lb version 508
2006-07-16 15:01:54 [climateprediction.net] Task hadcm3lbm_bbma_25297541_0 exited with zero status but no 'finished' file
2006-07-16 15:01:54 [climateprediction.net] If this happens repeatedly you may need to reset the project.
2006-07-16 15:01:54 [---] Rescheduling CPU: application exited
2006-07-16 15:01:54 [climateprediction.net] Restarting task hadcm3lbm_bbma_25297541_0 using hadcm3lb version 508
To pause/resume tasks hit CTRL-C, to exit hit CTRL-BREAK
StartServiceCtrlDispatcher being called.
This may take several seconds. Please wait.
2006-07-16 16:52:09 [---] Starting BOINC client version 5.4.9 for windows_intelx86
2006-07-16 16:52:09 [---] libcurl/7.15.3 OpenSSL/0.9.8a zlib/1.2.3
2006-07-16 16:52:09 [---] Executing as a daemon
2006-07-16 16:52:09 [---] Data directory: C:\Program Files\BOINC
2006-07-16 16:52:09 [---] BOINC is running as a service and as a non-system user.
2006-07-16 16:52:09 [---] No application graphics will be available.
2006-07-16 16:52:10 [---] Processor: 1 AuthenticAMD AMD Athlon(tm) XP 2600+
2006-07-16 16:52:10 [---] Memory: 1023.48 MB physical, 2.40 GB virtual
2006-07-16 16:52:10 [---] Disk: 24.41 GB total, 12.04 GB free
2006-07-16 16:52:10 [climateprediction.net] URL: http://climateprediction.net/; Computer ID: 147625; location: home; project prefs: default
2006-07-16 16:52:10 [---] General prefs: from climateprediction.net (last modified 2006-07-13 11:09:55)
2006-07-16 16:52:10 [---] General prefs: no separate prefs for home; using your defaults
2006-07-16 16:52:10 [---] Local control only allowed
2006-07-16 16:52:10 [---] Listening on port 31416
2006-07-16 16:52:10 [climateprediction.net] Resuming task hadcm3lbm_bbma_25297541_0 using hadcm3lb version 508
2006-07-16 16:52:11 [---] Suspending network activity - time of day
2006-07-16 17:12:12 [---] Rescheduling CPU: application exited
2006-07-16 17:12:12 [climateprediction.net] Computation for task hadcm3lbm_bbma_25297541_0 finished
2006-07-16 17:12:13 [climateprediction.net] Unrecoverable error for result hadcm3lbm_bbma_25297541_0 (
<file_xfer_error> <file_name>hadcm3lbm_bbma_25297541_0_1.zip</file_name> <error_code>-161</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadcm3lbm_bbma_25297541_0_2.zip</file_name> <error_code>-161</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadcm3lbm_bbma_25297541_0_3.zip</file_name> <error_code>-161</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadcm3lbm_bbma_25297541_0_4.zip</file_name> <error_code>-161</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadcm3lbm_bbma_25297541_0_5.zip</file_name> <error_code>-161</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadcm3lbm_bbma_25297541_0_6.zip</file_name> <error_code>-161</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadcm3lbm_bbma_25297541_0_7.zip</file_name> <error_code>-161</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadcm3lbm_bbma_25297541_0_8.zip</file_name> <error |
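Because these crashes leave nothing behind except log messages, it can help to tabulate when they happen. The following is a minimal Python sketch, not part of BOINC or climateprediction.net; the function names `zero_status_exits` and `gaps_in_minutes` are hypothetical. It pulls the "exited with zero status but no 'finished' file" lines out of a saved copy of the messages and reports the gap between consecutive exits for each task. It assumes the exact message wording and timestamp format of the 5.2.13 and 5.4.9 excerpts above; other client versions may word these messages differently.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed message format, copied from the BOINC 5.2.13 / 5.4.9 excerpts in this thread:
#   "<date> <time> [<project>] Result|Task <name> exited with zero status but no 'finished' file"
EXIT_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[[^\]]+\] "
    r"(?:Result|Task) (\S+) exited with zero status but no 'finished' file"
)


def zero_status_exits(log_text):
    """Map each task name to the times of its 'no finished file' exits, in log order."""
    exits = defaultdict(list)
    for stamp, task in EXIT_RE.findall(log_text):
        exits[task].append(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"))
    return dict(exits)


def gaps_in_minutes(times):
    """Minutes between consecutive exits, rounded to one decimal place."""
    return [round((later - earlier).total_seconds() / 60, 1)
            for earlier, later in zip(times, times[1:])]


# A few lines taken from the excerpts above; in practice you would paste in
# (or read from a file) a saved copy of the full BOINC message log.
SAMPLE = """\
2006-04-15 13:38:43 [climateprediction.net] Result hadcm3lb_51r6_05023534_0 exited with zero status but no 'finished' file
2006-07-15 21:13:54 [climateprediction.net] Task hadcm3lbm_bbma_25297541_0 exited with zero status but no 'finished' file
2006-07-15 23:12:40 [climateprediction.net] Task hadcm3lbm_bbma_25297541_0 exited with zero status but no 'finished' file
2006-07-16 01:11:17 [climateprediction.net] Task hadcm3lbm_bbma_25297541_0 exited with zero status but no 'finished' file
"""

for task, times in zero_status_exits(SAMPLE).items():
    print(task, len(times), "exits; gaps (min):", gaps_in_minutes(times))
```

For the three July lines in the sample, this reports gaps of [118.8, 118.6] minutes. Run over a whole log, the same approach shows whether the exits are evenly spaced. A steady interval might hint at something scheduled on the machine rather than random instability, though on its own it proves nothing.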
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
Hi, it's worth looking through the 'solutions to models crashing' post at the top of the forum to see if there is anything useful there. The stability check I would recommend is Prime95's Torture test for 24 hours: because the climate model runs for so long, shorter tests aren't a guarantee of stability. The ZoneAlarm flashing problem occurs because ZoneAlarm incorrectly sees local traffic on your machine as internet traffic (the manager and the client talk to each other via a TCP/IP connection to 'localhost'). The -161 error you're seeing is difficult to diagnose, since it simply means 'the model has crashed, and I have nothing to upload as a result of this'. Are there any files with a name like stdout_err2.txt in the work unit's directory? Do you have an account on the www.climateprediction.net forum (it requires separate registration)? I'd like to send you a PM, but that facility isn't available here. I'm a volunteer and my views are my own. News and Announcements and FAQ |
Joined: 4 Apr 05 Posts: 6 Credit: 84,868 RAC: 0 |
I have been through the sticky post, in April and again now. Did I miss something? I don't run the graphics or the screensaver. If you think Prime95 is materially different from PassMark BurnIn, OK, I will run it. There is no error file specific to a model. In fact, since the upgrade(s) the program has not been creating separate subdirectories under ..boinc\projects\climateprediction.net.. Previous runs created subdirectories filled with output .zip files. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The first one in your list is easy: hadcm3lb_51r6_05023534_0 is one of the original, faulty models. The second is a lot harder to work out. I'd suggest that you join the message boards, as Mike says. There are people who visit there, but not here, who may be able to help, and Mike can send you the private message there. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The new type of model 'works' differently from before. Data is uploaded during the modelling, and no files are kept at the end, so the directory structure is different. But there will still be a folder with the model's name. And the 'torture test' part of Prime95 is supposed to be the toughest test for a computer short of the climate program itself. |
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
... I'm not familiar with PassMark BurnIn (Prime95's test is pretty tough, but I don't know about BurnIn); the main point is the long run duration. If you ran BurnIn for 24 hours it would probably be an equivalent test to Prime95, but I'm guessing to some extent. I recently adjusted the overclocking settings on my own PC: the first test failed after 23 hours, so I dropped the overclock slightly and reran it for 27 hours without error. Running the climate model is a bit like running a stability test for 2,000 hours! Just to be clearer about the forum comment in my earlier post, I meant the 'discussion boards' at http://www.climateprediction.net/board/index.php rather than these ones :-) |
Joined: 4 Apr 05 Posts: 6 Credit: 84,868 RAC: 0 |
"The new type of model 'works' differently from before. Data is uploaded during the modelling, and no files are kept at the end, so the directory structure is different." Right now I do not have any subdirectories with model names; I have .zip files named after the models or parts of the model names. Sorry about the error file excerpt. I was sloppy with what I picked: I was aware of the global kill, but the others, before and after it, looked the same. I am not too worried about some 'client errors', since other participants' results show plenty of them. But now I get a model, run it for 6 to 8 hours, see 3 or 4 'exit with no file' messages, then 'finished' and no result... and then get a new model. I don't think I am helping much. Although I don't think this box has a stability or temperature problem, I am going to get Prime95 and run it for 24 hours to see what it says. I have not run such a test since I put the machine together and burned it in. My latest BurnIn run was about 8 hours with no errors, but it only added 3 °C to the temperature, whereas the climate model adds 5 °C. So maybe the box cannot cope at the edges, my previous results notwithstanding. Thanks, everyone. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
From Mike's 'help' post: "Windows 'time sync' messages have been mentioned recently as causing 'process exited with zero status' crashes. Although these are relatively benign, it may be worth trying to reduce their frequency." This message only appears on some computers, and may have been made worse for you by a recent Windows update. There are two options here: 1) If you do nothing, the model MAY crash, which will give you a new one. 2) If you follow the directions, it will abort the model and give you a new one, as well as the latest set of programs. So I'd suggest that you click the 'No new work' option in the Projects tab, make a backup regularly, and ignore the messages. Make the first backup as soon as you have reached the first checkpoint. (And good luck.) |
Joined: 4 Apr 05 Posts: 6 Credit: 84,868 RAC: 0 |
"From Mike's 'help' post:" Thanks. I am not running Windows Time Sync. I gather that not everyone has this problem (zero status) and that there is no cure for those who do. I have been ignoring the messages, but I don't get more than 6 to 8 hours of work before a fatal error and the model crashes. Then it waits a day for the quota to lapse, gets a new model, and repeats. For the time being I guess I will sit on the sidelines. |
©2024 cpdn.org