I'm back and I've just had my first crash... shall I just restore a backup?
Author | Message |
---|---|
Joined: 17 Feb 06 Posts: 89 Credit: 4,309,159 RAC: 0 |
OK, I have migrated from the BBC project to CPDN. My last model took 5,000 hours over 10 months. My new machine is much better, and I had left it running BOINC 24/7 without a hitch while migrating data to it. But this morning a model crashed with this message:

19/03/2007 14:27:48||Starting BOINC client version 5.8.15 for windows_intelx86
19/03/2007 14:27:48||log flags: task, file_xfer, sched_ops
19/03/2007 14:27:48||Libraries: libcurl/7.16.0 OpenSSL/0.9.8a zlib/1.2.3
19/03/2007 14:27:48||Executing as a daemon
19/03/2007 14:27:48||Data directory: C:\Program Files\BOINC
19/03/2007 14:27:48||BOINC is running as a service and as a non-system user.
19/03/2007 14:27:48||No application graphics will be available.
19/03/2007 14:27:48||Processor: 2 GenuineIntel Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz [x86 Family 6 Model 15 Stepping 6] [fpu tsc pae nx sse sse2 mmx]
19/03/2007 14:27:48||Memory: 2.00 GB physical, 3.85 GB virtual
19/03/2007 14:27:48||Disk: 298.08 GB total, 281.77 GB free
19/03/2007 14:27:48|climateprediction.net|URL: http://climateprediction.net/; Computer ID: 576104; location: (none); project prefs: default
19/03/2007 14:27:48||No general preferences found - using BOINC defaults
19/03/2007 14:27:48|climateprediction.net|Restarting task hadcm3pbb_c6xt_05842940_0 using hadcm3 version 515
19/03/2007 14:27:48|climateprediction.net|Restarting task hadcm3ohe_1pku_05728144_1 using hadcm3 version 515
19/03/2007 19:15:55|climateprediction.net|Sending scheduler request: To send trickle-up message
19/03/2007 19:15:55|climateprediction.net|(not requesting new work or reporting completed tasks)
19/03/2007 19:16:01|climateprediction.net|Scheduler RPC succeeded [server version 509]
19/03/2007 20:40:11|climateprediction.net|Sending scheduler request: To send trickle-up message
19/03/2007 20:40:11|climateprediction.net|(not requesting new work or reporting completed tasks)
19/03/2007 20:40:16|climateprediction.net|Scheduler RPC succeeded [server version 509]
20/03/2007 08:31:09|climateprediction.net|Sending scheduler request: To send trickle-up message
20/03/2007 08:31:09|climateprediction.net|(not requesting new work or reporting completed tasks)
20/03/2007 08:31:14|climateprediction.net|Scheduler RPC succeeded [server version 509]
20/03/2007 09:54:14|climateprediction.net|Sending scheduler request: To send trickle-up message
20/03/2007 09:54:14|climateprediction.net|(not requesting new work or reporting completed tasks)
20/03/2007 09:54:19|climateprediction.net|Scheduler RPC succeeded [server version 509]
21/03/2007 02:05:36|climateprediction.net|Sending scheduler request: To send trickle-up message
21/03/2007 02:05:36|climateprediction.net|(not requesting new work or reporting completed tasks)
21/03/2007 02:05:41|climateprediction.net|Scheduler RPC succeeded [server version 509]
21/03/2007 16:04:30|climateprediction.net|Sending scheduler request: To send trickle-up message
21/03/2007 16:04:30|climateprediction.net|(not requesting new work or reporting completed tasks)
21/03/2007 16:04:35|climateprediction.net|Scheduler RPC succeeded [server version 509]
21/03/2007 17:24:06|climateprediction.net|Sending scheduler request: To send trickle-up message
21/03/2007 17:24:06|climateprediction.net|(not requesting new work or reporting completed tasks)
21/03/2007 17:24:11|climateprediction.net|Scheduler RPC succeeded [server version 509]
22/03/2007 05:18:36|climateprediction.net|Sending scheduler request: To send trickle-up message
22/03/2007 05:18:36|climateprediction.net|(not requesting new work or reporting completed tasks)
22/03/2007 05:18:41|climateprediction.net|Scheduler RPC succeeded [server version 509]
22/03/2007 06:40:41|climateprediction.net|Sending scheduler request: To send trickle-up message
22/03/2007 06:40:41|climateprediction.net|(not requesting new work or reporting completed tasks)
22/03/2007 06:40:46|climateprediction.net|Scheduler RPC succeeded [server version 509]
22/03/2007 08:57:18||Suspending computation - user request
22/03/2007 09:17:38||Resuming computation
22/03/2007 19:11:41|climateprediction.net|Sending scheduler request: To send trickle-up message
22/03/2007 19:11:41|climateprediction.net|(not requesting new work or reporting completed tasks)
22/03/2007 19:11:46|climateprediction.net|Scheduler RPC succeeded [server version 509]
22/03/2007 20:32:46|climateprediction.net|Sending scheduler request: To send trickle-up message
22/03/2007 20:32:46|climateprediction.net|(not requesting new work or reporting completed tasks)
22/03/2007 20:32:51|climateprediction.net|Scheduler RPC succeeded [server version 509]
23/03/2007 08:34:36|climateprediction.net|Sending scheduler request: To send trickle-up message
23/03/2007 08:34:36|climateprediction.net|(not requesting new work or reporting completed tasks)
23/03/2007 08:34:41|climateprediction.net|Scheduler RPC succeeded [server version 509]
23/03/2007 09:04:34|climateprediction.net|Computation for task hadcm3pbb_c6xt_05842940_0 finished
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_2.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_3.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_4.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_5.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_6.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_7.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_8.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_9.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_10.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_11.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_12.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_13.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_14.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_15.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_16.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:35|climateprediction.net|Deferring communication for 1 min 0 sec
23/03/2007 09:04:35|climateprediction.net|Reason: Unrecoverable error for result hadcm3pbb_c6xt_05842940_0 (<file_xfer_error> <file_name>hadcm3pbb_c6xt_05842940_0_2.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3pbb_c6xt_05842940_0_3.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3pbb_c6xt_05842940_0_4.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3pbb_c6xt_05842940_0_5.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3pbb_c6xt_05842940_0_6.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3pbb_c6xt_05842940_0_7.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3pbb_c6xt_05842940_0_8.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3pbb_c6xt_05842940_0_9.zip</file_name>
23/03/2007 09:05:36|climateprediction.net|Sending scheduler request: To fetch work
23/03/2007 09:05:36|climateprediction.net|Requesting 8640 seconds of new work, and reporting 1 completed tasks
23/03/2007 09:05:41|climateprediction.net|Scheduler RPC succeeded [server version 509]
23/03/2007 09:05:43|climateprediction.net|[file_xfer] Started download of file hadcm3pbb_c58o_05840739.zip
23/03/2007 09:05:43|climateprediction.net|[file_xfer] Started download of file fnw5hdck_0208_nickfluxcorr.anc.gz
23/03/2007 09:05:45|climateprediction.net|[file_xfer] Finished download of file hadcm3pbb_c58o_05840739.zip
23/03/2007 09:05:45|climateprediction.net|[file_xfer] Throughput 128219 bytes/sec
23/03/2007 09:05:45|climateprediction.net|[file_xfer] Started download of file 1040_flux_corr.anc.gz
23/03/2007 09:05:48|climateprediction.net|[file_xfer] Finished download of file fnw5hdck_0208_nickfluxcorr.anc.gz
23/03/2007 09:05:48|climateprediction.net|[file_xfer] Throughput 109727 bytes/sec
23/03/2007 09:05:48|climateprediction.net|[file_xfer] Started download of file volc_v30.gz
23/03/2007 09:05:49|climateprediction.net|[file_xfer] Finished download of file volc_v30.gz
23/03/2007 09:05:49|climateprediction.net|[file_xfer] Throughput 12937 bytes/sec
23/03/2007 09:05:49|climateprediction.net|[file_xfer] Started download of file 1040_ocean.year.gz
23/03/2007 09:05:55|climateprediction.net|[file_xfer] Finished download of file 1040_flux_corr.anc.gz
23/03/2007 09:05:55|climateprediction.net|[file_xfer] Throughput 46932 bytes/sec
23/03/2007 09:06:06|climateprediction.net|[file_xfer] Finished download of file 1040_ocean.year.gz
23/03/2007 09:06:06|climateprediction.net|[file_xfer] Throughput 84329 bytes/sec
23/03/2007 09:06:07|climateprediction.net|Starting hadcm3pbb_c58o_05840739_0
23/03/2007 09:06:07|climateprediction.net|Starting task hadcm3pbb_c58o_05840739_0 using hadcm3 version 515

I aborted the new download and BOINC promptly downloaded another! So I aborted that one too and suspended network activity, which leaves me with one model. My gut feeling is to restore BOINC from my last backup, which is about 2% behind. Any better suggestions? Also, should I keep running with network activity disabled?

Thanks, Digby |
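Each entry in the log above has the shape `DD/MM/YYYY HH:MM:SS|project|message`, with an empty project field for client-wide messages. For anyone wanting to pick the important lines out of a long dump like this, here is a minimal Python sketch. It assumes that layout (inferred from this excerpt, not from BOINC documentation, so it may differ between client versions), and the helper names `parse_log`, `absent_outputs` and `count_trickles` are hypothetical. It lists the output files the client reported as absent when a task finished, which is what preceded the unrecoverable-error report here, and counts the trickle-up scheduler requests.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# ASSUMPTION: entry layout "DD/MM/YYYY HH:MM:SS|project|message", inferred from
# the excerpt in this thread rather than from BOINC documentation.
ENTRY_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\|([^|]*)\|(.*)$")
ABSENT_RE = re.compile(r"^Output file (\S+) for task (\S+) absent$")
TRICKLE_MSG = "Sending scheduler request: To send trickle-up message"


@dataclass
class LogEntry:
    when: datetime
    project: str  # empty string for client-wide messages
    message: str


def parse_log(text: str) -> list[LogEntry]:
    """Parse one entry per line; lines not matching the layout are skipped."""
    entries = []
    for line in text.splitlines():
        m = ENTRY_RE.match(line.strip())
        if m:
            when = datetime.strptime(m.group(1), "%d/%m/%Y %H:%M:%S")
            entries.append(LogEntry(when, m.group(2), m.group(3)))
    return entries


def absent_outputs(entries: list[LogEntry]) -> dict[str, list[str]]:
    """Map each task name to the output files the client reported as absent."""
    missing: dict[str, list[str]] = {}
    for e in entries:
        m = ABSENT_RE.match(e.message)
        if m:
            missing.setdefault(m.group(2), []).append(m.group(1))
    return missing


def count_trickles(entries: list[LogEntry]) -> int:
    """Count scheduler requests made only to send trickle-up messages."""
    return sum(1 for e in entries if e.message == TRICKLE_MSG)


if __name__ == "__main__":
    # A few lines copied from the log above.
    sample = """\
23/03/2007 08:34:36|climateprediction.net|Sending scheduler request: To send trickle-up message
23/03/2007 09:04:34|climateprediction.net|Computation for task hadcm3pbb_c6xt_05842940_0 finished
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_2.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_3.zip for task hadcm3pbb_c6xt_05842940_0 absent
22/03/2007 08:57:18||Suspending computation - user request"""
    entries = parse_log(sample)
    print(len(entries), "entries,", count_trickles(entries), "trickle-up request(s)")
    for task, files in absent_outputs(entries).items():
        print(task, "missing:", ", ".join(files))
```

Pasting a full message log into `sample` (one entry per line) gives the same summary for the whole run; lines that don't match the assumed layout, such as a wrapped continuation, are silently skipped rather than raising an error.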
Joined: 9 Jan 07 Posts: 497 Credit: 342,899 RAC: 0 |
"Executing as a daemon"? Never seen that one before! I'm sure someone will be along to advise you shortly, Digby. Good luck! MM @ the Pavilion |
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
'Executing as a daemon' means that BOINC runs in the background rather than being run from the BOINC Manager (AKA the service install).

The reason the model crashed was this:

Model crashed: umshell1.f: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED. GA
Model crashed: umshell1.f: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED. GA
Model crashed: umshell1.f: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED. GA
Model crashed: umshell1.f: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED. GA
Fatal crash! :-(

It indicates one of two things:

A) Most likely, the mix of parameters in the original model leads to a physically impossible climate. The model is designed to shut itself down if this occurs. One of the main aims of the project is to work out which combinations of starting parameters are viable and which are not.
B) Less likely, the PC is generating dodgy floating-point calculations due to something like overclocking or overheating. This is unlikely if you've run other climate models on the same PC.

I'm a volunteer and my views are my own.
News and Announcements and FAQ |
Joined: 17 Feb 06 Posts: 89 Credit: 4,309,159 RAC: 0 |
> The reason the model crashed was this: Model crashed: umshell1.f: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED. GA Fatal crash! :-( It indicates one of two things.
> A) Most likely, the mix of parameters in the original model leads to a physically impossible climate. The model is designed to shut itself down if this occurs. One of the main aims of the project is to work out which combinations of starting parameters are viable and which are not.
> B) Less likely, the PC is generating dodgy floating-point calculations due to something like overclocking or overheating. This is unlikely if you've run other climate models on the same PC.

OK, so is it worth restoring my backup, which would restart both models at approx. 8.9%, or should I let my remaining model continue from 10.56% and just download a new one? If I restart at 8.9%, it will be interesting to see whether the dodgy model fails again at the same point...

> One of the main aims of the project is to work out which combinations of starting parameters are viable and which are not.

So presumably the project managers analyse the starting parameters of the models that fail... interesting.

Cheers |
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
What I would suggest is running Prime95's Torture Test on the PC for 24 hours or so. If it passes (it probably will), there's no point in restoring from the backup, because the model will just hit the same thing again. If it fails, I'd suggest resolving whatever problem it identifies and then resuming from the backup.

As for analysing the starting parameters of failed models: yes, this is a big focus of their work. |
Joined: 17 Feb 06 Posts: 89 Credit: 4,309,159 RAC: 0 |
OK, I've resumed running, and a new model was downloaded at midnight after a six-hour delay that the server insisted on. So I'm continuing the journey 'to infinity and beyond', as Buzz would say. Thanks for your help. Digby |
©2024 cpdn.org