climateprediction.net home page
I'm back and I just had my first crash...shall I just restore a backup?

Digby

Joined: 17 Feb 06
Posts: 89
Credit: 4,309,159
RAC: 0
Message 27458 - Posted: 23 Mar 2007, 10:02:02 UTC

OK, I have migrated from the BBC project to CPDN. My last model took 5,000 hours over 10 months. My new machine is much better, and I have left it running BOINC 24/7 quite happily while migrating data to it.
But this morning a model crashed with this message:

19/03/2007 14:27:48||Starting BOINC client version 5.8.15 for windows_intelx86
19/03/2007 14:27:48||log flags: task, file_xfer, sched_ops
19/03/2007 14:27:48||Libraries: libcurl/7.16.0 OpenSSL/0.9.8a zlib/1.2.3
19/03/2007 14:27:48||Executing as a daemon
19/03/2007 14:27:48||Data directory: C:\Program Files\BOINC
19/03/2007 14:27:48||BOINC is running as a service and as a non-system user.
19/03/2007 14:27:48||No application graphics will be available.
19/03/2007 14:27:48||Processor: 2 GenuineIntel Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz [x86 Family 6 Model 15 Stepping 6] [fpu tsc pae nx sse sse2 mmx]
19/03/2007 14:27:48||Memory: 2.00 GB physical, 3.85 GB virtual
19/03/2007 14:27:48||Disk: 298.08 GB total, 281.77 GB free
19/03/2007 14:27:48|climateprediction.net|URL: http://climateprediction.net/; Computer ID: 576104; location: (none); project prefs: default
19/03/2007 14:27:48||No general preferences found - using BOINC defaults
19/03/2007 14:27:48|climateprediction.net|Restarting task hadcm3pbb_c6xt_05842940_0 using hadcm3 version 515
19/03/2007 14:27:48|climateprediction.net|Restarting task hadcm3ohe_1pku_05728144_1 using hadcm3 version 515
19/03/2007 19:15:55|climateprediction.net|Sending scheduler request: To send trickle-up message
19/03/2007 19:15:55|climateprediction.net|(not requesting new work or reporting completed tasks)
19/03/2007 19:16:01|climateprediction.net|Scheduler RPC succeeded [server version 509]
19/03/2007 20:40:11|climateprediction.net|Sending scheduler request: To send trickle-up message
19/03/2007 20:40:11|climateprediction.net|(not requesting new work or reporting completed tasks)
19/03/2007 20:40:16|climateprediction.net|Scheduler RPC succeeded [server version 509]
20/03/2007 08:31:09|climateprediction.net|Sending scheduler request: To send trickle-up message
20/03/2007 08:31:09|climateprediction.net|(not requesting new work or reporting completed tasks)
20/03/2007 08:31:14|climateprediction.net|Scheduler RPC succeeded [server version 509]
20/03/2007 09:54:14|climateprediction.net|Sending scheduler request: To send trickle-up message
20/03/2007 09:54:14|climateprediction.net|(not requesting new work or reporting completed tasks)
20/03/2007 09:54:19|climateprediction.net|Scheduler RPC succeeded [server version 509]
21/03/2007 02:05:36|climateprediction.net|Sending scheduler request: To send trickle-up message
21/03/2007 02:05:36|climateprediction.net|(not requesting new work or reporting completed tasks)
21/03/2007 02:05:41|climateprediction.net|Scheduler RPC succeeded [server version 509]
21/03/2007 16:04:30|climateprediction.net|Sending scheduler request: To send trickle-up message
21/03/2007 16:04:30|climateprediction.net|(not requesting new work or reporting completed tasks)
21/03/2007 16:04:35|climateprediction.net|Scheduler RPC succeeded [server version 509]
21/03/2007 17:24:06|climateprediction.net|Sending scheduler request: To send trickle-up message
21/03/2007 17:24:06|climateprediction.net|(not requesting new work or reporting completed tasks)
21/03/2007 17:24:11|climateprediction.net|Scheduler RPC succeeded [server version 509]
22/03/2007 05:18:36|climateprediction.net|Sending scheduler request: To send trickle-up message
22/03/2007 05:18:36|climateprediction.net|(not requesting new work or reporting completed tasks)
22/03/2007 05:18:41|climateprediction.net|Scheduler RPC succeeded [server version 509]
22/03/2007 06:40:41|climateprediction.net|Sending scheduler request: To send trickle-up message
22/03/2007 06:40:41|climateprediction.net|(not requesting new work or reporting completed tasks)
22/03/2007 06:40:46|climateprediction.net|Scheduler RPC succeeded [server version 509]
22/03/2007 08:57:18||Suspending computation - user request
22/03/2007 09:17:38||Resuming computation
22/03/2007 19:11:41|climateprediction.net|Sending scheduler request: To send trickle-up message
22/03/2007 19:11:41|climateprediction.net|(not requesting new work or reporting completed tasks)
22/03/2007 19:11:46|climateprediction.net|Scheduler RPC succeeded [server version 509]
22/03/2007 20:32:46|climateprediction.net|Sending scheduler request: To send trickle-up message
22/03/2007 20:32:46|climateprediction.net|(not requesting new work or reporting completed tasks)
22/03/2007 20:32:51|climateprediction.net|Scheduler RPC succeeded [server version 509]
23/03/2007 08:34:36|climateprediction.net|Sending scheduler request: To send trickle-up message
23/03/2007 08:34:36|climateprediction.net|(not requesting new work or reporting completed tasks)
23/03/2007 08:34:41|climateprediction.net|Scheduler RPC succeeded [server version 509]
23/03/2007 09:04:34|climateprediction.net|Computation for task hadcm3pbb_c6xt_05842940_0 finished
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_2.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_3.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_4.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_5.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_6.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_7.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_8.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_9.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_10.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_11.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_12.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_13.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_14.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_15.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_16.zip for task hadcm3pbb_c6xt_05842940_0 absent
23/03/2007 09:04:35|climateprediction.net|Deferring communication for 1 min 0 sec
23/03/2007 09:04:35|climateprediction.net|Reason: Unrecoverable error for result hadcm3pbb_c6xt_05842940_0 (<file_xfer_error> <file_name>hadcm3pbb_c6xt_05842940_0_2.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3pbb_c6xt_05842940_0_3.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3pbb_c6xt_05842940_0_4.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3pbb_c6xt_05842940_0_5.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3pbb_c6xt_05842940_0_6.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3pbb_c6xt_05842940_0_7.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3pbb_c6xt_05842940_0_8.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3pbb_c6xt_05842940_0_9.zip</file_name>
23/03/2007 09:05:36|climateprediction.net|Sending scheduler request: To fetch work
23/03/2007 09:05:36|climateprediction.net|Requesting 8640 seconds of new work, and reporting 1 completed tasks
23/03/2007 09:05:41|climateprediction.net|Scheduler RPC succeeded [server version 509]
23/03/2007 09:05:43|climateprediction.net|[file_xfer] Started download of file hadcm3pbb_c58o_05840739.zip
23/03/2007 09:05:43|climateprediction.net|[file_xfer] Started download of file fnw5hdck_0208_nickfluxcorr.anc.gz
23/03/2007 09:05:45|climateprediction.net|[file_xfer] Finished download of file hadcm3pbb_c58o_05840739.zip
23/03/2007 09:05:45|climateprediction.net|[file_xfer] Throughput 128219 bytes/sec
23/03/2007 09:05:45|climateprediction.net|[file_xfer] Started download of file 1040_flux_corr.anc.gz
23/03/2007 09:05:48|climateprediction.net|[file_xfer] Finished download of file fnw5hdck_0208_nickfluxcorr.anc.gz
23/03/2007 09:05:48|climateprediction.net|[file_xfer] Throughput 109727 bytes/sec
23/03/2007 09:05:48|climateprediction.net|[file_xfer] Started download of file volc_v30.gz
23/03/2007 09:05:49|climateprediction.net|[file_xfer] Finished download of file volc_v30.gz
23/03/2007 09:05:49|climateprediction.net|[file_xfer] Throughput 12937 bytes/sec
23/03/2007 09:05:49|climateprediction.net|[file_xfer] Started download of file 1040_ocean.year.gz
23/03/2007 09:05:55|climateprediction.net|[file_xfer] Finished download of file 1040_flux_corr.anc.gz
23/03/2007 09:05:55|climateprediction.net|[file_xfer] Throughput 46932 bytes/sec
23/03/2007 09:06:06|climateprediction.net|[file_xfer] Finished download of file 1040_ocean.year.gz
23/03/2007 09:06:06|climateprediction.net|[file_xfer] Throughput 84329 bytes/sec
23/03/2007 09:06:07|climateprediction.net|Starting hadcm3pbb_c58o_05840739_0
23/03/2007 09:06:07|climateprediction.net|Starting task hadcm3pbb_c58o_05840739_0 using hadcm3 version 515
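
For anyone wanting to pull the key events out of a long message log like the one above, here is a minimal sketch in Python. It assumes the `date time|project|message` line layout seen in this log and nothing more; the names `parse_line` and `summarize` are made up for illustration and are not part of BOINC.

```python
import re
from datetime import datetime

# Assumed layout, taken from the log above: "DD/MM/YYYY HH:MM:SS|project|message".
# The project field is empty for client-wide messages.
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\|([^|]*)\|(.*)$")
ABSENT_RE = re.compile(r"^Output file (\S+) for task (\S+) absent$")
FINISHED_RE = re.compile(r"^Computation for task (\S+) finished$")


def parse_line(line):
    """Split one log line into (timestamp, project, message), or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp, project, message = match.groups()
    return datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S"), project, message


def summarize(lines):
    """Return (finished_tasks, absent_files) where absent_files maps task -> list of file names."""
    finished = []
    absent = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip anything that is not a log line
        _, _, message = parsed
        hit = FINISHED_RE.match(message)
        if hit:
            finished.append(hit.group(1))
            continue
        hit = ABSENT_RE.match(message)
        if hit:
            absent.setdefault(hit.group(2), []).append(hit.group(1))
    return finished, absent


# A few lines copied from the log above.
SAMPLE = [
    "22/03/2007 08:57:18||Suspending computation - user request",
    "23/03/2007 08:34:41|climateprediction.net|Scheduler RPC succeeded [server version 509]",
    "23/03/2007 09:04:34|climateprediction.net|Computation for task hadcm3pbb_c6xt_05842940_0 finished",
    "23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_2.zip for task hadcm3pbb_c6xt_05842940_0 absent",
    "23/03/2007 09:04:34|climateprediction.net|Output file hadcm3pbb_c6xt_05842940_0_3.zip for task hadcm3pbb_c6xt_05842940_0 absent",
]

if __name__ == "__main__":
    done, missing = summarize(SAMPLE)
    for task in done:
        print(task, "finished with", len(missing.get(task, [])), "output files absent")
```

A task that is reported as finished while its output files are listed as absent, as here, is the pattern to look for: the model stopped before producing its results.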

I aborted the new download, and BOINC promptly downloaded another! I aborted that one too and suspended network activity, so I now have one model left.

My gut feeling is to restore BOINC from my last backup, taken when the models were about 2% less far along. Any better suggestions? Also, should I run with network activity disabled?

Thanks, Digby
ID: 27458
Strathpeffer
Joined: 9 Jan 07
Posts: 497
Credit: 342,899
RAC: 0
Message 27459 - Posted: 23 Mar 2007, 12:36:48 UTC
Last modified: 23 Mar 2007, 12:37:05 UTC

"Executing as a daemon" - never seen that one before! I'm sure someone will be along to advise you shortly, Digby - good luck!

MM @ the Pavilion
ID: 27459
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 27462 - Posted: 23 Mar 2007, 14:48:14 UTC

'Executing as a daemon' means that BOINC is running in the background as a Windows service rather than being started via the BOINC Manager (this is the "service install" option).

The reason the model crashed was this:


Model crashed: umshell1.f: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED. GA

Model crashed: umshell1.f: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED. GA

Model crashed: umshell1.f: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED. GA

Model crashed: umshell1.f: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED. GA
Fatal crash! :-(

It indicates one of two things.

A) Most likely, the mix of parameters in the original model leads to a physically impossible climate. The model is designed to shut itself down if this occurs. One of the main aims of the project is to work out which combinations of starting parameters are viable and which are not.

B) Less likely, the PC is generating dodgy floating point calculations due to something like overclocking or overheating. This is unlikely if you've run other climate models on the same PC.
I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 27462
Digby

Joined: 17 Feb 06
Posts: 89
Credit: 4,309,159
RAC: 0
Message 27464 - Posted: 23 Mar 2007, 15:07:22 UTC - in response to Message 27462.  
Last modified: 23 Mar 2007, 15:18:18 UTC

The reason the model crashed was this:

Model crashed: umshell1.f: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED. GA
Fatal crash! :-(

It indicates one of two things.

A) Most likely, the mix of parameters in the original model leads to a physically impossible climate. The model is designed to shut itself down if this occurs. One of the main aims of the project is to work out which combinations of starting parameters are viable and which are not.

B) Less likely, the PC is generating dodgy floating point calculations due to something like overclocking or overheating. This is unlikely if you've run other climate models on the same PC.


OK, so is it worth restoring my backup, which would restart both models at approx. 8.9%, or shall I let my remaining model continue from 10.56% and just download a new one? If I restart at 8.9%, it will be interesting to see whether the dodgy model fails again at the same point...

One of the main aims of the project is to work out which combinations of starting parameters are viable and which are not.

So presumably the project managers analyse the starting parameters of the models that fail... interesting.

Cheers
ID: 27464
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 27466 - Posted: 23 Mar 2007, 17:51:32 UTC - in response to Message 27464.  


OK, so is it worth restoring my backup, which would restart both models at approx. 8.9%, or shall I let my remaining model continue from 10.56% and just download a new one? If I restart at 8.9%, it will be interesting to see whether the dodgy model fails again at the same point...


What I would suggest is running Prime95's Torture Test for 24 hours or so on the PC. If it passes (it probably will), then there's no point in restoring from the backup, because the model will just hit the same thing again. If it fails, then I'd suggest trying to resolve whatever problem it identifies, and then resuming from the backup.
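
To illustrate the idea behind this kind of hardware check, here is a much weaker sketch of the same principle in Python: repeat a deterministic floating-point calculation and confirm that every run gives a bit-identical result. This is not how Prime95 works internally and is no substitute for it; the function names and the workload are invented for the example.

```python
import math
import struct


def workload(n: int) -> float:
    """A deterministic floating-point calculation (arbitrary, chosen only to exercise the FPU)."""
    total = 0.0
    for i in range(1, n + 1):
        total += math.sin(i) * math.sqrt(i) / (1.0 + math.log(i))
    return total


def to_bits(x: float) -> bytes:
    """Raw 8-byte IEEE 754 representation, so comparisons are exact rather than approximate."""
    return struct.pack("<d", x)


def consistency_check(rounds: int = 5, n: int = 200_000) -> bool:
    """Return True if every repeated run matches the first run bit for bit."""
    reference = to_bits(workload(n))
    return all(to_bits(workload(n)) == reference for _ in range(rounds - 1))


if __name__ == "__main__":
    if consistency_check():
        print("All runs identical: no arithmetic errors seen in this short check.")
    else:
        print("Runs differ: the hardware may be producing unreliable results.")
```

A short check like this will only catch gross faults. A proper stress test runs for hours under full load and heat, which is why a 24-hour run is suggested.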


So presumably the project managers analyse the starting parameters of the models that fail... interesting.


Yes, this is a big focus of their work.
I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 27466
Digby

Joined: 17 Feb 06
Posts: 89
Credit: 4,309,159
RAC: 0
Message 27475 - Posted: 24 Mar 2007, 11:15:54 UTC - in response to Message 27466.  

OK, I just resumed running again, and a new model was downloaded at midnight after a six-hour delay which the server insisted upon. So I'm continuing the journey to 'infinity and beyond', as Buzz would say.

Thanks for your assistance.

Digby
ID: 27475
