Models Stopped

Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 29099 - Posted: 1 Jun 2007, 12:32:27 UTC

Having been making backups since the start of my 2 models, I decided it was high time that I tested the restore process. Mistake!

To back up, I closed down BOINC/CCE with File > Exit. I then copied all files from the c:\BOINC folder into a backup location; for this exercise that was elsewhere on c:, although I normally back up to another PC on the network.

To restore, I deleted all files and folders from c:\BOINC, then copied everything back from the backup folder to c:\BOINC.

I then restarted BOINC/CCE. Everything appeared to be OK, except for one thing: the models were just sitting there and not executing, i.e. the CPU timers were not incrementing and, when I clicked "Show Graphics", the world showed no cloud or temperature gradation.

What am I doing wrong and how do I recover?

/N

MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 29100 - Posted: 1 Jun 2007, 13:09:42 UTC


A few questions for you:

Did you get any error messages while doing either copy?
What messages are displayed in the BOINC Manager?
Do the tasks show as 'running' on the work/tasks tab?


I'm a volunteer and my views are my own.
News and Announcements and FAQ

Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 29102 - Posted: 1 Jun 2007, 14:21:07 UTC - in response to Message 29100.  


Absolutely no error messages. On my other (BBC) model, I do this twice a week as I transfer the model from desktop to laptop and vice versa, so I felt familiar with the process. It looked procedurally identical to what's usual for me, except that it was back onto the same PC.

BOINC Manager messages:
01/06/2007 15:13:38||Starting BOINC client version 5.8.16 for windows_intelx86
01/06/2007 15:13:38||log flags: task, file_xfer, sched_ops
01/06/2007 15:13:38||Libraries: libcurl/7.16.0 OpenSSL/0.9.8a zlib/1.2.3
01/06/2007 15:13:38||Data directory: C:\Program Files\BOINC
01/06/2007 15:13:38||Processor: 2 GenuineIntel Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz [x86 Family 6 Model 15 Stepping 6] [fpu tsc pae nx sse sse2 mmx]
01/06/2007 15:13:38||Memory: 2.00 GB physical, 3.85 GB virtual
01/06/2007 15:13:38||Disk: 298.09 GB total, 243.17 GB free
01/06/2007 15:13:38|climateprediction.net|URL: http://climateprediction.net/; Computer ID: 668376; location: home; project prefs: default
01/06/2007 15:13:38||General prefs: from climateprediction.net (last modified 2007-05-19 15:01:48)
01/06/2007 15:13:38||Host location: home
01/06/2007 15:13:38||General prefs: no separate prefs for home; using your defaults
01/06/2007 15:13:38|climateprediction.net|Restarting task hadcm3ohe_0zos_05694590_0 using hadcm3 version 515
01/06/2007 15:13:38|climateprediction.net|Restarting task hadcm3ohe_2d8p_05758811_1 using hadcm3 version 515
01/06/2007 15:13:54|climateprediction.net|Resuming task hadcm3ohe_0zos_05694590_0 using hadcm3 version 515
01/06/2007 15:13:54|climateprediction.net|Resuming task hadcm3ohe_2d8p_05758811_1 using hadcm3 version 515
01/06/2007 15:14:03|climateprediction.net|Resuming task hadcm3ohe_0zos_05694590_0 using hadcm3 version 515

The "Resuming" entries are from when I used suspend/resume to try to jump-start the tasks.

Tasks do both show as running.

Am wondering whether to try reinstalling BOINC onto the existing folder?

Thanks for help.

Strathpeffer
Joined: 9 Jan 07
Posts: 497
Credit: 342,899
RAC: 0
Message 29103 - Posted: 1 Jun 2007, 17:23:51 UTC

Lockleys, I've occasionally had tasks showing as running but not actually running. Rebooting the computer has always got them going again.

Best regards, MM
Visit the Scotland team

Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 29104 - Posted: 1 Jun 2007, 17:37:42 UTC

SP: Tried that. On reboot and opening BOINC, my CPDN Tasks have disappeared entirely from the task list. Ouch!

Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 29105 - Posted: 1 Jun 2007, 17:46:32 UTC

More: I restored from the backup again, then rebooted. This time, the tasks didn't disappear, but they still didn't start. Once again, no sinister BOINC messages, just no action.

Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 29106 - Posted: 1 Jun 2007, 18:47:44 UTC
Last modified: 1 Jun 2007, 18:49:40 UTC

I have come back to this one hour later and found the following message string:

01/06/2007 18:42:20||Starting BOINC client version 5.8.16 for windows_intelx86
01/06/2007 18:42:20||log flags: task, file_xfer, sched_ops
01/06/2007 18:42:20||Libraries: libcurl/7.16.0 OpenSSL/0.9.8a zlib/1.2.3
01/06/2007 18:42:20||Data directory: C:\Program Files\BOINC
01/06/2007 18:42:20||Processor: 2 GenuineIntel Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz [x86 Family 6 Model 15 Stepping 6] [fpu tsc pae nx sse sse2 mmx]
01/06/2007 18:42:20||Memory: 2.00 GB physical, 3.85 GB virtual
01/06/2007 18:42:20||Disk: 298.09 GB total, 243.16 GB free
01/06/2007 18:42:20|climateprediction.net|URL: http://climateprediction.net/; Computer ID: 668376; location: home; project prefs: default
01/06/2007 18:42:20||General prefs: from climateprediction.net (last modified 2007-05-19 15:01:48)
01/06/2007 18:42:20||Host location: home
01/06/2007 18:42:20||General prefs: no separate prefs for home; using your defaults
01/06/2007 18:42:20||Suspending network activity - user request
01/06/2007 18:42:20|climateprediction.net|Restarting task hadcm3ohe_0zos_05694590_0 using hadcm3 version 515
01/06/2007 18:42:21|climateprediction.net|Restarting task hadcm3ohe_2d8p_05758811_1 using hadcm3 version 515
01/06/2007 18:43:06||Resuming network activity
01/06/2007 18:43:06|climateprediction.net|Sending scheduler request: To send trickle-up message
01/06/2007 18:43:06|climateprediction.net|(not requesting new work or reporting completed tasks)
01/06/2007 18:43:07|climateprediction.net|[file_xfer] Started upload of file hadcm3ohe_0zos_05694590_0_7.zip
01/06/2007 18:43:09|climateprediction.net|[file_xfer] Finished upload of file hadcm3ohe_0zos_05694590_0_7.zip
01/06/2007 18:43:09|climateprediction.net|[file_xfer] Throughput 245 bytes/sec
01/06/2007 18:43:11|climateprediction.net|Scheduler RPC succeeded [server version 509]
01/06/2007 18:43:11|climateprediction.net|Generated new host CPID: f6cdb01a1c13c902045e7a10e5c2b151
01/06/2007 18:43:42||Suspending network activity - user request
01/06/2007 19:30:39|climateprediction.net|Computation for task hadcm3ohe_0zos_05694590_0 finished
01/06/2007 19:30:39|climateprediction.net|Output file hadcm3ohe_0zos_05694590_0_8.zip for task hadcm3ohe_0zos_05694590_0 absent
01/06/2007 19:30:39|climateprediction.net|Output file hadcm3ohe_0zos_05694590_0_9.zip for task hadcm3ohe_0zos_05694590_0 absent
01/06/2007 19:30:39|climateprediction.net|Output file hadcm3ohe_0zos_05694590_0_10.zip for task hadcm3ohe_0zos_05694590_0 absent
01/06/2007 19:30:39|climateprediction.net|Output file hadcm3ohe_0zos_05694590_0_11.zip for task hadcm3ohe_0zos_05694590_0 absent
01/06/2007 19:30:39|climateprediction.net|Output file hadcm3ohe_0zos_05694590_0_12.zip for task hadcm3ohe_0zos_05694590_0 absent
01/06/2007 19:30:39|climateprediction.net|Output file hadcm3ohe_0zos_05694590_0_13.zip for task hadcm3ohe_0zos_05694590_0 absent
01/06/2007 19:30:39|climateprediction.net|Output file hadcm3ohe_0zos_05694590_0_14.zip for task hadcm3ohe_0zos_05694590_0 absent
01/06/2007 19:30:39|climateprediction.net|Output file hadcm3ohe_0zos_05694590_0_15.zip for task hadcm3ohe_0zos_05694590_0 absent
01/06/2007 19:30:39|climateprediction.net|Output file hadcm3ohe_0zos_05694590_0_16.zip for task hadcm3ohe_0zos_05694590_0 absent
01/06/2007 19:30:40|climateprediction.net|Deferring communication for 1 min 0 sec
01/06/2007 19:30:40|climateprediction.net|Reason: Unrecoverable error for result hadcm3ohe_0zos_05694590_0 (<file_xfer_error> <file_name>hadcm3ohe_0zos_05694590_0_8.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_0zos_05694590_0_9.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_0zos_05694590_0_10.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_0zos_05694590_0_11.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_0zos_05694590_0_12.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_0zos_05694590_0_13.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_0zos_05694590_0_14.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_0zos_05694590_0_15.zip</file_
01/06/2007 19:30:40|climateprediction.net|Computation for task hadcm3ohe_2d8p_05758811_1 finished
01/06/2007 19:30:40|climateprediction.net|Output file hadcm3ohe_2d8p_05758811_1_8.zip for task hadcm3ohe_2d8p_05758811_1 absent
01/06/2007 19:30:40|climateprediction.net|Output file hadcm3ohe_2d8p_05758811_1_9.zip for task hadcm3ohe_2d8p_05758811_1 absent
01/06/2007 19:30:40|climateprediction.net|Output file hadcm3ohe_2d8p_05758811_1_10.zip for task hadcm3ohe_2d8p_05758811_1 absent
01/06/2007 19:30:40|climateprediction.net|Output file hadcm3ohe_2d8p_05758811_1_11.zip for task hadcm3ohe_2d8p_05758811_1 absent
01/06/2007 19:30:40|climateprediction.net|Output file hadcm3ohe_2d8p_05758811_1_12.zip for task hadcm3ohe_2d8p_05758811_1 absent
01/06/2007 19:30:40|climateprediction.net|Output file hadcm3ohe_2d8p_05758811_1_13.zip for task hadcm3ohe_2d8p_05758811_1 absent
01/06/2007 19:30:40|climateprediction.net|Output file hadcm3ohe_2d8p_05758811_1_14.zip for task hadcm3ohe_2d8p_05758811_1 absent
01/06/2007 19:30:40|climateprediction.net|Output file hadcm3ohe_2d8p_05758811_1_15.zip for task hadcm3ohe_2d8p_05758811_1 absent
01/06/2007 19:30:40|climateprediction.net|Output file hadcm3ohe_2d8p_05758811_1_16.zip for task hadcm3ohe_2d8p_05758811_1 absent
01/06/2007 19:30:41|climateprediction.net|Deferring communication for 1 min 0 sec
01/06/2007 19:30:41|climateprediction.net|Reason: Unrecoverable error for result hadcm3ohe_2d8p_05758811_1 (<file_xfer_error> <file_name>hadcm3ohe_2d8p_05758811_1_8.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_2d8p_05758811_1_9.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_2d8p_05758811_1_10.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_2d8p_05758811_1_11.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_2d8p_05758811_1_12.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_2d8p_05758811_1_13.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_2d8p_05758811_1_14.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_2d8p_05758811_1_15.zip</file_

I am running with networking suspended, so I guess these errors will not have been communicated to the server.

I should also say that the backup from which I restored was taken just before a decadal upload. Although the tasks failed to resume after the restore, the decadal upload went up to the server before I turned networking off again.
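
As an aside, the "absent" lines in a log like the one above can be tallied per task, to see at a glance which output files BOINC expected but could not find. What follows is only a minimal Python sketch: it assumes the message log has been saved as plain text in the `date time|project|message` layout shown above, and the function name `absent_outputs` is made up for illustration, not part of BOINC.

```python
import re
from collections import defaultdict

# Matches the message text of lines such as:
#   "Output file <name>.zip for task <task> absent"
ABSENT = re.compile(r"Output file (\S+) for task (\S+) absent")


def absent_outputs(log_text):
    """Map each task name to the output files reported as absent for it."""
    by_task = defaultdict(list)
    for line in log_text.splitlines():
        # Assumed layout: timestamp|project|message (project may be empty).
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue  # not a log line in the assumed layout
        match = ABSENT.search(parts[2])
        if match:
            file_name, task = match.groups()
            by_task[task].append(file_name)
    return dict(by_task)
```

Calling `absent_outputs(open("messages.txt").read())` (the file name is hypothetical) would return one entry per task, each listing the zip files named in its "absent" lines.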

Strathpeffer
Joined: 9 Jan 07
Posts: 497
Credit: 342,899
RAC: 0
Message 29107 - Posted: 1 Jun 2007, 20:13:24 UTC

This is all beyond me I'm afraid, Lockleys, but I've also posted about it in the Scottish forum, so maybe some of our team-mates will be along to help in a minute.

The messages all look normal up to the end of your 70-year zip file upload, but then I see it says "Generated new host CPID", as if you needed yet another one of those! I wonder if that's part of the problem?
Visit the Scotland team

MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 29108 - Posted: 1 Jun 2007, 20:40:51 UTC
Last modified: 1 Jun 2007, 20:49:08 UTC

'Generated new host CPID' tends to happen after a backup is restored; it's not particularly significant.

I can't see any specific reason for the crash in the error log.

Try this:

Restore the backup, then immediately suspend CPU and suspend network activity. Quit boinc and then restart it. Resume CPU, leave network disabled, see if it stops at the same point or continues on.

It might be that the backup isn't right (for example, a file is missing or corrupt), in which case there's not much that can be done about it. You could try comparing the files in the backup with those on your other system, to see whether anything is missing. Also make sure all files are set to read/write rather than read-only.
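
The comparison and the read-only check suggested here could be automated along the following lines. This is a rough Python sketch under stated assumptions, not a BOINC tool: `compare_trees` is a hypothetical helper name, and it assumes both folders are reachable from one machine (for example over the network share used for backups).

```python
import filecmp
import os
import stat


def compare_trees(backup_dir, reference_dir):
    """Return (missing_from_backup, differing, read_only) as sorted
    lists of paths relative to the folder roots."""
    missing, differing = [], []

    def walk(cmp, rel=""):
        # Names present only in the reference tree are missing from the backup.
        missing.extend(os.path.join(rel, n) for n in cmp.right_only)
        # Files present in both trees whose contents differ.
        differing.extend(os.path.join(rel, n) for n in cmp.diff_files)
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(rel, name))

    walk(filecmp.dircmp(backup_dir, reference_dir))

    # Files in the backup that lack the owner-write bit (read-only on Windows).
    read_only = []
    for root, _dirs, files in os.walk(backup_dir):
        for name in files:
            path = os.path.join(root, name)
            if not os.stat(path).st_mode & stat.S_IWRITE:
                read_only.append(os.path.relpath(path, backup_dir))

    return sorted(missing), sorted(differing), sorted(read_only)
```

Files of a running model change constantly, so differences between a backup and a live folder are expected; the `missing` list is the part most relevant to a backup that refuses to start.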

If you see the task marked as 'running' but not taking any CPU time in Task Manager, try leaving it for 20 minutes or so (there is a 15-minute timer).
I'm a volunteer and my views are my own.
News and Announcements and FAQ

Rory
Joined: 16 Feb 06
Posts: 23
Credit: 3,515,174
RAC: 0
Message 29109 - Posted: 1 Jun 2007, 21:37:11 UTC
Last modified: 1 Jun 2007, 21:38:00 UTC

I also can't see any failure that I recognise!

Keep going until the next trickle, as I see no fatal errors. Upload and check again. Just no error, as in 0x0? Not my forte, but I've pulled a few back!
Oh, and merge the hosts.

Rory
Leave a planet to those following!

Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 29110 - Posted: 1 Jun 2007, 21:58:05 UTC

Thanks all. Tried the MikeMars suggestion, but my tasks have not resumed. If no other ideas by tomorrow, I'll have to try going back to earlier backups (I still have 19 of them) and see if they all behave the same. :( Still, a day lost would be better than 2 models lost!

Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 29111 - Posted: 1 Jun 2007, 23:31:32 UTC
Last modified: 1 Jun 2007, 23:32:45 UTC

The CPDN models will not run, of course, if something else is hogging the CPU. My machine (including BOINC) has been repeatedly brought down in recent weeks by an anonymous system process running flat-out in the background: eventually mouse and keyboard control are lost and the power switch is the only option. A Microsoft Update problem with some of these symptoms does appear to be doing the rounds at the moment, in which case I would suggest:

- run Microsoft Update (the first half of their fix is out - more to follow, I think)

- boot the machine, but don't start BOINC (if BOINC starts automatically, then suspend and exit)

- check whether any other task is consuming a significant percentage of the CPU time, using Task Manager (right mouse click on taskbar)

- if another task is running then let it run for two hours to see whether it finishes. (This worked on two of my machines, but on my crash-prone machine the task ran for over 12 hours without finishing.)

If nothing is running in the background but the CPDN models still won't run then ignore this post!

Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 29113 - Posted: 2 Jun 2007, 5:30:31 UTC - in response to Message 29111.  


Thanks Iain. I checked Windows Task Manager, but there's nothing there punching the CPU above 1%.

Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 29114 - Posted: 2 Jun 2007, 5:42:45 UTC

I seem to have cracked the problem!?! I deleted all files from c:\BOINC, then loaded back the earliest backup I still have (about 3 months old). When I clicked Resume, the tasks started. So I Suspended and Exited, then copied my recent backup over c:\BOINC without deleting the old files. I opened BOINC, clicked Resume, and the models started straight away.
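
The file-copying part of this two-stage restore can be sketched in Python as below. It is an illustration only: `restore_with_overlay` is a hypothetical helper, BOINC must be fully exited before it runs, it needs Python 3.8 or later for `dirs_exist_ok`, and it leaves out the intermediate step of starting BOINC and resuming the tasks between the two copies. It returns the files that survive only from the older backup, which are the candidates for whatever the recent backup was lacking.

```python
import shutil
from pathlib import Path


def restore_with_overlay(old_backup, recent_backup, target):
    """Restore old_backup into an emptied target, then copy recent_backup
    over it without deleting anything. Returns the relative paths of files
    in the result that the recent backup does not contain."""
    old_backup, recent_backup, target = map(Path, (old_backup, recent_backup, target))

    # Stage 1: clear the target and restore the older, known-good backup.
    if target.exists():
        shutil.rmtree(target)
    shutil.copytree(old_backup, target)

    # Stage 2: overlay the recent backup; existing files are overwritten,
    # files found only in the older backup are left in place.
    shutil.copytree(recent_backup, target, dirs_exist_ok=True)

    return sorted(
        p.relative_to(target).as_posix()
        for p in target.rglob("*")
        if p.is_file() and not (recent_backup / p.relative_to(target)).exists()
    )
```

An empty result would suggest the recent backup was complete and the cause lay elsewhere; a non-empty one points at files worth checking.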

Odd, or what? Perhaps this may help others. Or perhaps nobody else will ever have this anomaly.

Thanks to all for assistance and ideas.

MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 29116 - Posted: 2 Jun 2007, 8:56:47 UTC


It sounds from your description as if a file was missing from the most recent backup. Glad it's working now :-)

I'm a volunteer and my views are my own.
News and Announcements and FAQ

Strathpeffer
Joined: 9 Jan 07
Posts: 497
Credit: 342,899
RAC: 0
Message 29119 - Posted: 2 Jun 2007, 19:41:16 UTC

Glad to hear you're operational again, Lockleys. Well done and happy continued crunching!

Regards, MM @ the Pavilion
Visit the Scotland team

Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 29122 - Posted: 2 Jun 2007, 22:04:27 UTC
Last modified: 2 Jun 2007, 22:07:26 UTC

I thought I was operational again, and it certainly looks like that from my end of the periscope, but when I look at my results on the server it suddenly looks alarming, with all sorts of errors listed. See http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6243274
and
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6170819
for my two models.

Should I be worrying?

/Cheers

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 29123 - Posted: 2 Jun 2007, 22:20:18 UTC

You can worry if you like.
But it's not compulsory. :)

The messages that get uploaded to the server first stay there, as there's no mechanism for replacing them with later messages. (This is just part of the way BOINC is designed for the many other DC projects.)

Just carry on crunching, and rely on what you see in the Manager on your computer.


Strathpeffer
Joined: 9 Jan 07
Posts: 497
Credit: 342,899
RAC: 0
Message 29159 - Posted: 4 Jun 2007, 17:45:11 UTC

Les Bayliss wrote:
You can worry if you like.
But it's not compulsory. :)

Les!

Please don't worry, Lockleys!

Best regards, MM @ the Pavilion
Visit the Scotland team