Model crashes

John Eric Hopkinson

Joined: 27 Jan 05
Posts: 74
Credit: 1,047,809
RAC: 0
Message 33413 - Posted: 18 Apr 2008, 13:48:19 UTC

mo.v, Les, et al:

I think this may be the appropriate venue for my comments.

In February my computer completed a model and I took that opportunity to upgrade to V.5.10.45.
Problems arose immediately: I lost my original record of completions, a new number was assigned to my computer, and task completions became very erratic.
I had recently installed Windows Live OneCare and was accusing Bill Gates of all sorts of interference with BOINC. But I have since determined that he is not to blame, as there were no conflicts between the timing of OneCare activity and CPDN requirements.
The comments in this thread indicate that I am not alone in my lack of comprehension of BOINC's operations and terminology. It is not intuitively clear in many respects, so when error messages show up in the Manager, I am inclined to say "OK so what now?".
I am struggling to find the correct settings which would avoid premature completions, or what may even be crashes, and have considered reverting to an earlier version of BOINC.
You may be able to recognize some errors in settings or procedures in the following messages copied from the Manager today.

Regards,
jehop
Copied:
4/10/2008 9:40:55 AM||Starting BOINC client version 5.10.45 for windows_intelx86
4/10/2008 9:40:55 AM||log flags: task, file_xfer, sched_ops
4/10/2008 9:40:55 AM||Libraries: libcurl/7.18.0 OpenSSL/0.9.8e zlib/1.2.3
4/10/2008 9:40:55 AM||Data directory: C:\Program Files\BOINC
4/10/2008 9:40:55 AM||Processor: 1 GenuineIntel Intel(R) Celeron(R) CPU 2.40GHz [x86 Family 15 Model 2 Stepping 9]
4/10/2008 9:40:55 AM||Processor features: fpu tsc sse sse2 mmx
4/10/2008 9:40:55 AM||OS: Microsoft Windows XP: Home Edition, Service Pack 2, (05.01.2600.00)
4/10/2008 9:40:55 AM||Memory: 1014.42 MB physical, 2.38 GB virtual
4/10/2008 9:40:55 AM||Disk: 34.12 GB total, 14.72 GB free
4/10/2008 9:40:55 AM||Local time is UTC -3 hours
4/10/2008 9:40:56 AM|climateprediction.net|URL: http://climateprediction.net/; Computer ID: 757552; location: home; project prefs: default
4/10/2008 9:40:56 AM||General prefs: from climateprediction.net (last modified 15-Feb-2008 17:33:39)
4/10/2008 9:40:56 AM||Host location: home
4/10/2008 9:40:56 AM||General prefs: no separate prefs for home; using your defaults
4/10/2008 9:40:56 AM||Reading preferences override file
4/10/2008 9:40:56 AM||Preferences limit memory usage when active to 507.21MB
4/10/2008 9:40:56 AM||Preferences limit memory usage when idle to 912.98MB
4/10/2008 9:40:56 AM||Preferences limit disk usage to 9.31GB
4/10/2008 9:40:56 AM|climateprediction.net|Restarting task hadcm3istd_0ipu_1920_160_05939948_3 using hadcm3i version 544
4/11/2008 6:22:57 PM|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
4/11/2008 6:23:07 PM|climateprediction.net|Scheduler request succeeded: got 0 new tasks
4/12/2008 3:02:04 PM||Running CPU benchmarks
4/12/2008 3:02:04 PM||Suspending computation - running CPU benchmarks
4/12/2008 3:02:35 PM||Benchmark results:
4/12/2008 3:02:35 PM|| Number of CPUs: 1
4/12/2008 3:02:35 PM|| 1276 floating point MIPS (Whetstone) per CPU
4/12/2008 3:02:35 PM|| 2343 integer MIPS (Dhrystone) per CPU
4/12/2008 3:02:36 PM||Resuming computation
4/13/2008 7:15:58 AM|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
4/13/2008 7:16:16 AM|climateprediction.net|Scheduler request succeeded: got 0 new tasks
4/14/2008 9:13:48 PM|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
4/14/2008 9:13:58 PM|climateprediction.net|Scheduler request succeeded: got 0 new tasks
4/16/2008 9:17:59 AM|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
4/16/2008 9:18:09 AM|climateprediction.net|Scheduler request succeeded: got 0 new tasks
4/17/2008 3:02:35 PM||Running CPU benchmarks
4/17/2008 3:02:35 PM||Suspending computation - running CPU benchmarks
4/17/2008 3:03:07 PM||Benchmark results:
4/17/2008 3:03:07 PM|| Number of CPUs: 1
4/17/2008 3:03:07 PM|| 1282 floating point MIPS (Whetstone) per CPU
4/17/2008 3:03:07 PM|| 2275 integer MIPS (Dhrystone) per CPU
4/17/2008 3:03:08 PM||Resuming computation
4/17/2008 9:14:27 PM|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
4/17/2008 9:14:37 PM|climateprediction.net|Scheduler request succeeded: got 0 new tasks
4/17/2008 9:16:41 PM|climateprediction.net|Computation for task hadcm3istd_0ipu_1920_160_05939948_3 finished
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_1.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_2.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_3.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_4.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_5.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_6.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_7.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_8.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_9.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_10.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_11.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_12.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_13.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_14.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_15.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_16.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:17:43 PM|climateprediction.net|Sending scheduler request: To fetch work. Requesting 30240 seconds of work, reporting 1 completed tasks
4/17/2008 9:17:48 PM|climateprediction.net|Scheduler request succeeded: got 1 new tasks
4/17/2008 9:17:50 PM|climateprediction.net|Started download of hadsm3fub_jk1x_005944120.zip
4/17/2008 9:17:53 PM|climateprediction.net|Finished download of hadsm3fub_jk1x_005944120.zip
4/17/2008 9:17:54 PM|climateprediction.net|Starting hadsm3fub_jk1x_005944120_5
4/17/2008 9:17:54 PM|climateprediction.net|Starting task hadsm3fub_jk1x_005944120_5 using hadsm3 version 506
4/18/2008 9:07:58 AM|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
4/18/2008 9:08:12 AM|climateprediction.net|Scheduler request succeeded: got 0 new tasks
4/18/2008 9:08:12 AM|climateprediction.net|Message from server: Project encountered internal error: shared memory
4/18/2008 9:45:14 AM||General prefs: from climateprediction.net (last modified 15-Feb-2008 17:33:39)
4/18/2008 9:45:14 AM||Host location: home
4/18/2008 9:45:14 AM||General prefs: no separate prefs for home; using your defaults
4/18/2008 9:45:14 AM||Reading preferences override file
4/18/2008 9:45:14 AM||Preferences limit memory usage when active to 507.21MB
4/18/2008 9:45:14 AM||Preferences limit memory usage when idle to 912.98MB
4/18/2008 9:45:14 AM||Preferences limit disk usage to 9.31GB
4/18/2008 9:51:19 AM||General prefs: from climateprediction.net (last modified 15-Feb-2008 17:33:39)
4/18/2008 9:51:19 AM||Host location: home
4/18/2008 9:51:19 AM||General prefs: no separate prefs for home; using your defaults
4/18/2008 9:51:19 AM||Reading preferences override file
4/18/2008 9:51:19 AM||Preferences limit memory usage when active to 507.21MB
4/18/2008 9:51:19 AM||Preferences limit memory usage when idle to 912.98MB
4/18/2008 9:51:19 AM||Preferences limit disk usage to 9.31GB
4/18/2008 10:08:13 AM|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
4/18/2008 10:08:28 AM|climateprediction.net|Scheduler request succeeded: got 0 new tasks

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 33418 - Posted: 18 Apr 2008, 18:09:22 UTC
Last modified: 18 Apr 2008, 19:10:47 UTC

I've moved John Hopkinson's post from the Redundancy thread to here as it deserves full attention in its own right, though I fully understand why he put it there, i.e. to illustrate how difficult some of the BOINC messages are to interpret.
Cpdn news

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 33420 - Posted: 18 Apr 2008, 18:46:40 UTC - in response to Message 33413.  
Last modified: 18 Apr 2008, 19:07:10 UTC

4/10/2008 9:40:55 AM||Starting BOINC client version 5.10.45 for windows_intelx86
4/10/2008 9:40:55 AM||log flags: task, file_xfer, sched_ops
4/10/2008 9:40:55 AM||Libraries: libcurl/7.18.0 OpenSSL/0.9.8e zlib/1.2.3
4/10/2008 9:40:55 AM||Data directory: C:\Program Files\BOINC
4/10/2008 9:40:55 AM||Processor: 1 GenuineIntel Intel(R) Celeron(R) CPU 2.40GHz [x86 Family 15 Model 2 Stepping 9]
4/10/2008 9:40:55 AM||Processor features: fpu tsc sse sse2 mmx
4/10/2008 9:40:55 AM||OS: Microsoft Windows XP: Home Edition, Service Pack 2, (05.01.2600.00)
4/10/2008 9:40:55 AM||Memory: 1014.42 MB physical, 2.38 GB virtual
4/10/2008 9:40:55 AM||Disk: 34.12 GB total, 14.72 GB free
4/10/2008 9:40:55 AM||Local time is UTC -3 hours
4/10/2008 9:40:56 AM|climateprediction.net|URL: http://climateprediction.net/; Computer ID: 757552; location: home; project prefs: default
4/10/2008 9:40:56 AM||General prefs: from climateprediction.net (last modified 15-Feb-2008 17:33:39)
4/10/2008 9:40:56 AM||Host location: home
4/10/2008 9:40:56 AM||General prefs: no separate prefs for home; using your defaults
4/10/2008 9:40:56 AM||Reading preferences override file
4/10/2008 9:40:56 AM||Preferences limit memory usage when active to 507.21MB
4/10/2008 9:40:56 AM||Preferences limit memory usage when idle to 912.98MB
4/10/2008 9:40:56 AM||Preferences limit disk usage to 9.31GB
4/10/2008 9:40:56 AM|climateprediction.net|Restarting task hadcm3istd_0ipu_1920_160_05939948_3 using hadcm3i version 544
4/11/2008 6:22:57 PM|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
4/11/2008 6:23:07 PM|climateprediction.net|Scheduler request succeeded: got 0 new tasks
4/12/2008 3:02:04 PM||Running CPU benchmarks
4/12/2008 3:02:04 PM||Suspending computation - running CPU benchmarks
4/12/2008 3:02:35 PM||Benchmark results:
4/12/2008 3:02:35 PM|| Number of CPUs: 1
4/12/2008 3:02:35 PM|| 1276 floating point MIPS (Whetstone) per CPU
4/12/2008 3:02:35 PM|| 2343 integer MIPS (Dhrystone) per CPU
4/12/2008 3:02:36 PM||Resuming computation
4/13/2008 7:15:58 AM|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
4/13/2008 7:16:16 AM|climateprediction.net|Scheduler request succeeded: got 0 new tasks
4/14/2008 9:13:48 PM|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
4/14/2008 9:13:58 PM|climateprediction.net|Scheduler request succeeded: got 0 new tasks
4/16/2008 9:17:59 AM|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
4/16/2008 9:18:09 AM|climateprediction.net|Scheduler request succeeded: got 0 new tasks
4/17/2008 3:02:35 PM||Running CPU benchmarks
4/17/2008 3:02:35 PM||Suspending computation - running CPU benchmarks
4/17/2008 3:03:07 PM||Benchmark results:
4/17/2008 3:03:07 PM|| Number of CPUs: 1
4/17/2008 3:03:07 PM|| 1282 floating point MIPS (Whetstone) per CPU
4/17/2008 3:03:07 PM|| 2275 integer MIPS (Dhrystone) per CPU
4/17/2008 3:03:08 PM||Resuming computation
4/17/2008 9:14:27 PM|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
4/17/2008 9:14:37 PM|climateprediction.net|Scheduler request succeeded: got 0 new tasks


All normal messages so far. The new version of BOINC assigned a new computer ID because this version describes the computer differently, and BOINC can't merge two different descriptions. This is a nuisance but it does no harm.
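
For anyone who would rather sift a log like this with a script than by eye, here is a minimal Python sketch. It assumes each Manager line has the shape `date time|project|message` seen above, with US-style month/day dates and an empty project field for messages from the client itself. The names `LogEntry` and `parse_log_line` are made up for this illustration and are not part of BOINC.

```python
from datetime import datetime
from typing import NamedTuple


class LogEntry(NamedTuple):
    timestamp: datetime
    project: str  # empty string when the client itself wrote the message
    message: str


def parse_log_line(line: str) -> LogEntry:
    """Split one 'date time|project|message' line into its three fields."""
    # Split on the first two bars only, so bars inside the message survive.
    stamp, project, message = line.rstrip("\n").split("|", 2)
    # Assumed timestamp layout: month/day/year, 12-hour clock with AM/PM.
    timestamp = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    return LogEntry(timestamp, project, message)


sample = (
    "4/17/2008 9:16:41 PM|climateprediction.net|"
    "Computation for task hadcm3istd_0ipu_1920_160_05939948_3 finished"
)
entry = parse_log_line(sample)
print(entry.timestamp.isoformat(), entry.project, entry.message, sep=" | ")
```

Once the lines are split this way, it is easy to filter by project or by time window instead of scrolling through the whole list.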

4/17/2008 9:16:41 PM|climateprediction.net|Computation for task hadcm3istd_0ipu_1920_160_05939948_3 finished


The model has in fact crashed, but the messages don't say why. We need to look at the model's web page to find the reason. The reason given there is code 22, 'The device cannot recognise the command'.

4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_1.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_2.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_3.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_4.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_5.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_6.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_7.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_8.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_9.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_10.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_11.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_12.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_13.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_14.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_15.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent
4/17/2008 9:16:41 PM|climateprediction.net|Output file hadcm3istd_0ipu_1920_160_05939948_3_16.zip for task hadcm3istd_0ipu_1920_160_05939948_3 absent


The messages above list the zip files that a completed model would have sent to the server. They haven't been created or sent because the model didn't get far enough to make them, so these are not extra errors or problems.
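
The pattern in this log (a 'finished' line followed by a run of 'absent' lines for the same task) can be spotted mechanically. A rough Python sketch, assuming the message wording is exactly as it appears in this log; `tasks_missing_output` is a hypothetical helper, not a BOINC tool:

```python
import re
from collections import Counter

# Message wording copied from the log in this thread (assumed to be stable).
FINISHED = re.compile(r"Computation for task (\S+) finished")
ABSENT = re.compile(r"Output file \S+ for task (\S+) absent")


def tasks_missing_output(lines):
    """Return {task_name: absent_file_count} for tasks that finished."""
    finished = set()
    absent = Counter()
    for line in lines:
        message = line.split("|", 2)[-1]
        found = FINISHED.search(message)
        if found:
            finished.add(found.group(1))
            continue
        found = ABSENT.search(message)
        if found:
            absent[found.group(1)] += 1
    return {task: absent[task] for task in finished if absent[task]}


task = "hadcm3istd_0ipu_1920_160_05939948_3"
stamp = "4/17/2008 9:16:41 PM|climateprediction.net|"
log = [f"{stamp}Computation for task {task} finished"]
log += [
    f"{stamp}Output file {task}_{n}.zip for task {task} absent"
    for n in range(1, 17)
]
print(tasks_missing_output(log))
```

A non-empty result only hints that a model stopped early. The reason still has to be read from the task's web page.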

4/17/2008 9:17:43 PM|climateprediction.net|Sending scheduler request: To fetch work. Requesting 30240 seconds of work, reporting 1 completed tasks


BOINC now reports the crashed model. Reporting is a separate process that happens after a model has finished. So this is normal behaviour.

4/17/2008 9:17:48 PM|climateprediction.net|Scheduler request succeeded: got 1 new tasks
4/17/2008 9:17:50 PM|climateprediction.net|Started download of hadsm3fub_jk1x_005944120.zip
4/17/2008 9:17:53 PM|climateprediction.net|Finished download of hadsm3fub_jk1x_005944120.zip
4/17/2008 9:17:54 PM|climateprediction.net|Starting hadsm3fub_jk1x_005944120_5
4/17/2008 9:17:54 PM|climateprediction.net|Starting task hadsm3fub_jk1x_005944120_5 using hadsm3 version 506


So BOINC asks the 'scheduler', which means the CPDN server, for a new model and gets one. It starts to crunch.

4/18/2008 9:07:58 AM|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
4/18/2008 9:08:12 AM|climateprediction.net|Scheduler request succeeded: got 0 new tasks


New model crunches and sends a trickle.

4/18/2008 9:08:12 AM|climateprediction.net|Message from server: Project encountered internal error: shared memory


Why shared memory should be a problem I don't know. You have 1 core & 1 model, I have 2 of each. My memory values are about double yours and I've never had this message. Your computer has 1 GB RAM, which ought to be more than adequate for one HADCM or HADSM model. Maybe some other program was running on your computer at that time and caused a memory conflict.
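
The two memory limits in the log can be checked against the physical RAM figure. A small Python sketch, assuming the limits are plain fractions of physical memory; the 50% and 90% values are inferred from the logged numbers rather than read from the preferences:

```python
# "Memory: 1014.42 MB physical" from the startup messages.
PHYSICAL_MB = 1014.42

# Assumed fractions, chosen because they reproduce the logged limits.
FRACTION_WHEN_ACTIVE = 0.50  # computer in use
FRACTION_WHEN_IDLE = 0.90    # computer idle

limit_active = PHYSICAL_MB * FRACTION_WHEN_ACTIVE
limit_idle = PHYSICAL_MB * FRACTION_WHEN_IDLE

print(f"limit when active: {limit_active:.2f} MB")  # log says 507.21MB
print(f"limit when idle:   {limit_idle:.2f} MB")    # log says 912.98MB
```

These are limits the client applies on the home computer, so they describe how much RAM a model may use there and nothing more.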

4/18/2008 9:45:14 AM||General prefs: from climateprediction.net (last modified 15-Feb-2008 17:33:39)
4/18/2008 9:45:14 AM||Host location: home
4/18/2008 9:45:14 AM||General prefs: no separate prefs for home; using your defaults
4/18/2008 9:45:14 AM||Reading preferences override file
4/18/2008 9:45:14 AM||Preferences limit memory usage when active to 507.21MB
4/18/2008 9:45:14 AM||Preferences limit memory usage when idle to 912.98MB
4/18/2008 9:45:14 AM||Preferences limit disk usage to 9.31GB
4/18/2008 9:51:19 AM||General prefs: from climateprediction.net (last modified 15-Feb-2008 17:33:39)
4/18/2008 9:51:19 AM||Host location: home
4/18/2008 9:51:19 AM||General prefs: no separate prefs for home; using your defaults
4/18/2008 9:51:19 AM||Reading preferences override file
4/18/2008 9:51:19 AM||Preferences limit memory usage when active to 507.21MB
4/18/2008 9:51:19 AM||Preferences limit memory usage when idle to 912.98MB
4/18/2008 9:51:19 AM||Preferences limit disk usage to 9.31GB
4/18/2008 10:08:13 AM|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
4/18/2008 10:08:28 AM|climateprediction.net|Scheduler request succeeded: got 0 new tasks

The last messages are routine ones.


So the problems are

* why the shared memory is not right (though this hasn't actually crashed your running model)

* why your last few models have crashed with 22 errors, the last model giving the detail about the device not recognising the command. I think that in this case the device in question means your own computer disk. (Another example of terminology incomprehensible to most.)

This is the model page with the details of the model crash:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=7369668

John, I hope that clarifies to some extent what the messages mean. We now need the help of someone who knows more about 22 errors and shared memory than I do.

Cpdn news

old_user428438
Joined: 1 Feb 07
Posts: 26
Credit: 885,216
RAC: 0
Message 33421 - Posted: 18 Apr 2008, 19:30:59 UTC - in response to Message 33420.  


...
<snip>
4/18/2008 9:08:12 AM|climateprediction.net|Message from server: Project encountered internal error: shared memory


Why shared memory should be a problem I don't know.
</snip>
...

My reading of this is that the "shared memory" is at the server end, not the host. It is similar to messages I have had from an Oracle server when working with large databases.

But I may be totally wrong - that would not be unusual.

F.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 33425 - Posted: 18 Apr 2008, 20:48:59 UTC
Last modified: 18 Apr 2008, 20:50:26 UTC

I agree with Fred.
The shared memory message was sent by the server, so it's a server problem.
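
One way to tell the two kinds of message apart in a saved log is the 'Message from server:' prefix. A minimal Python sketch, assuming that prefix marks text relayed from the project's scheduler, as it does in the log quoted in this thread; `reported_by_server` is an illustrative name only:

```python
SERVER_PREFIX = "Message from server:"


def reported_by_server(line: str) -> bool:
    """True if a 'date time|project|message' line relays a server message."""
    parts = line.split("|", 2)
    if len(parts) != 3:
        return False  # not a log line in the expected shape
    return parts[2].lstrip().startswith(SERVER_PREFIX)


log = [
    "4/18/2008 9:08:12 AM|climateprediction.net|"
    "Scheduler request succeeded: got 0 new tasks",
    "4/18/2008 9:08:12 AM|climateprediction.net|"
    "Message from server: Project encountered internal error: shared memory",
    "4/18/2008 9:45:14 AM||"
    "Preferences limit memory usage when active to 507.21MB",
]
for line in log:
    origin = "server" if reported_by_server(line) else "client"
    print(f"{origin:6}  {line.split('|', 2)[2]}")
```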

*********************************

Error 22: I-don't-know-what's-wrong-so-I'll-just-use-this-number.

*********************************

I've noticed for several days now that the forum pages are often slow to load, and if I try to look at a person's model page to see what went wrong, it's often impossibly slow to access.
Also, for a couple of days, even the scheduler has been unavailable at times when I try to upload trickles, and I get error messages. The latest was 20 minutes ago.

So I think that either the project people are extracting large amounts of data again, or the server hard disks are (nearly) full again.
Either way, people will be getting error messages when they try to access the servers.

====================

Scheduler
1) Door man who directs the traffic flow to different devices.
2) Store manager, who accepts old work, and issues new work. (After all forms have been filled out in triplicate by hand, which slows down everyone else in the queue.)
And stop pushing back there, I'm working as fast as I can. I haven't got 2 million hands, you know.


old_user5994
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 33426 - Posted: 18 Apr 2008, 20:54:26 UTC

Side comment: there is the "Unofficial BOINC Wiki", which at one point had a whole section where you could look up each and every message and get an explanation of what that message meant and whether it was status, error, or something else ...

It has been two years since I worked on the UBW so I am not sure how many messages will not be there at all because they were added later ...

But the UBW was at one point the "encyclopedia" of BOINC ... and though some say the content is dated, I am willing to bet that it is not as dated as they say ... the core of BOINC has not changed in the two years I was not attached ...

Anyway, just another place to look for information ...

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 33429 - Posted: 18 Apr 2008, 21:45:11 UTC
Last modified: 18 Apr 2008, 21:47:01 UTC

Well, thanks everybody for putting me right about the shared memory problem being on the server, not the home computer. But on rereading the BOINC message I can see why I misunderstood it. I will be asking the boys in Berkeley to reword the message to say something like 'Project server encountered internal server error...' or 'Internal error on project server' so crunchers don't think their own computer is at fault.

John, it would be a good idea for you to look at the README post about generic error codes like #22 to see whether there's anything you should be doing or not doing:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=4231
Cpdn news

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 33431 - Posted: 18 Apr 2008, 22:16:29 UTC

Paul, I haven't found 'The device does not recognize the command' in the Unofficial BOINC Wiki, maybe because as far as I know the phrase never appears in the BOINC Manager messages, only on the task's web page as an 'explanation' (!!) of why it crashed.

http://www.boinc-wiki.info/Category:BOINC_Message

(Still an invaluable resource.)

Some of these messages rival ancient Taoist sayings. You might approach an approximate understanding if you retired to a mountain-top then meditated and lived on water and air until age 120.
Cpdn news

old_user5994
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 33432 - Posted: 18 Apr 2008, 23:15:06 UTC - in response to Message 33431.  

Paul, I haven't found 'The device does not recognize the command' in the Unofficial BOINC Wiki, maybe because as far as I know the phrase never appears in the BOINC Manager messages, only on the task's web page as an 'explanation' (!!) of why it crashed.

http://www.boinc-wiki.info/Category:BOINC_Message

(Still an invaluable resource.)

Some of these messages rival ancient Taoist sayings. You might approach an approximate understanding if you retired to a mountain-top then meditated and lived on water and air until age 120.



Fading memory says that I never saw that message ...

If I never had it reported to me ... well, I could only research what and when someone, like you, reported a message ...

Sadly, I lost my mind and had to quit ...

More happily, though I have not found my mind again, or my axe (Nancy seems to hide it on me every time I find it again, and how can you be an insane axe killer without an axe is beyond me ... but I digress) ... I can at least get about pretty well most days ...

Anyway, you did get an answer and hopefully you are on your way again ...

Just one note: some systems seem to be resistant to running some of the projects and we cannot always figure out what the "magic" might be ... So, it is possible that the system just will not run CPDN ...

For a time MY Mac Pro was eating models and it turns out that someone found an issue with the application and now I have 3 models to plow through ... another 900 hours for one ... :)

Anyway, good luck ...

You learned the first lesson of BOINC ... ask when you don't know ... :)

Ignore the idiots that may show up to give you a hard time ...

For what it is worth, run through the forums and find the good guys and gals and if needed ask directly and most will try to point you right ...

John Eric Hopkinson
Joined: 27 Jan 05
Posts: 74
Credit: 1,047,809
RAC: 0
Message 33448 - Posted: 19 Apr 2008, 20:57:45 UTC - in response to Message 33429.  


mo.v re:

John, it would be a good idea for you to look at the README post about generic error codes like #22 to see whether there's anything you should be doing or not doing:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=4231

Will do that. I think I need to look at a lot more than error codes because I could never pass an exam on this stuff.

Thanks mo

old_user5994
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 33459 - Posted: 20 Apr 2008, 6:09:43 UTC - in response to Message 33448.  

Will do that. I think I need to look at a lot more than error codes because I could never pass an exam on this stuff.

What makes you think we could? :)

old_user197041
Joined: 27 Aug 06
Posts: 26
Credit: 162,685
RAC: 0
Message 33466 - Posted: 20 Apr 2008, 9:55:17 UTC - in response to Message 33459.  

Will do that. I think I need to look at a lot more than error codes because I could never pass an exam on this stuff.

What makes you think we could? :)


You taught me a lot, Paul (even if you don't know it). But I don't think I could pass a test either :-)

I need Google and all the resources out there.
Kathryn :o)
The BOINC FAQ Service
The Unofficial BOINC Wiki
The Trac System
More BOINC information than you can shake a stick of RAM at.

old_user5994
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 33475 - Posted: 20 Apr 2008, 12:55:35 UTC - in response to Message 33466.  

You taught me a lot, Paul (even if you don't know it).

Well, I know it NOW ... :)

My one professional passion is for teaching ... I CAN teach, but I don't do well with the nonsense that comes along with it ...

But, for all of us it is a continual learning process ...

Yesterday and today I learned how to install Ubuntu and stand up a system for Linux (never really did it before with intent to use it) and I even have BOINC running there ...

Still trying to figure things out ... I lost my cable modem connection and Ubuntu then decided I did not have a network card and it was only when I moved the card from one slot to another that it would find it and connect to the network again ... not sure what is up with that ...

It is a REALLY old AMD system in which I stuck a 16G HD ... well, it works anyway ...

But I see Linux systems still under-claim on credit ...

Oh well ...

MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 33476 - Posted: 20 Apr 2008, 12:59:23 UTC


Well, they underclaim on benchmarks, but this is ignored at many projects, and at least at CPDN they get full credit for whatever work they do in terms of how many years they upload from the climate models.

I'm a volunteer and my views are my own.
News and Announcements and FAQ

old_user5994
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 33480 - Posted: 20 Apr 2008, 17:28:19 UTC - in response to Message 33476.  


Well, they underclaim on benchmarks, but this is ignored at many projects, and at least at CPDN they get full credit for whatever work they do in terms of how many years they upload from the climate models.

Yeah, so I had to change the project selection around a bit ...

CPDN is a little long for this computer, especially in that I don't know that I will keep it running very long ...

But it seems to do M-Way nicely along with a couple of other projects ... so I will pick and choose a few to keep it busy and my scores high ...

After all, since most of the projects don't seem to be doing much with the work we do for them, it MUST be about the credits ... :)



mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 33486 - Posted: 20 Apr 2008, 23:32:39 UTC

The CPDN scientists are doing plenty with the raw material we provide:

http://boinc.ssl.berkeley.edu/trac/wiki/ProjectPapers
Cpdn news

old_user5994
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 33505 - Posted: 21 Apr 2008, 16:15:15 UTC - in response to Message 33486.  

The CPDN scientists are doing plenty with the raw material we provide:

http://boinc.ssl.berkeley.edu/trac/wiki/ProjectPapers

Well,

a) the key word is most, there are 58+ projects that I have on my little list and 10 projects have published one paper or more ... that is 20%, most fits the bill ...
b) Some of those papers are over 5 years old, half of those that published, published only one paper ... Eah shows two, but it is a draft and final ...
c) I did not say CPDN was guilty in this respect.
d) notice where most of my effort has gone historically ... to CPDN ... which, as a project has done more ...

Note: SaH is high for historical reasons in that at one point they were the main project that had work all the time ... but you can also see on the stat sites where my ranking there is dropping like a stone ... I do one or two tasks a week for them and that is all ... almost all of my attention is elsewhere ...

WCG is about to get a whole lot more of my attention again as I turn to and get more of the minor projects at their targets ...

Sorry if you misconstrued my comment ...

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 33506 - Posted: 21 Apr 2008, 17:32:09 UTC

No, I knew you weren't criticising the CPDN scientists, no problem there.

On the other hand, the Berkeley people and others have given a lot of talks and published plenty of papers about distributed computing:

http://boinc.berkeley.edu/trac/wiki/BoincPapers
Cpdn news

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 33507 - Posted: 21 Apr 2008, 17:35:46 UTC

By the way, there's now an exam/test/quiz in the Cafe for anyone who wants to test their BOINC/CPDN skills.
Cpdn news

old_user5994
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 33512 - Posted: 21 Apr 2008, 19:36:39 UTC - in response to Message 33507.  

By the way, there's now an exam/test/quiz in the Cafe for anyone who wants to test their BOINC/CPDN skills.

First we make BOINC hard and geeky ... now there are tests???


Just what will attract new users ... :)