Any ideas why my model crashed twice at this point?

Questions and Answers : Windows : Any ideas why my model crashed twice at this point?

Digby

Joined: 17 Feb 06
Posts: 89
Credit: 4,309,159
RAC: 0
Message 29542 - Posted: 15 Jul 2007, 16:48:27 UTC

My model crashed last night. I restored it this morning and it crashed again with the same error. Meanwhile, the second model keeps chugging away. Nothing has changed on my hardware since the last model, so I don't know what 'The device does not recognize the command. (0x16) - exit code 22' means.
A scan of the helpdesk indicates that 22 is some error that Tolu is looking into...

I have not yet allowed my model to report this error to the server. Shall I enable communications and just get another model?

Digby

15/07/2007 16:16:41|climateprediction.net|Deferring communication for 1 min 0 sec
15/07/2007 16:16:41|climateprediction.net|Reason: Unrecoverable error for result hadcm3inct_cl3f_1920_160_05888096_3 (The device does not recognize the command. (0x16) - exit code 22 (0x16))
15/07/2007 16:16:41|climateprediction.net|Computation for task hadcm3inct_cl3f_1920_160_05888096_3 finished
15/07/2007 16:16:41|climateprediction.net|Output file hadcm3inct_cl3f_1920_160_05888096_3_12.zip for task hadcm3inct_cl3f_1920_160_05888096_3 absent
15/07/2007 16:16:41|climateprediction.net|Output file hadcm3inct_cl3f_1920_160_05888096_3_13.zip for task hadcm3inct_cl3f_1920_160_05888096_3 absent
15/07/2007 16:16:41|climateprediction.net|Output file hadcm3inct_cl3f_1920_160_05888096_3_14.zip for task hadcm3inct_cl3f_1920_160_05888096_3 absent
15/07/2007 16:16:41|climateprediction.net|Output file hadcm3inct_cl3f_1920_160_05888096_3_15.zip for task hadcm3inct_cl3f_1920_160_05888096_3 absent
15/07/2007 16:16:41|climateprediction.net|Output file hadcm3inct_cl3f_1920_160_05888096_3_16.zip for task hadcm3inct_cl3f_1920_160_05888096_3 absent
ID: 29542
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 29544 - Posted: 15 Jul 2007, 19:32:34 UTC


All this 'zip business' tells us nothing; it's just there because BOINC is trying to upload the files, but they haven't been created yet. (Which is why they're all "absent".)

Any real error message will be in the files that get uploaded when you let them.

Error 22 was a problem that seems to have been found, but the "fixed" version is still in beta testing.

It's probably best to get a new one, perhaps a shorter slab model to wait out the testing.

ID: 29544
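Editorial sketch: the point above, that the "absent" lines are noise and the one informative line is the "Unrecoverable error" reason, can be illustrated with a few lines of Python. This is a minimal sketch and not a BOINC tool. It assumes the messages are plain text in the `timestamp|project|message` layout pasted in this thread, and the names `parse_line`, `find_errors` and `drop_absent_noise` are hypothetical helpers invented for the illustration.

```python
import re
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: str
    project: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one 'timestamp|project|message' line; return None if it does not fit."""
    parts = line.rstrip("\n").split("|", 2)
    if len(parts) != 3:
        return None
    return LogEntry(*parts)


# Pattern for the "Unrecoverable error" reason line as it appears in this thread.
ERROR_RE = re.compile(
    r"Unrecoverable error for result (?P<result>\S+) "
    r"\((?P<text>.*) - exit code (?P<code>\d+) \(0x(?P<hex>[0-9a-fA-F]+)\)\)"
)


def find_errors(lines):
    """Return (result_name, exit_code, error_text) for each unrecoverable-error line."""
    errors = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        match = ERROR_RE.search(entry.message)
        if match:
            code = int(match.group("code"))
            # The decimal and hex forms should agree (22 == 0x16).
            assert code == int(match.group("hex"), 16)
            errors.append((match.group("result"), code, match.group("text")))
    return errors


def drop_absent_noise(lines):
    """Drop 'Output file ... absent' lines; they only say no zip exists yet."""
    kept = []
    for line in lines:
        entry = parse_line(line)
        if entry and entry.message.startswith("Output file") and entry.message.endswith("absent"):
            continue
        kept.append(line)
    return kept


# Lines copied from the first post in this thread.
TASK = "hadcm3inct_cl3f_1920_160_05888096_3"
SAMPLE_LOG = [
    "15/07/2007 16:16:41|climateprediction.net|Deferring communication for 1 min 0 sec",
    "15/07/2007 16:16:41|climateprediction.net|Reason: Unrecoverable error for result "
    + TASK + " (The device does not recognize the command. (0x16) - exit code 22 (0x16))",
    "15/07/2007 16:16:41|climateprediction.net|Computation for task " + TASK + " finished",
    "15/07/2007 16:16:41|climateprediction.net|Output file " + TASK + "_12.zip for task " + TASK + " absent",
    "15/07/2007 16:16:41|climateprediction.net|Output file " + TASK + "_13.zip for task " + TASK + " absent",
]
```

Calling `find_errors(SAMPLE_LOG)` pulls out the task name and exit code 22, and `drop_absent_noise(SAMPLE_LOG)` leaves only the three non-zip lines. As a side note, 22 decimal is 0x16 hex, and Windows system error 22 (ERROR_BAD_COMMAND) carries the standard text "The device does not recognize the command." It is therefore likely, though this is an assumption about how the client builds the message, that the wording is a generic operating-system lookup of the exit code rather than a statement about any real device.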
Digby

Joined: 17 Feb 06
Posts: 89
Credit: 4,309,159
RAC: 0
Message 29559 - Posted: 16 Jul 2007, 8:29:00 UTC - in response to Message 29544.  

OK, so I enabled 'network activity' and all that happened was a trickle from my other model. So then I enabled 'fetch new tasks' and all that happened was (surprisingly) I got a new model - a short slab.

So I don't think the 22 error message was received by the server, because I had not enabled network activity when the crash happened, and when I did enable network activity nothing was uploaded.

You can see my messages below. I still have the backup if that is useful...

Digby

15/07/2007 10:35:46||Starting BOINC client version 5.8.15 for windows_intelx86
15/07/2007 10:35:46||log flags: task, file_xfer, sched_ops
15/07/2007 10:35:46||Libraries: libcurl/7.16.0 OpenSSL/0.9.8a zlib/1.2.3
15/07/2007 10:35:46||Executing as a daemon
15/07/2007 10:35:46||Data directory: C:\Program Files\BOINC
15/07/2007 10:35:46||BOINC is running as a service and as a non-system user.
15/07/2007 10:35:46||No application graphics will be available.
15/07/2007 10:35:46||Processor: 2 GenuineIntel Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz [x86 Family 6 Model 15 Stepping 6] [fpu tsc pae nx sse sse2 mmx]
15/07/2007 10:35:46||Memory: 2.00 GB physical, 3.85 GB virtual
15/07/2007 10:35:46||Disk: 298.08 GB total, 265.94 GB free
15/07/2007 10:35:46|climateprediction.net|URL: http://climateprediction.net/; Computer ID: 576104; location: (none); project prefs: default
15/07/2007 10:35:46||No general preferences found - using BOINC defaults
15/07/2007 10:35:47||Suspending network activity - user request
15/07/2007 10:35:47|climateprediction.net|Restarting task hadcm3inct_cl3f_1920_160_05888096_3 using hadcm3i version 540
15/07/2007 10:35:47|climateprediction.net|Restarting task hadcm3inct_cnd9_1920_160_65870489_3 using hadcm3i version 542
15/07/2007 10:37:34||Resuming network activity
15/07/2007 10:37:53||Suspending network activity - user request
15/07/2007 13:18:12||Resuming network activity
15/07/2007 13:18:28||Suspending network activity - user request
15/07/2007 16:16:41|climateprediction.net|Deferring communication for 1 min 0 sec
15/07/2007 16:16:41|climateprediction.net|Reason: Unrecoverable error for result hadcm3inct_cl3f_1920_160_05888096_3 (The device does not recognize the command. (0x16) - exit code 22 (0x16))
15/07/2007 16:16:41|climateprediction.net|Computation for task hadcm3inct_cl3f_1920_160_05888096_3 finished
15/07/2007 16:16:41|climateprediction.net|Output file hadcm3inct_cl3f_1920_160_05888096_3_12.zip for task hadcm3inct_cl3f_1920_160_05888096_3 absent
15/07/2007 16:16:41|climateprediction.net|Output file hadcm3inct_cl3f_1920_160_05888096_3_13.zip for task hadcm3inct_cl3f_1920_160_05888096_3 absent
15/07/2007 16:16:41|climateprediction.net|Output file hadcm3inct_cl3f_1920_160_05888096_3_14.zip for task hadcm3inct_cl3f_1920_160_05888096_3 absent
15/07/2007 16:16:41|climateprediction.net|Output file hadcm3inct_cl3f_1920_160_05888096_3_15.zip for task hadcm3inct_cl3f_1920_160_05888096_3 absent
15/07/2007 16:16:41|climateprediction.net|Output file hadcm3inct_cl3f_1920_160_05888096_3_16.zip for task hadcm3inct_cl3f_1920_160_05888096_3 absent
16/07/2007 09:20:27||Resuming network activity
16/07/2007 09:20:28|climateprediction.net|Sending scheduler request: To send trickle-up message
16/07/2007 09:20:28|climateprediction.net|Reporting 1 tasks
16/07/2007 09:20:33|climateprediction.net|Scheduler RPC succeeded [server version 509]
16/07/2007 09:20:48|climateprediction.net|Sending scheduler request: To fetch work
16/07/2007 09:20:48|climateprediction.net|Requesting 8640 seconds of new work
16/07/2007 09:20:53|climateprediction.net|Scheduler RPC succeeded [server version 509]
16/07/2007 09:20:55|climateprediction.net|[file_xfer] Started download of file hadsm3fub_e070_005895058.zip
16/07/2007 09:20:56|climateprediction.net|[file_xfer] Finished download of file hadsm3fub_e070_005895058.zip
16/07/2007 09:20:56|climateprediction.net|[file_xfer] Throughput 93102 bytes/sec
16/07/2007 09:20:57|climateprediction.net|Starting hadsm3fub_e070_005895058_8
16/07/2007 09:20:57|climateprediction.net|Starting task hadsm3fub_e070_005895058_8 using hadsm3 version 506
ID: 29559
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1058
Credit: 36,494,375
RAC: 13,823
Message 29561 - Posted: 16 Jul 2007, 10:47:58 UTC - in response to Message 29559.  


On the contrary, when you re-enabled network activity, your computer contacted the server and reported the error for result 6509459. If you click on that link, you'll see all the additional information that Les was waiting for, and (hopefully!) will now be able to interpret.
ID: 29561
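Editorial sketch: the evidence for this is already in the log Digby posted, where "Reporting 1 tasks" is followed by "Scheduler RPC succeeded". A rough way to check a log for that pattern is shown below. It assumes, without confirmation from the BOINC documentation, that a successful scheduler RPC after a "Reporting N tasks" line means those results reached the server; `count_reported_tasks` is a hypothetical helper name.

```python
import re

REPORTING_RE = re.compile(r"Reporting (\d+) tasks")


def count_reported_tasks(lines):
    """Count tasks announced as reported that were followed by a successful scheduler RPC."""
    acknowledged = 0
    pending = 0
    for line in lines:
        parts = line.rstrip("\n").split("|", 2)
        if len(parts) != 3:
            continue  # not a 'timestamp|project|message' line
        message = parts[2].strip()
        match = REPORTING_RE.match(message)
        if match:
            pending += int(match.group(1))
        elif message.startswith("Scheduler RPC succeeded"):
            # Assumption: success here covers everything queued in this request.
            acknowledged += pending
            pending = 0
    return acknowledged


# Lines copied from the log in message 29559.
RESUME_LOG = [
    "16/07/2007 09:20:27||Resuming network activity",
    "16/07/2007 09:20:28|climateprediction.net|Sending scheduler request: To send trickle-up message",
    "16/07/2007 09:20:28|climateprediction.net|Reporting 1 tasks",
    "16/07/2007 09:20:33|climateprediction.net|Scheduler RPC succeeded [server version 509]",
]
```

For the four lines above, `count_reported_tasks(RESUME_LOG)` counts one reported task, which matches the single crashed result. No file upload shows up because the output zips were never created, so only the error report itself went to the server.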
Digby

Joined: 17 Feb 06
Posts: 89
Credit: 4,309,159
RAC: 0
Message 29564 - Posted: 16 Jul 2007, 13:43:07 UTC - in response to Message 29561.  

Yep, you are right... a quick scan suggests that the model was missing some data in the ocean UV fields:


Model crashed: umshell1.f: TRANSO2A: Missing data in ocean UV fields
Sorry, too many model crashes! :-(

Well, the good news is that BOINC can report these issues back. I suppose someone now has to glean the useful information so we can learn from this error.

Cheers

Digby
ID: 29564
astroWX
Volunteer moderator

Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 29571 - Posted: 16 Jul 2007, 18:25:33 UTC
Last modified: 16 Jul 2007, 18:27:17 UTC

Digby,

We're beta-testing what we hope is the fix; unfortunately, another issue cropped up and we're waiting for a fix for that...

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 29571
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 29574 - Posted: 16 Jul 2007, 21:12:44 UTC


Yes, as you've seen, the REAL reason for the crash is now visible.
And as astroWX said, we\'re now testing a cure for the cure.
And waiting for a cure for another side effect.

In the meantime, perhaps you'd like to run a shorter slab. You can pick and choose by going into your Climateprediction preferences on the server and ticking one of the two.

ID: 29574
Digby

Joined: 17 Feb 06
Posts: 89
Credit: 4,309,159
RAC: 0
Message 29581 - Posted: 17 Jul 2007, 8:25:24 UTC - in response to Message 29574.  

No problem. The system actually chose to download a short slab model, which I am working on now.

Hope everything cures ok.

Should I default to just testing short slabs?
ID: 29581
astroWX
Volunteer moderator

Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 29590 - Posted: 17 Jul 2007, 20:16:14 UTC

Your choice. However, we hope things are set right before you're ready for another model -- no guarantees, though.

ID: 29590

