The world's largest climate forecasting experiment for the 21st century.

Posts by Fivestar Crashtest

1) Message boards : Number crunching : Missing Tickles (Message 22613)
Posted 2906 days ago by Fivestar Crashtest
Now four trickles behind. Sorry about the heading, I misread the info and called them tickles instead of trickles.


It's nice to see I'm not alone, I am about four trickles behind too.

And I thought calling them tickles was kind of cute. :)
2) Message boards : Number crunching : Unofficial BOINC Wiki closing 2006-03-31 (Message 20782)
Posted 2972 days ago by Fivestar Crashtest
Thanks for everything you have done, Paul. I got a lot of good out of the Wiki and I'll miss seeing you around the forums.
3) Message boards : Number crunching : Windows Sulphur Cycle Reached Phase 2 but Linux Blew Up at the Gate (Message 16429)
Posted 3118 days ago by Fivestar Crashtest
I just looked in your account and found this:

snip
Not sure if I got the right computer or not ... but this tells me that you did not even get the model unpacked ...


Paul,

Yes, that was my crashed sulphur cycle.

As to the heartbeat issue: when I look in the Linux system monitor, BOINC is listed as sleeping, while the science app is listed as running. If a process is sleeping, does it still have a heartbeat? Or, because it is sleeping, does it not send a heartbeat message, which would explain all those "no heartbeat" messages?

Pam
4) Message boards : Number crunching : Windows Sulphur Cycle Reached Phase 2 but Linux Blew Up at the Gate (Message 16414)
Posted 3118 days ago by Fivestar Crashtest
In Linux, the log files are usually in the WU folder (~/BOINC/projects/climateprediction.net/xxxx_yyyyyyyyyy) and in the slot folder (~/BOINC/slots/X).


Well, this is the xml file from the 494c folder in projects:

[UMID]
[V]100[/V]
[MD]SCYCLE[/MD]
[N]494c_b00298716[/N]
[PH]0[/PH]
[TS]1[/TS]
[DAY]0[/DAY]
[MTH]0[/MTH]
[YR]0[/YR]
[HR]0[/HR]
[MIN]0[/MIN]
[SEC]0[/SEC]
[CSF]0[/CSF]
[TR]0[/TR]
[ST]0[/ST]
[RS]3[/RS]
[RSC]1[/RSC]
[RSDT]0[/RSDT]
[RSMT]0[/RSMT]
[RSYT]0[/RSYT]
[RSD attr="0"][/RSD]
[RSD attr="1"][/RSD]
[RSD attr="2"][/RSD]
[RSD attr="3"][/RSD]
[RSD attr="4"][/RSD]
[RSD attr="5"][/RSD]
[RSD attr="6"][/RSD]
[RSD attr="7"][/RSD]
[RSD attr="8"][/RSD]
[RSD attr="9"][/RSD]
[RSD attr="10"][/RSD]
[RSM attr="0"][/RSM]
[RSM attr="1"][/RSM]
[RSM attr="2"][/RSM]
[RSM attr="3"][/RSM]
[RSM attr="4"][/RSM]
[RSM attr="5"][/RSM]
[RSM attr="6"][/RSM]
[RSM attr="7"][/RSM]
[RSM attr="8"][/RSM]
[RSM attr="9"][/RSM]
[RSM attr="10"][/RSM]
[RSY attr="0"][/RSY]
[RSY attr="1"][/RSY]
[RSY attr="2"][/RSY]
[RSY attr="3"][/RSY]
[RSY attr="4"][/RSY]
[RSY attr="5"][/RSY]
[RSY attr="6"][/RSY]
[RSY attr="7"][/RSY]
[RSY attr="8"][/RSY]
[RSY attr="9"][/RSY]
[RSY attr="10"][/RSY]
[CS attr="0"]../sulphur_um_4.21_i686-pc-linux-gnu=0d60790beb86b831463989c82ae15185[/CS]
[CS attr="1"]jobs/climate.spin=1627afc00d4677ab01bcf34a5c90d48c[/CS]
[CS attr="2"]jobs/climate.cont=b8fa65d29109ae10a4b33378116c1f12[/CS]
[CS attr="3"]jobs/climate.doub=a39b44eec85dab48b7b806bb5576103f[/CS]
[CS attr="4"]jobs/climate.so2.cont=a337bc8f664f4d7dab01cb476e658e99[/CS]
[CS attr="5"]jobs/climate.so2.doub=0f628c3acc846f79678eaed577474d36[/CS]
[CS attr="6"]jobs/ncatts.cpdc=bb68dbfdc5bcf5d5173f01b2857e31dd[/CS]
[/UMID]

And all the angle brackets were replaced with [].

There is also a stderr.txt file in slots that just says "no heartbeat from core client" over and over again. Perhaps if I had collected the stderr.txt right after the model crash I could have had something useful.
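For anyone who wants to look through a file like the one pasted above by script rather than by eye, here is a minimal Python sketch. It assumes the square-bracket form shown in this post (the real file uses angle brackets), and it assumes each CS entry is a file path followed by "=" and a checksum. The 32-character hex strings look like MD5 digests, but that is a guess, not something confirmed in this thread. The function name parse_umid and the shortened sample are made up for illustration.

```python
import re

# Matches one line such as [CS attr="1"]jobs/climate.spin=1627...[/CS].
# A closing tag whose letter case differs from the opening tag is tolerated,
# and so is a stray backslash in front of each quote.
LINE_RE = re.compile(
    r'^\[(?P<tag>[A-Za-z]+)(?:\s+attr=\\?"(?P<attr>\d+)\\?")?\]'
    r'(?P<value>.*)\[/(?P<close>[A-Za-z]+)\]$'
)

def parse_umid(text):
    """Return (fields, checksums) from the bracketed UMID dump."""
    fields = {}
    checksums = {}
    for raw in text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m or m.group("tag").upper() != m.group("close").upper():
            continue  # skips [UMID], [/UMID] and anything unparseable
        tag = m.group("tag").upper()
        value = m.group("value")
        if tag == "CS":
            # ASSUMPTION: checksum entries look like path=hexdigest
            path, _, digest = value.rpartition("=")
            checksums[path] = digest
        elif m.group("attr") is None:
            # Plain scalar fields such as PH (phase) or TS (timestep)
            fields[tag] = value
    return fields, checksums

# Shortened, hypothetical sample in the same shape as the pasted file.
sample = """[UMID]
[MD]SCYCLE[/MD]
[N]494c_b00298716[/N]
[PH]0[/PH]
[HR]0[/hr]
[RSD attr="0"][/RSD]
[CS attr="1"]jobs/climate.spin=1627afc00d4677ab01bcf34a5c90d48c[/CS]
[/UMID]"""

fields, checksums = parse_umid(sample)
print(fields["N"], "phase", fields["PH"])
for path, digest in checksums.items():
    print(path, digest)
```

The empty attr-indexed entries (RSD, RSM, RSY) are skipped on purpose, since they carry no values in this dump.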
5) Message boards : Number crunching : Windows Sulphur Cycle Reached Phase 2 but Linux Blew Up at the Gate (Message 16371)
Posted 3120 days ago by Fivestar Crashtest

Can you zip up the *.TXT and OLD files and send them to p.d.buck@comcast.net ...

Thanks!


I guess not, I am too dumb. Not sure what I am doing wrong getting Ark to zip things. I need to do some more reading.

I don't see any OLD files in Linux. I see them in the Windows BOINC folder. In Linux, there are txt files in the slots folders. I do have "hidden files" turned on. My poor little 494c folder in Projects only has an xml file in it. I put it in a Word document and it is only three or four pages long.


6) Message boards : Number crunching : Windows Sulphur Cycle Reached Phase 2 but Linux Blew Up at the Gate (Message 16364)
Posted 3121 days ago by Fivestar Crashtest
Here are the messages. 2nqc is the slab that I had been running with 4.19, and 494c is the sulphur cycle. I have also noticed that there appear to be about three timesteps in a row missing from 2nqc, so I am not so sure 4.43 took up where 4.19 left off. I promise not to play with it anymore in the middle of a model without backing up.

2005-09-29 23:13:14 [---] Resuming computation and network activity
2005-09-29 23:13:14 [---] Computer is overcommitted
2005-09-29 23:13:14 [---] Nearly overcommitted.
2005-09-29 23:13:14 [---] New work fetch policy: no work fetch allowed.
2005-09-29 23:13:14 [---] New CPU scheduler policy: earliest deadline first.
2005-09-29 23:13:14 [---] schedule_cpus: must schedule
2005-09-29 23:13:14 [---] earliest deadline: 1139934000.000000 494c_b00298716_0
2005-09-29 23:13:14 [climateprediction.net] Starting result 494c_b00298716_0 using sulphur_cycle version 4.21
Starting model in /root/BOINC/projects/climateprediction.net...
Archive: sulphur_se_4.21_i686-pc-linux-gnu.zip
inflating: ./sulphur_se_4.21_i686-pc-linux-gnu
inflating: ./sulphur_gfx_4.21_i686-pc-linux-gnu
inflating: ./globe.rgb
extracting: ./gfx.sh
Archive: sulphur_um_4.21_i686-pc-linux-gnu.zip
inflating: ./sulphur_um_4.21_i686-pc-linux-gnu
Archive: sulphur_data_4.21_i686-pc-linux-gnu.zip
Archive: 494c_b00298716.zip
inflating: 494c_b00298716/jobs/climate.spin
inflating: 494c_b00298716/jobs/climate.cont
inflating: 494c_b00298716/jobs/climate.doub
inflating: 494c_b00298716/jobs/climate.so2.cont
inflating: 494c_b00298716/jobs/climate.so2.doub
inflating: 494c_b00298716/jobs/ncatts.cpdc
Created shared memory region key = 26600
.so shmem return code = 136446572
Copying files for startup...
In pre_initialise_phase (part 1 of 3)
In initialise_phase (part 2 of 3)
In startup_phase (part 3 of 3)
2005-09-29 23:13:17 [---] request_reschedule_cpus: process exited
2005-09-29 23:13:17 [climateprediction.net] Computation for result 494c_b00298716_0 finished
2005-09-29 23:13:17 [---] schedule_cpus: must schedule
2005-09-29 23:13:17 [---] New work fetch policy: work fetch allowed.
2005-09-29 23:13:17 [---] New CPU scheduler policy: highest debt first.
Starting model in /root/BOINC/projects/climateprediction.net...
Created shared memory region key = 25390
2005-09-29 23:13:17 [climateprediction.net] Restarting result 2nqc_000145322_0 using hadsm3 version 4.13
2005-09-29 23:13:17 [climateprediction.net] Unrecoverable error for result 494c_b00298716_0 (<file_xfer_error>
<file_name>494c_b00298716_0_1.zip</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>
<file_xfer_error>
<file_name>494c_b00298716_0_2.zip</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>
<file_xfer_error>
<file_name>494c_b00298716_0_3.zip</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>
<file_xfer_error>
<file_name>494c_b00298716_0_4.zip</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>
<file_xfer_error>
<file_name>494c_b00298716_0_5.zip</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>
)
2005-09-29 23:13:17 [climateprediction.net] Unrecoverable error for result 494c_b00298716_0 (<file_xfer_error>
<file_name>494c_b00298716_0_1.zip</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>
<file_xfer_error>
<file_name>494c_b00298716_0_2.zip</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>
<file_xfer_error>
<file_name>494c_b00298716_0_3.zip</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>
<file_xfer_error>
<file_name>494c_b00298716_0_4.zip</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>
<file_xfer_error>
<file_name>494c_b00298716_0_5.zip</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>
)
Env Used=LD_LIBRARY_PATH=/root/BOINC/projects/climateprediction.net:/usr/local/lib:/usr/lib:/lib
2005-09-29 23:13:18 [climateprediction.net] Deferring communication with project for 58 seconds
2005-09-29 23:13:18 [climateprediction.net] Deferring communication with project for 58 seconds
Starting model ID 2nqc_000145322 Phase 2
Stack size=48.00 MB
Waiting for model startup, this may take a minute...
2nqc_000145322 - PH 2 TS 169201 - 16/09/1835 00:30 - H:M:S=0268:09:45 AVG= 2.25 DLT= 0.00
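A log like this is easier to read once the repeated error blocks are collapsed. The Python sketch below pulls out each distinct file name and error code from the file_xfer_error entries and converts the "earliest deadline" number to a calendar date. It assumes that number is seconds since the Unix epoch, which the log itself does not state, and it makes no attempt to interpret what code -161 means. The function name summarize and the shortened sample_log are illustrative only.

```python
import re
from datetime import datetime, timezone

# One <file_xfer_error> entry: capture the file name and the numeric code.
ERROR_RE = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)</error_code>"
)
# The scheduler line that names the result with the earliest deadline.
DEADLINE_RE = re.compile(
    r"earliest deadline: (?P<epoch>\d+(?:\.\d+)?) (?P<result>\S+)"
)

def summarize(log_text):
    """Return (unique upload errors, deadlines as UTC datetimes)."""
    errors = sorted(
        {(m["name"], int(m["code"])) for m in ERROR_RE.finditer(log_text)}
    )
    deadlines = [
        # ASSUMPTION: the number is seconds since the Unix epoch.
        (m["result"], datetime.fromtimestamp(float(m["epoch"]), tz=timezone.utc))
        for m in DEADLINE_RE.finditer(log_text)
    ]
    return errors, deadlines

# Shortened excerpt in the same shape as the messages above.
sample_log = """\
2005-09-29 23:13:14 [---] earliest deadline: 1139934000.000000 494c_b00298716_0
2005-09-29 23:13:17 [climateprediction.net] Unrecoverable error for result 494c_b00298716_0 (<file_xfer_error>
<file_name>494c_b00298716_0_1.zip</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>
<file_xfer_error>
<file_name>494c_b00298716_0_2.zip</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>
<file_xfer_error>
<file_name>494c_b00298716_0_1.zip</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>
)
"""

errors, deadlines = summarize(sample_log)
for name, code in errors:
    print(name, code)
for result, when in deadlines:
    print(result, when.isoformat())
```

Under the epoch assumption, the deadline for 494c_b00298716_0 works out to 14 February 2006 (UTC).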
7) Message boards : Number crunching : Windows Sulphur Cycle Reached Phase 2 but Linux Blew Up at the Gate (Message 16346)
Posted 3121 days ago by Fivestar Crashtest
I have a Sulphur on a P4 3.0GHz HT with Windows XP Home that has made it to Phase 2.

I had a Sulphur waiting on an AMD Athlon 64 2800+ running Linux 2.6.10, BOINC 4.19 (optimized by Ned Slider).

I was messing with the Linux machine and changed to BOINC 4.43. I have changed back and forth before with no ill effects. The slab WU was early in Phase 2. So when BOINC 4.43 woke up, it went to work on the WU with the earliest deadline, which was the sulphur. It quit with an error; I kept the messages, in case they are of interest to anyone. Then it went back to crunching the slab, and it seems to be fine.

I stuck with optimized 4.19 because the benchmarks with 4.43 were so low. It turns out the timesteps with 4.43 are 2.2 sec, compared to 2.4 sec with optimized 4.19. So at least for CPDN, optimized BOINC doesn't seem to help me.
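As a rough check on those two timestep figures, this small Python sketch works out the relative difference. It takes 2.4 sec as the optimized 4.19 figure and 2.2 sec as the 4.43 figure, and it treats both as representative averages, which is an assumption since only one number per client is quoted. The function name relative_saving is made up for the example.

```python
def relative_saving(old_sec, new_sec):
    """Fraction of time saved per timestep going from old_sec to new_sec."""
    return (old_sec - new_sec) / old_sec

saving = relative_saving(2.4, 2.2)
print(f"{saving:.1%} less time per timestep")
```

That comes to roughly one twelfth less time per timestep, so the low benchmark score did not carry over to the model itself.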

No, I haven't been backing up. But I guess I need to start doing that. If I had, I could have suspended that sulphur cycle and tried running it under 4.19 when the time came.

I wanted to try 4.43 so I would have the ability to drain off the other projects' WUs instead of trashing them by detaching, since 4.19 doesn't have as many features. I didn't think I would have got the sulphur done in time while sharing with other projects.

Thanks for listening and Happy Crunching.
8) Message boards : Number crunching : What did sulfur model just upload? (Message 16196)
Posted 3128 days ago by Fivestar Crashtest
That is normal with sulphur and BOINC > 4.19. They wanted the intermediate results uploaded before the end of run. Continue on...


Can sulphur run with BOINC 4.19? Will it work and just not upload results until the end? I am using the Linux 4.19 client, and have a sulphur waiting while a slab finishes.
9) Message boards : Number crunching : Sulphur Download? (Message 16130)
Posted 3132 days ago by Fivestar Crashtest
Are there still sulphur models being sent out?
My main PC will finish its first model in the next 24 hours and I'd really like to do a sulphur model. It downloaded another normal model today though. Any chance I will get a sulphur if I suspend (or abort) that? Preferences should be good for sulphur and the PC is fast enough too.

thx for any help :)


I got a sulphur the other day on a machine I didn't especially want one on. I like to run more than one project on a machine. Too bad we can't trade. :) The Linux client 4.19 downloads one near the end of phase one. I will try to pay attention on the other Linux machine when it gets near the end of phase one and adjust the disk space to get a slab.

It isn't going to hurt the running sulphur model when I lower the disk space for another machine, is it? Will there be any weird errors?

10) Message boards : Number crunching : New app? (Message 15520)
Posted 3154 days ago by Fivestar Crashtest
I got one this weekend and with less than two hours crunching, it is showing 2109 hours to go. That means it will take about three months to do, and the deadline is January 26! This one will be OK, since it is on an HT machine, but I worry about the single-thread machines that are sharing with other projects. Assuming equal resources, a three-week unit takes six weeks when sharing with one other project and nine weeks when sharing with two others. So the sulphur cycle would take six months or nine months. Good thing the deadlines are not written in stone.

Pam
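The back-of-the-envelope sums in this post can be written out as a short Python sketch. It assumes the 2109-hour estimate is accurate, that the machine crunches 24 hours a day, and that every attached project gets an equal share of one CPU. The function name wall_clock_days is made up for the example.

```python
def wall_clock_days(cpu_hours, projects_total):
    """Days of wall-clock time when one CPU is split equally among projects."""
    return cpu_hours * projects_total / 24

for projects_total in (1, 2, 3):
    days = wall_clock_days(2109, projects_total)
    print(f"{projects_total} project(s): about {days:.0f} days")
```

With one project the estimate is just under 88 days, which is the "about three months" figure; two and three projects double and triple it.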





Copyright © 2002-2014 climateprediction.net