HadAM3P-HadRM3P restart loop on Windows 7
Author | Message |
---|---|
Joined: 29 Mar 15 Posts: 3 Credit: 859,479 RAC: 0 |
The program will start, run for around 10 seconds and fail, restarting immediately and failing again in a never-ending loop. I didn't find any instructions for the preferred information gathering, but will be happy to collect information if it's desired. Windows 7 SP1, 8-core Intel i7, 8GB, BOINC 7.4.42 x64. HadAM3P 7.22. It failed on the first try and has never worked on this machine. The Coupled Model program seems to be proceeding normally, so I'll run that in the meantime. Thanks. |
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Hi, Maynard. Welcome to the project and to the boards. I checked your machine and found one task running and four aborted by user. Guaranteed: user aborts will kill tasks every time. How many times did each task crash/restart on its own? What does your 'Messages' tab show? "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Joined: 29 Mar 15 Posts: 3 Credit: 859,479 RAC: 0 |
Thanks for your quick reply. At first, I was assigned one task, which I allowed to restart for approximately 5-7 minutes before aborting. I estimate something like 30-50 restarts for that task. The system's memory-in-use display oscillated up and down with the same period, which is what made me notice in the first place. I hoped it was an isolated incident, but on the next batch, I received three more such jobs, which I saw behaving the same way and terminated much sooner, probably within one minute. The fourth job I received was hadcm3n_um_6.07_windows_intelx86 *32. It is running normally, but on a side note, the deadline calls for 400 hours of CPU over 92 days, which I'm not sure I can deliver. The shorter tasks had deadlines a year away and would easily make it. Nothing appears in the BOINC event log (with only the default logging enabled), nor did I find any log files in the project/task directories. If you have instructions for enabling better logging, I'll be happy to do it. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
1) As posted VERY regularly, there is NO "deadline" for returning the data. It's just an unbypassable BOINC requirement that there be one. As for the error messages, they appear under Stderr on each model's page. Click the plus sign to expand the list. |
Joined: 29 Mar 15 Posts: 3 Credit: 859,479 RAC: 0 |
Thanks for pointing me to the error messages. Here is a sample: 18:41:05 (5248): BOINC client no longer exists - exiting 18:41:05 (5248): timer handler: client dead, exiting CPDN Monitor - No 'heartbeat' from BOINC... 18:41:49 (10872): BOINC client no longer exists - exiting 18:41:49 (10872): timer handler: client dead, exiting CPDN Monitor - No 'heartbeat' from BOINC... Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=9760, selfPID=9760, iMonCtr=2 18:42:00 (8224): BOINC client no longer exists - exiting 18:42:00 (8224): timer handler: client dead, exiting CPDN Monitor - No 'heartbeat' from BOINC... 18:42:11 (6156): BOINC client no longer exists - exiting 18:42:11 (6156): timer handler: client dead, exiting CPDN Monitor - No 'heartbeat' from BOINC... Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8332, selfPID=8332, iMonCtr=2 And so on. I saw a similar sequence in another thread, but the program identified as the culprit in that case is not installed on my machine. I'll assume the antivirus/firewall protection is a good place to start looking and will try some things when my current task nears completion. Thanks to both of you for your help. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Your problem has been identified and, coincidentally, also posted about by another cruncher. I have answered him here. |
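
For anyone hitting the same restart loop, a quick way to gauge how often it happens is to tally the repeating messages in the Stderr text copied from a model's page. The following is only a rough sketch: the function name `summarize` and the `SAMPLE` string are made up for illustration (the sample lines are copied from the excerpt quoted earlier in this thread), and the pattern assumes the Stderr lines keep the `HH:MM:SS (PID): message` shape seen there.

```python
import re

# Sample lines copied from the Stderr excerpt quoted in this thread.
SAMPLE = """\
18:41:05 (5248): BOINC client no longer exists - exiting
18:41:05 (5248): timer handler: client dead, exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:41:49 (10872): BOINC client no longer exists - exiting
18:41:49 (10872): timer handler: client dead, exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=9760, selfPID=9760, iMonCtr=2
"""

# Assumed line shape: "HH:MM:SS (PID): message".
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}) \((\d+)\): (.*)$")


def summarize(stderr_text):
    """Count 'client no longer exists' exits and heartbeat-loss messages."""
    pids = []
    heartbeat_losses = 0
    for raw in stderr_text.splitlines():
        line = raw.strip()
        match = LINE_RE.match(line)
        if match and "BOINC client no longer exists" in match.group(3):
            # Each exit is logged by a different process ID per restart.
            pids.append(int(match.group(2)))
        elif "No 'heartbeat' from BOINC" in line:
            heartbeat_losses += 1
    return {
        "client_lost_exits": len(pids),
        "distinct_pids": sorted(set(pids)),
        "heartbeat_losses": heartbeat_losses,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

A new process ID on every "client no longer exists" line suggests the model is being relaunched each time rather than resuming, which matches the loop described in the first post. The script only counts messages; it does not identify what is blocking the heartbeat.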