Intel Visual Fortran run-time error

Author	Message
boiner_george Joined: 29 Jan 12 Posts: 2 Credit: 608,583 RAC: 0
Message 45832 - Posted: 7 Apr 2013, 16:45:20 UTC

Receiving a run-time error after increasing disk space for this hard-disk-eating hog ... added another 2 gig. As far as I can tell, that is the only change I've made to BOINC stuff in the last couple of weeks ... or, for that matter, to my machine ... other than loading the latest version of Java. After doing this I get the following:

forrtl: severe (19) invalid reference to variable in NAMELIST C:\ProgramData\BOINC\projects\climateprediction.net\hadcm3n_zg88_1920 ....\climate.cpdc line 528, position 8. .... stack trace terminate abnormally.

Anybody out there got a clue?

Running Pentium i7-2600K CPU 3.5GHz, with 16 gig RAM, NVIDIA 690 video card, Windows 7 64 bit Operating System ... tons of hard disk

astroWX Volunteer moderator Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0
Message 45834 - Posted: 7 Apr 2013, 18:58:40 UTC Last modified: 7 Apr 2013, 19:10:28 UTC

I had a potload of them yesterday, on different machines. Each one threw six Fortran error popups, then crashed. I noticed no pattern in the Task names but, given that it was consistent across seven Intel quads from Q6600 to i5 3550, with OSs from XP_x64 to W7_x64, I chalk it up to a problem with a large chunk of the few thousand Tasks released recently. All failed to start. Work units for those with a "history" showed the same problem.

CPDN's Data file "growth" comes from the inability of CPDN to clean up after itself after abnormal endings. Frustrating, isn't it? (I've been remiss in cleaning up after failures for a long time and have Data files ranging up to a ridiculous 16Meg...)

Edit: The link in my footer no longer works: It hasn't been updated because I have hope (probably vain) that our original board will be resurrected.

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.

Arn Joined: 28 Nov 07 Posts: 1 Credit: 592,614 RAC: 0
Message 45836 - Posted: 7 Apr 2013, 20:43:54 UTC

I've been receiving the Intel Visual Fortran run-time error continuously for the second day now, but the error reads somewhat differently:

forrtl: severe (19): invalid reference to variable in NAMELIST input, unit 5, file C:\ProgramData\BOINC\projects\climateprediction.net\hadcm3n_4cr;9_1980_40_008348863\jobs\climate.cpdc, line 529, position 0

Image              PC        Routine  Line     Source
hadcm3n_um_6.07_w  007D9D2A  Unknown  Unknown  Unknown
hadcm3n_um_6.07_w  00780B60  Unknown  Unknown  Unknown
hadcm3n_um_6.07_w  0077FD3A  Unknown  Unknown  Unknown
hadcm3n_um_6.07_w  007648D4  Unknown  Unknown  Unknown
hadcm3n_um_6.07_w  0063744C  Unknown  Unknown  Unknown
hadcm3n_um_6.07_w  0054C606  Unknown  Unknown  Unknown
hadcm3n_um_6.07_w  0054E1A9  Unknown  Unknown  Unknown
hadcm3n_um_6.07_w  006FE53B  Unknown  Unknown  Unknown
hadcm3n_um_6.07_w  006F3667  Unknown  Unknown  Unknown
hadcm3n_um_6.07_w  004083F3  Unknown  Unknown  Unknown
hadcm3n_um_6.07_w  00408130  Unknown  Unknown  Unknown
kernel32.dll       773DD2E9  Unknown  Unknown  Unknown
ntdll.dll          77BB1603  Unknown  Unknown  Unknown
ntdll.dll          77BB15D6  Unknown  Unknown  Unknown

I have ended work for Climate Prediction until I am assured no damage will result from this error. I googled this, and the very first result stated that 'severe' errors must be corrected. Any knowledgeable assistance will be appreciated. Thanks.

tcpk22
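For anyone collecting these reports, the useful parts of the message (error code, file, line and position) can be pulled out with a regular expression. The following is only a minimal Python sketch: the pattern is written against the message wording quoted in this thread, the function name `parse_forrtl` is made up for illustration, and other forrtl messages may be laid out differently.

```python
import re

# Pattern modelled on the message text quoted in this thread.
# It is an assumption that other reports follow the same layout.
FORRTL_RE = re.compile(
    r"forrtl:\s*severe\s*\((?P<code>\d+)\):\s*(?P<text>.*?),"
    r"\s*unit\s+(?P<unit>\d+),"
    r"\s*file\s+(?P<file>.+?),"
    r"\s*line\s+(?P<line>\d+),"
    r"\s*position\s+(?P<pos>\d+)",
    re.DOTALL,
)

# Message copied from one of the reports in this thread.
SAMPLE = (
    r"forrtl: severe (19): invalid reference to variable in NAMELIST input, "
    r"unit 5, file C:\ProgramData\BOINC\projects\climateprediction.net"
    r"\hadcm3n_4f8c_2020_40_008348911\jobs\climate.cpdc, line 529, position 0"
)


def parse_forrtl(message):
    """Return the fields of a 'severe' forrtl message, or None if no match."""
    match = FORRTL_RE.search(message)
    if match is None:
        return None
    return {
        "code": int(match.group("code")),
        "text": match.group("text").strip(),
        "unit": int(match.group("unit")),
        "file": match.group("file").strip(),
        "line": int(match.group("line")),
        "position": int(match.group("pos")),
    }


if __name__ == "__main__":
    print(parse_forrtl(SAMPLE))
```

Grouping reports by the parsed file name, line and position is one way to see whether different tasks are failing at the same place in their input file.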

Lockleys Joined: 13 Jan 07 Posts: 195 Credit: 10,581,566 RAC: 0
Message 45837 - Posted: 7 Apr 2013, 21:13:06 UTC

I have just experienced a set of messages similar to Arn's for task hadcm3n_3l4z_1980_40_008349369_2. I have aborted it.

Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
Message 45838 - Posted: 7 Apr 2013, 21:27:57 UTC

Arn,

All "severe" means is that the error will most likely be fatal TO THE COMPUTER PROGRAM THAT HAS HAD THIS, i.e. the climate model.
It doesn't mean that your computer will explode, or that your teeth will turn green and your hair fall out.

Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
Message 45839 - Posted: 7 Apr 2013, 21:30:11 UTC

I've had a PM about this error, as well as those reported here, so I'll let the project people know.

Ironworker16 Joined: 15 Jul 05 Posts: 1 Credit: 371,926 RAC: 0
Message 45840 - Posted: 7 Apr 2013, 23:02:43 UTC - in response to Message 45832. Last modified: 7 Apr 2013, 23:03:41 UTC

I have the same error here also. I'm including the error text and stderr.txt from one work unit. I'm going to suspend the project until there is a resolution.

---------------------------
Intel(r) Visual Fortran run-time error
---------------------------
forrtl: severe (19): invalid reference to variable in NAMELIST input, unit 5, file C:\ProgramData\BOINC\projects\climateprediction.net\hadcm3n_4f8c_2020_40_008348911\jobs\climate.cpdc, line 529, position 0

Image              PC        Routine  Line     Source
hadcm3n_um_6.07_w  007D9D2A  Unknown  Unknown  Unknown
hadcm3n_um_6.07_w  00780B60  Unknown  Unknown  Unknown
hadcm3n_um_6.07_w  0077FD3A  Unknown  Unknown  Unknown
hadcm3n_um_6.07_w  007648D4  Unknown  Unknown  Unknown
hadcm3n_um_6.07_w  0063744C  Unknown  Unknown  Unknown
hadcm3n_um_6.07_w  0054C606  Unknown  Unknown  Unknown
hadcm3n_um_6.07_w  0054E1A9  Unknown  Unknown  Unknown
hadcm3n_um_6.07_w  006FE53B  Unknown  Unknown  Unknown
hadcm3n_um_6.07_w  006FE53B  Unknown  Unknown  Unknown
hadcm3n_um_6.07_w  006F3667  Unknown  Unknown  Unknown
hadcm3n_um_6.07_w  004083F3  Unknown  Unknown  Unknown
hadcm3n_um_6.07_w  00733DBD  Unknown  Unknown  Unknown
ntdll.dll          772C04C0  Unknown  Unknown  Unknown
ntdll.dll          772C0B1F  Unknown  Unknown  Unknown
ntdll.dll          772C0D5A  Unknown  Unknown  Unknown
ntdll.dll          772C0D5A  Unknown  Unknown  Unknown
ntdll.dll          772C2E92  Unknown  Unknown  Unknown
ntdll.dll          772C2ED2  Unknown  Unknown  Unknown
hadcm3n_um_6.07_w  007CCCEA  Unknown  Unknown  Unknown
ntdll.dll          772BF683  Unknown  Unknown  Unknown
---------------------------
OK
---------------------------

stderr.txt - Notepad

04:25:22 (76312): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=92788, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=92788, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=92788, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=92788, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=92788, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...

Running core i7-920 CPU, with 12 gig RAM, Radeon HD 7970 video card, Windows 8 64 bit Operating System ... tons of hard disk

mo.v Volunteer moderator Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0
Message 45841 - Posted: 8 Apr 2013, 1:15:55 UTC Last modified: 8 Apr 2013, 1:17:07 UTC

Thanks to everyone for your reports. The reason the errors say Visual Fortran is that this is the language the climate models are written in.

Here is a list of Fortran Run-Time error codes with very brief descriptions of their meanings.

I had downloaded three new models yesterday, Sunday, but they hadn't begun to run. So I suspended some models already running to make the new ones start. Here's what happened:

Within seconds of starting, each of the three models threw a Visual Fortran Runtime error just like the ones members have already quoted. Two models starting in 1980 said the error was in line 529 in position 0, whereas the model starting in 1920 said line 528 in position 8.

I left the models running and opened Windows Event Viewer to see whether the three runtime errors were recorded there. I could find no trace of these errors either by name or by timestamp. They appeared to have had no effect on the running of the computer.

I then looked at the Fortran error page again and noticed that 'with severe, program execution stops (unless a recovery method is specified)'. My models still seemed to be running in the sense that they were still clocking up time. I opened the graphics window for each of them to see how they were advancing and found that all three were stopped at timestep No 1 and showed completely blue globes. Blue is the default colour and means that computation never started.

I checked the Performance tab in Windows Task Manager to see whether these models were using CPU time (and energy/electricity) and found that they were idle, i.e. costing no energy.

As these models are not advancing I'm going to abort them and get new ones. But if the new ones belong to the same batch they will probably throw the same error.

Visual Fortran Runtime errors have never in the past done any harm to our computers. As Les has said, this error is restricted to the model in question. It looks scary because of the cross in the red circle but is harmless to everything except the models. Look at the graphics to see whether they're really processing and, if they're not, please abort them.

Cpdn news

Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
Message 45842 - Posted: 8 Apr 2013, 16:15:13 UTC

OK, the problem has been traced to an incorrect line (1 of hundreds) in one of the many files that go to make up the data sets used to start these models.
This has been fixed, and the faulty data sets will be re-issued.

Thank goodness people buy cars assembled, and don't get dozens of boxes of various shapes and sizes with parts that they then have to assemble themselves. With the instructions, no doubt, in the language of origin of the parts makers. :)
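Since the error message names a line and a position in the input file, one way to look at what the Fortran runtime objected to is to print that line with a marker under the reported column. The Python sketch below is illustrative only: `show_reported_line` is a made-up helper, the namelist fragment is invented (it is not the content of any real climate.cpdc), and treating "position" as a 0-based column is an assumption.

```python
def show_reported_line(text, line_no, position, context=1):
    """Return the reported line and its neighbours, with a caret under `position`.

    `line_no` is 1-based, as in the error message. `position` is treated as a
    0-based column, which is an assumption about how the runtime counts.
    """
    lines = text.splitlines()
    if not 1 <= line_no <= len(lines):
        raise ValueError(f"line {line_no} is outside the file (1..{len(lines)})")

    first = max(1, line_no - context)
    last = min(len(lines), line_no + context)
    out = []
    for n in range(first, last + 1):
        marker = ">" if n == line_no else " "
        out.append(f"{marker}{n:5d}: {lines[n - 1]}")
        if n == line_no:
            # The prefix (marker + 5-digit number + ": ") is 8 characters wide.
            out.append(" " * (8 + position) + "^")
    return "\n".join(out)


if __name__ == "__main__":
    # Invented namelist fragment, for illustration only.
    demo = "&DEMO\n NSTEPS=10,\n BOGUS=1,\n/"
    print(show_reported_line(demo, line_no=3, position=1))
```

In a namelist file, a variable name that the reading program does not declare in that namelist group is one common cause of an "invalid reference to variable" message, so the marked line is where to start looking.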

zombie67 [MM] Joined: 2 Oct 06 Posts: 52 Credit: 26,209,214 RAC: 3,355
Message 45845 - Posted: 8 Apr 2013, 22:48:37 UTC

I received several of these too. Will the bad tasks be aborted server-side?

Reno, NV

Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
Message 45846 - Posted: 9 Apr 2013, 0:05:50 UTC - in response to Message 45845.

Bad tasks on Macs and Linux should self-abort very quickly.
On Windows it may be a different matter. It's possible they may sit there pretending to run but not clocking up any progress in the various lines in the Show Graphics window.

We're still talking about this. (Very slowly, due to time zone differences and the loss of our php board.)

My 2 are from December, so they aren't affected, and I have to go by second-hand information.

mo.v Volunteer moderator Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0
Message 45847 - Posted: 9 Apr 2013, 0:10:08 UTC Last modified: 9 Apr 2013, 0:11:30 UTC

Hi Zombie,

To my knowledge, tasks already sent to computers won't be aborted from the server. This was done once before, but the killer message was sent from the server to the computer when the model's next trickle was uploaded. AFAIK this can't be done with the current models because, although they're accumulating runtime, they are making no progress and will never reach the end of their first year, which is when they would normally trickle up and make contact with the server.

I get the impression from looking at a lot of these models' task and WU web pages that on Darwin and Linux many of the models crash of their own accord.

They don't all crash on Windows. On my own Windows machine three of these models accumulated runtime for well over an hour without making progress, using CPU time or crashing. Other, longer periods have been reported in this thread.

I think a lot of these models are still stuck on computers. They are not using electricity but are hogging CPU cores that could be crunching usefully. Please abort them. I know this is tedious for members who have a lot of computers.

I see Les got there first but I'll leave my comments anyway.

Cpdn news
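The symptom described above (runtime keeps accumulating while CPU time and progress do not) can be written down as a simple check on two readings of the same task taken some time apart. This is a minimal Python sketch, not a BOINC tool: `TaskSample`, `looks_stalled` and the threshold values are all assumptions made for illustration, and the readings would have to be copied by hand from the BOINC Manager task properties.

```python
from dataclasses import dataclass


@dataclass
class TaskSample:
    """One reading of a task, noted by hand from BOINC Manager (hypothetical)."""
    elapsed_s: float       # wall-clock run time reported for the task
    cpu_s: float           # CPU time reported for the task
    fraction_done: float   # progress as a fraction between 0.0 and 1.0


def looks_stalled(earlier, later, min_wall_s=600.0, min_cpu_s=1.0):
    """Return True if the task gained wall time but no CPU time or progress.

    The thresholds are arbitrary illustrative values, not project guidance.
    """
    wall_gain = later.elapsed_s - earlier.elapsed_s
    if wall_gain < min_wall_s:
        return False  # readings are too close together to judge
    cpu_gain = later.cpu_s - earlier.cpu_s
    progress_gain = later.fraction_done - earlier.fraction_done
    return cpu_gain < min_cpu_s and progress_gain <= 0.0


if __name__ == "__main__":
    first = TaskSample(elapsed_s=3600.0, cpu_s=5.0, fraction_done=0.0)
    second = TaskSample(elapsed_s=7200.0, cpu_s=5.2, fraction_done=0.0)
    print(looks_stalled(first, second))
```

A task flagged this way is only a candidate for aborting; checking the graphics window, as suggested above, remains the direct way to confirm that a model is stuck at timestep 1.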

zombie67 [MM] Joined: 2 Oct 06 Posts: 52 Credit: 26,209,214 RAC: 3,355
Message 45848 - Posted: 9 Apr 2013, 2:40:03 UTC

Yes, I am talking about Windows machines here. But the bad tasks should be aborted from the server side, all the same. The machine will likely contact the server to fill a different thread slot, and would then learn to kill the task. There is no reason not to kill those bad tasks from the server side:

If *nix: They die anyway.
If Win: They need to be killed anyway.

Reno, NV

Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
Message 45849 - Posted: 9 Apr 2013, 3:07:19 UTC - in response to Message 45848.

For the "killer trickle" to be sent to the correct target, that target, i.e. the climate model, needs to return a trickle_up file for the server to find it. As has been said, this is unlikely to happen, so they CAN'T be killed from the server.

As has also been said, we're still talking about this, but it'll be a few hours yet before the Oxford people are back at work to get the latest messages that have been sent to them.

JIM Joined: 31 Dec 07 Posts: 1152 Credit: 22,072,437 RAC: 1,505
Message 45850 - Posted: 9 Apr 2013, 7:12:47 UTC - in response to Message 45842.

Quote:
Thank goodness people buy cars assembled, and don't get dozens of boxes of various shapes and sizes with parts that they then have to assemble themselves. With the instructions, no doubt, in the language of origin of the parts makers.

Strangely, while you don't buy cars that way, you can buy airplanes. People buy disassembled kits that they have to put together themselves. Then they get in and fly them. Frightening, isn't it?

Dave Jackson Volunteer moderator Joined: 15 May 09 Posts: 4345 Credit: 16,518,727 RAC: 5,698
Message 45851 - Posted: 9 Apr 2013, 7:31:09 UTC - in response to Message 45850.

Wouldn't know about the instructions bit - I only RTFM when something doesn't work.

Ingleside Joined: 5 Aug 04 Posts: 108 Credit: 19,135,606 RAC: 33,793
Message 45916 - Posted: 13 Apr 2013, 1:07:15 UTC - in response to Message 45849.

Quote:
For the "killer trickle" to be sent to the correct target, that target, i.e. climate model, needs to return a trickle_up file for the server to find it. As has been said, this is unlikely to happen, so they CAN'T be killed from the server.

Aborting tasks without relying on trickle-messages has been part of BOINC since around BOINC-Client v5.10.x.

MichaelO Joined: 8 Aug 05 Posts: 12 Credit: 24,424,627 RAC: 0
Message 45949 - Posted: 16 Apr 2013, 20:22:35 UTC

Great discussion... I was concerned I was doing something wrong. However, after aborting tasks behaving like those described, one of my machines has not received any further tasks. Is this likely an unrelated issue? I.e., could aborting the tasks with errors 'flag' my machine so the server now ignores it?

Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
Message 45950 - Posted: 16 Apr 2013, 20:49:46 UTC - in response to Message 45949.

This project often has long periods of no work. This is one of them.
There was a small batch of these models released to test the MD5 problem, but that may be it for a while.

See the Server Status page for what's available. Blue menu to the left, 5 from the bottom.

Pete(r) van der Spoel Joined: 5 Aug 04 Posts: 6 Credit: 7,002,751 RAC: 0
Message 45979 - Posted: 19 Apr 2013, 14:09:16 UTC - in response to Message 45842.

Quote:
OK, the problem has been traced to an incorrect line, (1 of hundreds), in one of the many files that go to make up data sets to start these models. This has been fixed, and the faulty data sets will be re-issued.

Does this happen automatically or do I need to abort the tasks? I've been getting these errors since yesterday, but the progress % keeps creeping up and the graphics confirm that the tasks still seem to be progressing (colour pattern changes).