Unrecoverable error for result sulphur_hska_000830170

old_user143821 · Joined: 26 Dec 05 · Posts: 2 · Credit: 251,588 · RAC: 0
Message 20052 - Posted: 8 Feb 2006, 19:30:30 UTC - Last modified: 8 Feb 2006, 19:32:32 UTC

Hi, I got the following error after about 51% of the WU was done, and the climateprediction.net job has disappeared from the Work tab of BOINC Manager. What can I do to solve this problem?

Cut from stderrdae.txt:
======================================================
2006-02-06 18:06:30 [climateprediction.net] Restarting result sulphur_hska_000830170_0 using sulphur_cycle version 422
2006-02-06 18:06:35 [climateprediction.net] Scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded
2006-02-06 18:07:08 [climateprediction.net] Sending scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
2006-02-06 18:07:08 [climateprediction.net] Reason: To send trickle-up message
2006-02-06 18:07:08 [climateprediction.net] Note: not requesting new work or reporting results
2006-02-06 18:07:36 [climateprediction.net] Scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded
2006-02-07 11:34:31 [---] request_reschedule_cpus: process exited
2006-02-07 11:34:31 [climateprediction.net] Computation for result sulphur_hska_000830170_0 finished
2006-02-07 11:34:32 [climateprediction.net] Unrecoverable error for result sulphur_hska_000830170_0 (<file_xfer_error> <file_name>sulphur_hska_000830170_0_3.zip</file_name> <error_code>-161</error_code> <error_message></error_message> </file_xfer_error> <file_xfer_error> <file_name>sulphur_hska_000830170_0_4.zip</file_name> <error_code>-161</error_code> <error_message></error_message> </file_xfer_error> <file_xfer_error> <file_name>sulphur_hska_000830170_0_5.zip</file_name> <error_code>-161</error_code> <error_message></error_message> </file_xfer_error> )
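
When a result fails like this, the useful detail is buried in the <file_xfer_error> elements: which output zips failed and with what code. The sketch below (Python, not part of BOINC; the function name parse_xfer_errors is hypothetical) pulls each file name, error code and message out of one "Unrecoverable error" line. It only extracts values and does not interpret them: in this thread -115 came with the text "user requested transfer abort", while the meaning of -161 is not stated here and should be checked against BOINC's own error-code list.

```python
import re
import xml.etree.ElementTree as ET

# Each failed upload is reported as one <file_xfer_error>...</file_xfer_error> element
# inside the parentheses of the "Unrecoverable error for result ..." message.
XFER_BLOCK = re.compile(r"<file_xfer_error>.*?</file_xfer_error>", re.DOTALL)


def parse_xfer_errors(message: str) -> list[tuple[str, int, str]]:
    """Return (file_name, error_code, error_message) for every file_xfer_error block."""
    failures = []
    for block in XFER_BLOCK.findall(message):
        elem = ET.fromstring(block)  # each block is a small, well-formed XML fragment
        failures.append((
            elem.findtext("file_name", default=""),
            int(elem.findtext("error_code", default="0")),
            # findtext returns "" when the element exists but is empty
            elem.findtext("error_message", default=""),
        ))
    return failures


if __name__ == "__main__":
    line = (
        "Unrecoverable error for result sulphur_hska_000830170_0 ("
        "<file_xfer_error> <file_name>sulphur_hska_000830170_0_3.zip</file_name> "
        "<error_code>-161</error_code> <error_message></error_message> </file_xfer_error> "
        "<file_xfer_error> <file_name>sulphur_hska_000830170_0_4.zip</file_name> "
        "<error_code>-161</error_code> <error_message></error_message> </file_xfer_error> )"
    )
    for name, code, text in parse_xfer_errors(line):
        print(f"{name}: {code} {text or '(no message)'}")
```

A regex is used to find the blocks because the whole line is not valid XML; each block on its own is, so ElementTree handles the fields.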

geophi (Volunteer moderator) · Joined: 7 Aug 04 · Posts: 2184 · Credit: 64,822,615 · RAC: 5,275
Message 20053 - Posted: 8 Feb 2006, 19:47:22 UTC

Unless there is a recent backup, there may not be anything you can do to recover this WU. Just as a diagnostic, could you post the last 30 lines of the yabsd.out file? It can be found in the /projects/climateprediction.net/"experimentname" or /projects/climateprediction.net/"experimentname"/dataout folder. It may be zipped, but once unzipped, it can be opened in WordPad.

One other thing: using Windows 2003 on a PC with 256 MB of memory may be straining the system when BOINC climateprediction.net is running. I'm not saying that was the cause of the failure, but it may be contributing to it.
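
For anyone who would rather not unzip and scroll in WordPad, here is a minimal Python sketch of the "last 30 lines of yabsd.out" step. It is a hypothetical helper (tail_yabsd is not a BOINC or climateprediction.net tool), and it assumes that if yabsd.out is zipped, the archive contains a member whose name ends in "yabsd.out". It reads the file as Latin-1 so unusual bytes never stop it.

```python
import io
import sys
import zipfile
from collections import deque
from pathlib import Path


def tail_yabsd(path, n: int = 30) -> list[str]:
    """Return the last n lines of yabsd.out, from the plain file or from a zip containing it."""
    p = Path(path)
    if zipfile.is_zipfile(p):
        with zipfile.ZipFile(p) as zf:
            # Assumption: the archive holds a member whose name ends in "yabsd.out".
            member = next((m for m in zf.namelist() if m.endswith("yabsd.out")), None)
            if member is None:
                raise FileNotFoundError(f"no yabsd.out inside {p}")
            with zf.open(member) as raw:
                # deque(maxlen=n) keeps only the final n lines while streaming the file.
                last = deque(io.TextIOWrapper(raw, encoding="latin-1"), maxlen=n)
    else:
        with open(p, encoding="latin-1") as fh:
            last = deque(fh, maxlen=n)
    return [line.rstrip("\r\n") for line in last]


if __name__ == "__main__":
    for line in tail_yabsd(sys.argv[1]):
        print(line)
```

Run it with the path to the file, for example projects/climateprediction.net/experimentname/yabsd.out under the BOINC data directory, where "experimentname" stands for the actual experiment folder, then paste the output into your reply.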

LMEE · Joined: 4 Sep 04 · Posts: 7 · Credit: 41,953,885 · RAC: 296
Message 20061 - Posted: 9 Feb 2006, 17:23:42 UTC

Same problem here. A model calculation recently failed with an unrecoverable error. When I enabled network activity, an upload of an 8 MB file began. I use a dial-up connection, so I aborted the upload, expecting to do it later. The results seem to have been lost. Is there any way I can recover the work for the project? Here are the messages I received:

2/8/2006 8:41:21 PM|climateprediction.net|Note: not requesting new work or reporting results
2/8/2006 8:41:21 PM|climateprediction.net|Started upload of sulphur_in0v_100869647_0_1.zip
2/8/2006 8:41:28 PM|climateprediction.net|Scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded
2/8/2006 8:41:41 PM|climateprediction.net|Unrecoverable error for result sulphur_in0v_100869647_0 (<file_xfer_error> <file_name>sulphur_in0v_100869647_0_1.zip</file_name> <error_code>-115</error_code> <error_message>user requested transfer abort</error_message></file_xfer_error>)

Here are the last lines from the yabsd.out file. Are the results recoverable?

SLAB TIMESTEP 177
3395537 words long
MODEL DUMP SUCCESSFULLY WRITTEN - 3434914 WORDS TO UNIT 22
Number of Words Written to Disk was 3436498
im,sm,ngroup,new_im,new_sm 1 1 48 T F
FINAL TOTAL ENERGY = 0.45466E+27 J/
INITIAL TOTAL ENERGY = 0.45455E+27 J/
CHG IN TOTAL ENERGY OVER DAY = 0.11824E+24 J/
FLUXES INTO ATM OVER DAY = 0.16368E+24 J/
ERROR IN ENERGY BUDGET = 0.45436E+23 J/
TEMP CORRECTION OVER DAY = 0.25144E-01 K
TEMPERATURE CORRECTION RATE = 0.29102E-06 K/S
FLUX CORRECTION (ATM) = 0.29441E+01 W/M2
FINAL ATM MASS = 0.17980E+22 KG
INITIAL ATM MASS = 0.17980E+22 KG
CORRECTION FACTOR FOR PSTAR = 0.10000E+01
im,sm,ngroup,new_im,new_sm 3 1 1 T F
NOCNINDX Namelist is
$NOCNINDX
J_1 = 1 J_2 = 2 J_3 = 3 J_JMT = 73 J_JMTM1 = 72 J_JMTM2 = 71 J_JMTP1 = 74 JST = 1 JFIN = 73 J_FROM_LOC = 0 J_TO_LOC = 0 JMT_GLOBAL = 73 JMTM1_GLOBAL = 72 JMTM2_GLOBAL = 71 JMTP1_GLOBAL = 74 J_OFFSET = 0 O_MYPE = 0 O_EW_HALO = 0 O_NS_HALO = 0 J_PE_JSTM1 = -1 J_PE_JSTM2 = -1 J_PE_JFINP1 = -1 J_PE_JFINP2 = -1 O_NPROC = 1 IMOUT = 40 JMOUT = 40 J_PE_IND_MED = 4*0 NMEDLEV = 0
$END
SLAB TIMESTEP 178
im,sm,ngroup,new_im,new_sm 1 1 48 T F

Thanks, LMEE

DebT · Joined: 1 Dec 05 · Posts: 1 · Credit: 1,778,788 · RAC: 564
Message 20077 - Posted: 10 Feb 2006, 0:38:03 UTC

Same problem here. Also, no new work has downloaded since the error. Error message:

2006-02-05 08:49:36 [climateprediction.net] Unrecoverable error for result sulphur_e04r_000653355_0 (<file_xfer_error> <file_name>sulphur_e04r_000653355_0_2.zip</file_name> <error_code>-161</error_code> <error_message></error_message> </file_xfer_error> <file_xfer_error> <file_name>sulphur_e04r_000653355_0_3.zip</file_name> <error_code>-161</error_code> <error_message></error_message> </file_xfer_error> <file_xfer_error> <file_name>sulphur_e04r_000653355_0_4.zip</file_name> <error_code>-161</error_code> <error_message></error_message> </file_xfer_error> <file_xfer_error> <file_name>sulphur_e04r_000653355_0_5.zip</file_name> <error_code>-161</error_code> <error_message></error_message> </file_xfer_error> )

The last few lines of yabsd.out are:

REPLANCA: UPDATE REQUIRED FOR FIELD 76
REPLANCA - time interpolation for field 76
time,time1,time2 7620.000 7200.000 7920.000
hours,int,period 7620 720 8640
Information used in checking ancillary data set:
position of lookup table in dataset: 818
Position of first lookup table referring to data type 58
Interval between lookup tables referring to data type 76
Number of steps 10
STASH code in dataset 125
STASH code requested 125
'Start' position of lookup tables for dataset in overall lookup array 368
REPLANCA: UPDATE REQUIRED FOR FIELD 77
REPLANCA - time interpolation for field 77
time,time1,time2 7620.000 7200.000 7920.000
hours,int,period 7620 720 8640
Information used in checking ancillary data set:
position of lookup table in dataset: 44
Position of first lookup table referring to data type 4
Interval between lookup tables referring to data type 4
Number of steps 10
STASH code in dataset 126
STASH code requested 126
'Start' position of lookup tables for dataset in overall lookup array 301
PPCTL: Opening new file e04rba.pa29c10 on unit 60
PPCTL: Initialising new file on unit 60
PPCTL: Opening new file e04rba.pb29c10 on unit 61
PPCTL: Initialising new file on unit 61
PPCTL: Opening new file e04rba.pd29c10 on unit 63
PPCTL: Initialising new file on unit 63
PPCTL: Opening new file e04rba.pe29c10 on unit 64
PPCTL: Initialising new file on unit 64
PPCTL: Opening new file e04rba.pf29c10 on unit 65
PPCTL: Initialising new file on unit 65
PPCTL: Opening new file e04rba.pg28dec on unit 66
PPCTL: Initialising new file on unit 66
PPCTL: Opening new file e04rba.ph28dec on unit 67
PPCTL: Initialising new file on unit 67
PPCTL: Opening new file e04rba.pi28dec on unit 68
PPCTL: Initialising new file on unit 68

Les Bayliss (Volunteer moderator) · Joined: 5 Sep 04 · Posts: 7629 · Credit: 24,240,330 · RAC: 0
Message 20082 - Posted: 10 Feb 2006, 5:31:44 UTC

It looks like a new type of error is starting to occur. If the two programmers weren't so tied up with last-minute problems in getting the coupled model (experiment 2) ready for launch, they'd be right onto it. There's not much I can say, except hang in there. You could continue with sulphur, in case it works better next time.

Deb: At least you got phase one uploaded. In sulphur, this contains extra data not included in phase one of slab models, and this will help the researchers a lot.

LMEE: It looks as though you uploaded phase one from one computer as well. I'm not sure if this was the one about which you posted.

tron · Joined: 1 Dec 05 · Posts: 1 · Credit: 2,905,140 · RAC: 27,842
Message 20084 - Posted: 10 Feb 2006, 6:56:32 UTC

Same problem here. This is the fourth model that has failed with an unrecoverable error. :-( Here is the message I received:

10.02.2006 7:44:43 |climateprediction.net|Unrecoverable error for result sulphur_hc5e_100808898_0 (-exit code -1073741819(0xc0000005))

The last few lines of yabsd.out are:

Number of Words Written to Disk was 3436498
im,sm,ngroup,new_im,new_sm 1 1 48 T F
FINAL TOTAL ENERGY = 0.45364E+27 J/
INITIAL TOTAL ENERGY = 0.45363E+27 J/
CHG IN TOTAL ENERGY OVER DAY = 0.11511E+23 J/
FLUXES INTO ATM OVER DAY = 0.50202E+23 J/
ERROR IN ENERGY BUDGET = 0.38691E+23 J/
TEMP CORRECTION OVER DAY = 0.21412E-01 K
TEMPERATURE CORRECTION RATE = 0.24782E-06 K/S
FLUX CORRECTION (ATM) = 0.25071E+01 W/M2
FINAL ATM MASS = 0.17980E+22 KG
INITIAL ATM MASS = 0.17980E+22 KG
CORRECTION FACTOR FOR PSTAR = 0.99999E+00
im,sm,ngroup,new_im,new_sm 3 1 1 T F
NOCNINDX Namelist is
$NOCNINDX
J_1 = 1 J_2 = 2 J_3 = 3 J_JMT = 73 J_JMTM1 = 72 J_JMTM2 = 71 J_JMTP1 = 74 JST = 1 JFIN = 73 J_FROM_LOC = 0 J_TO_LOC = 0 JMT_GLOBAL = 73 JMTM1_GLOBAL = 72 JMTM2_GLOBAL = 71 JMTP1_GLOBAL = 74 J_OFFSET = 0 O_MYPE = 0 O_EW_HALO = 0 O_NS_HALO = 0 J_PE_JSTM1 = -1 J_PE_JSTM2 = -1 J_PE_JFINP1 = -1 J_PE_JFINP2 = -1 O_NPROC = 1 IMOUT = 40 JMOUT = 40 J_PE_IND_MED = 4*0 NMEDLEV = 0
$END
SLAB TIMESTEP 502
im,sm,ngroup,new_im,new_sm 1 1 48 T F
FINAL TOTAL ENERGY = 0.45372E+27 J/
INITIAL TOTAL ENERGY = 0.45364E+27 J/
CHG IN TOTAL ENERGY OVER DAY = 0.79432E+23 J/
FLUXES INTO ATM OVER DAY = 0.12826E+24 J/
ERROR IN ENERGY BUDGET = 0.48826E+23 J/
TEMP CORRECTION OVER DAY = 0.27020E-01 K
TEMPERATURE CORRECTION RATE = 0.31273E-06 K/S
FLUX CORRECTION (ATM) = 0.31638E+01 W/M2
FINAL ATM MASS = 0.17980E+22 KG
INITIAL ATM MASS = 0.17980E+22 KG
CORRECTION FACTOR FOR PSTAR = 0.10000E+01
im,sm,ngroup,new_im,new_sm 3 1 1 T F
NOCNINDX Namelist is
$NOCNINDX
J_1 = 1 J_2 = 2 J_3 = 3 J_JMT = 73 J_JMTM1 = 72 J_JMTM2 = 71 J_JMTP1 = 74 JST = 1 JFIN = 73 J_FROM_LOC = 0 J_TO_LOC = 0 JMT_GLOBAL = 73 JMTM1_GLOBAL = 72 JMTM2_GLOBAL = 71 JMTP1_GLOBAL = 74 J_OFFSET = 0 O_MYPE = 0 O_EW_HALO = 0 O_NS_HALO = 0 J_PE_JSTM1 = -1 J_PE_JSTM2 = -1 J_PE_JFINP1 = -1 J_PE_JFINP2 = -1 O_NPROC = 1 IMOUT = 40 JMOUT = 40 J_PE_IND_MED = 4*0 NMEDLEV = 0
$END
SLAB TIMESTEP 503
im,sm,ngroup,new_im,new_sm 1 1 48 T F
FINAL TOTAL ENERGY = 0.45381E+27 J/
INITIAL TOTAL ENERGY = 0.45372E+27 J/
CHG IN TOTAL ENERGY OVER DAY = 0.83084E+23 J/
FLUXES INTO ATM OVER DAY = 0.12463E+24 J/
ERROR IN ENERGY BUDGET = 0.41550E+23 J/
TEMP CORRECTION OVER DAY = 0.22994E-01 K
TEMPERATURE CORRECTION RATE = 0.26613E-06 K/S
FLUX CORRECTION (ATM) = 0.26923E+01 W/M2
FINAL ATM MASS = 0.17980E+22 KG
INITIAL ATM MASS = 0.17980E+22 KG
CORRECTION FACTOR FOR PSTAR = 0.99999E+00
im,sm,ngroup,new_im,new_sm 3 1 1 T F
NOCNINDX Namelist is
$NOCNINDX
J_1 = 1 J_2 = 2 J_3 = 3 J_JMT = 73 J_JMTM1 = 72 J_JMTM2 = 71 J_JMTP1 = 74 JST = 1 JFIN = 73 J_FROM_LOC = 0 J_TO_LOC = 0 JMT_GLOBAL = 73 JMTM1_GLOBAL = 72 JMTM2_GLOBAL = 71 JMTP1_GLOBAL = 74 J_OFFSET = 0 O_MYPE = 0 O_EW_HALO = 0 O_NS_HALO = 0 J_PE_JSTM1 = -1 J_PE_JSTM2 = -1 J_PE_JFINP1 = -1 J_PE_JFINP2 = -1 O_NPROC = 1 IMOUT = 40 JMOUT = 40 J_PE_IND_MED = 4*0 NMEDLEV = 0
$END
SLAB TIMESTEP 504
3395537 words long
MODEL DUMP SUCCESSFULLY WRITTEN - 3434914 WORDS TO UNIT 22
Number of Words Written to Disk was 3436498
im,sm,ngroup,new_im,new_sm 1 1 48 T F

Any idea what's going wrong??? Thx, Himmelsjaeger

Les Bayliss (Volunteer moderator) · Joined: 5 Sep 04 · Posts: 7629 · Credit: 24,240,330 · RAC: 0
Message 20087 - Posted: 10 Feb 2006, 8:28:52 UTC

Ah. Now this is a different problem, tron. Error code -1073741819 appears to be a Microsoft error, and it is most likely a problem with your graphics card drivers. Try updating them and see if that helps. If not, post back here again and we'll have another look.
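
tron's log prints the same code twice: -1073741819 and 0xc0000005. They are the same 32 bits, read once as a signed integer and once as unsigned hex, and 0xC0000005 is the Windows status STATUS_ACCESS_VIOLATION (the process touched memory it was not allowed to). That names the symptom, not the culprit; the graphics driver remains Les's suggestion to test. The Python sketch below does the conversion; the two-entry lookup table is deliberately tiny and only covers codes I'm sure of.

```python
# Deliberately tiny lookup; extend it from Microsoft's NTSTATUS reference as needed.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def exit_code_to_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as the unsigned hex value Windows reports."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


def describe_exit_code(exit_code: int) -> str:
    """Hex form of the code plus its NTSTATUS name, if it is in the small table above."""
    name = KNOWN_NTSTATUS.get(exit_code & 0xFFFFFFFF, "unknown")
    return f"{exit_code_to_hex(exit_code)} ({name})"


if __name__ == "__main__":
    print(describe_exit_code(-1073741819))  # 0xC0000005 (STATUS_ACCESS_VIOLATION)
```

Masking with 0xFFFFFFFF is what makes Python's unbounded negative integer come out as the 32-bit unsigned value.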

LMEE · Joined: 4 Sep 04 · Posts: 7 · Credit: 41,953,885 · RAC: 296
Message 20118 - Posted: 10 Feb 2006, 23:02:45 UTC - in response to Message 20082.

Quote: "LMEE: It looks as though you uploaded phase one from one computer as well. I'm not sure if this was the one about which you posted."

Les, the WU in question ("IN0V") was running on computer ID 20344 when I lost the 8 MB upload file. Does this answer your question? Is the large result file lost for good?

Thanks, LMEE

Les Bayliss (Volunteer moderator) · Joined: 5 Sep 04 · Posts: 7629 · Credit: 24,240,330 · RAC: 0
Message 20122 - Posted: 10 Feb 2006, 23:59:32 UTC

It's there OK. Here is a direct link to your model page:
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=1730316
Click on P1 (phase 1) at the bottom to see the graphs. Keep in mind that this is "data to give the user something for his effort", not the data the researchers use, which is more extensive.

old_user143821 · Joined: 26 Dec 05 · Posts: 2 · Credit: 251,588 · RAC: 0
Message 20127 - Posted: 11 Feb 2006, 7:40:20 UTC - in response to Message 20053.

Quote: "Unless there is a recent backup, there may not be anything you can do to recover this WU. [...] One other thing, using Windows 2003 on a PC with 256 MB of memory may be straining the system when BOINC climateprediction.net is running."

Unfortunately I don't have a backup; I hadn't thought of making one. It's a great pity that a whole month of work was lost. Actually, the machine is an AMD Athlon 64 3000+ with 256 MB under Windows XP Professional x64; I'll add an extra memory module. Here is the end of yabsd.out:

REPLANCA: UPDATE REQUIRED FOR FIELD 76
REPLANCA - time interpolation for field 76
time,time1,time2 6900.000 6480.000 7200.000
hours,int,period 6900 720 8640
Information used in checking ancillary data set:
position of lookup table in dataset: 742
Position of first lookup table referring to data type 58
Interval between lookup tables referring to data type 76
Number of steps 9
STASH code in dataset 125
STASH code requested 125
'Start' position of lookup tables for dataset in overall lookup array 332
REPLANCA: UPDATE REQUIRED FOR FIELD 77
REPLANCA - time interpolation for field 77
time,time1,time2 6900.000 6480.000 7200.000
hours,int,period 6900 720 8640
Information used in checking ancillary data set:
position of lookup table in dataset: 40
Position of first lookup table referring to data type 4
Interval between lookup tables referring to data type 4
Number of steps 9
STASH code in dataset 126
STASH code requested 126
'Start' position of lookup tables for dataset in overall lookup array 265
im,sm,ngroup,new_im,new_sm 1 1 48 T F
PPCTL: Opening new file hskaca.pg48nov on unit 66
PPCTL: Initialising new file on unit 66
PPCTL: Opening new file hskaca.ph48nov on unit 67
PPCTL: Initialising new file on unit 67
PPCTL: Opening new file hskaca.pi48nov on unit 68
PPCTL: Initialising new file on unit 68
NEGATIVE PRESSURE AT POINT 193
NEGATIVE PRESSURE AT POINT 194
... skip from 195 to 478 ...
NEGATIVE PRESSURE AT POINT 479
NEGATIVE PRESSURE AT POINT 480
*******************************************************************************
Model aborted with error code - 1
Routine and message:-
P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED.
*******************************************************************************
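
Unlike the upload failures earlier in the thread, this log ends with the model itself aborting. A small Python sketch (hypothetical, not a project tool) can pull the key facts out of yabsd.out so they are easy to quote: the abort code, the routine message, and which grid points reported negative pressure. The marker strings are copied from the log above; the assumption is that the routine message appears either on the same line as "Routine and message:-" or on the next non-blank line. It only reports what the log says and does not explain why the pressure went negative.

```python
import re

# Marker strings taken from the yabsd.out excerpt in this thread.
ABORT_CODE = re.compile(r"Model aborted with error code\s*(-?\s*\d+)")
NEG_POINT = re.compile(r"NEGATIVE PRESSURE AT POINT\s+(\d+)")
ROUTINE_TAG = "Routine and message:-"


def summarise_abort(lines: list[str]) -> dict:
    """Collect the abort code, routine message and negative-pressure points from yabsd.out lines."""
    points = [int(m.group(1)) for line in lines for m in NEG_POINT.finditer(line)]
    code = message = None
    for i, line in enumerate(lines):
        if code is None and (m := ABORT_CODE.search(line)):
            code = m.group(1)  # kept as printed, e.g. "- 1"
        if message is None and ROUTINE_TAG in line:
            rest = line.split(ROUTINE_TAG, 1)[1].strip()
            if not rest:
                # Assumption: otherwise the message sits on the next non-blank line.
                rest = next((l.strip() for l in lines[i + 1:] if l.strip()), "")
            message = rest
    return {"abort_code": code, "message": message, "negative_pressure_points": points}
```

Feed it the file's lines, for example open(path, encoding="latin-1").read().splitlines(). A run that did not abort comes back with None for both the code and the message.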

old_user48532 · Joined: 31 Jan 05 · Posts: 1 · Credit: 88,947 · RAC: 0
Message 20362 - Posted: 16 Feb 2006, 21:21:21 UTC

I also have this kind of problem: the same unrecoverable error. Before I began using version 5.2.13 it was better. In version 4.45 the models didn't crash, but I couldn't run the graphics and had to disable the screensaver. I even had to change my virus scanner to use version 4.45, because AntiVir was causing the model to crash; this seemed to be a problem common to many Athlon systems. Earlier versions didn't cause any problems at all, and all other applications under BOINC 5.2.13 run smoothly. I'm sure it's all down to a graphics driver problem, maybe ATI or XP...

My system: Athlon XP 2600+, 1024 MB RAM, Asus A7N8X-E, Radeon 9600 with ATI drivers, Windows XP SP1
