Title: | DIGITAL UNIX (FORMERLY KNOWN AS DEC OSF/1) |
Notice: | Welcome to the Digital UNIX Conference |
Moderator: | SMURF::DENHAM |
Created: | Thu Mar 16 1995 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 10068 |
Total number of notes: | 35879 |
+---------------------------+ TM
|   |   |   |   |   |   |   |
| d | i | g | i | t | a | l |          TIME DEPENDENT BLITZ
|   |   |   |   |   |   |   |
+---------------------------+

BLITZ TITLE: DIGITAL UNIX DATA CORRUPTION WITH SHARED MEMORY

DATE: 16 April 1997
AUTHOR: John Donovan                    TD #:
DTN: 381-1344
ENET: guru::donovan                     CROSS REFERENCE #'s:
DEPARTMENT: UNIX Support Engineering    (PRISM/TIME/CLD#'s)

INTENDED AUDIENCE: All                  PRIORITY LEVEL: 1
(U.S./EUROPE/GIA)                       (1=TIME CRITICAL,
                                         2=NON-TIME CRITICAL)
=====================================================================

INTRODUCTION:

During the course of prereleased hardware testing with Digital UNIX
Versions 4.0 and later, the Digital UNIX Engineering Group discovered
a user application data corruption that was not detected by the
operating system software.

PROBLEM:

A data corruption problem can occur when the parameter new-wire-method
is turned on. The new-wire-method parameter is only available in V4.0
and later releases. All versions V4.0 and later ship with the default
being new-wire-method enabled.

RESOLUTION/WORKAROUND:

The workaround for this problem is as follows:

The problem can be eliminated by turning off the new-wire-method.

  1) Become the root user.

  2) Create a new file named /tmp/nwm and insert the following lines:

        vm:
                new-wire-method=0

  3) Execute the sysconfigdb command as follows:

        # /sbin/sysconfigdb -f /tmp/nwm -m vm

  4) Reboot the system.

The new-wire-method option is now disabled.

Please note that turning off the new-wire-method should cause little
or no performance degradation.

It is the Strong Recommendation of Digital UNIX Engineering that this
workaround be implemented on all systems running Digital UNIX V4.0 and
above. Failure to do so can result in undetected data corruption.

ADDITIONAL COMMENTS:

Digital UNIX Engineering is working at the highest priority on a
solution that will not require the above workaround. When the
resultant fix is ready, an advisory blitz will announce its
availability.
*** DIGITAL INTERNAL USE ONLY *** \\ GRP=TIME_DEPENDENT CAT=HARDWARE DB=CSSE_TIME_CRITICAL \\ TYPE=KNOWN_PROBLEM TYPE=BLITZ STATUS=CURRENT
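Steps 2 and 3 of the workaround in the blitz can be sketched as a small shell script. This is a minimal sketch, not an official procedure: the helper name write_nwm_stanza and the NWM_FILE override are hypothetical names introduced here, and the tab indentation of the attribute line is an assumption about the stanza file layout. The script only writes the stanza file and prints the sysconfigdb command from step 3 for review; it does not run it.

```shell
#!/bin/sh
# Sketch of steps 2 and 3 of the blitz workaround.
# Nothing here modifies the running system: the sysconfigdb command
# is printed for review, not executed.

# write_nwm_stanza is a hypothetical helper name (not a Digital UNIX tool).
# It writes the two-line stanza from step 2 to the file named in $1.
# The tab before the attribute is an assumed stanza-file indentation.
write_nwm_stanza() {
    printf 'vm:\n\tnew-wire-method=0\n' > "$1"
}

# The blitz uses /tmp/nwm; NWM_FILE is an assumed override for dry runs.
stanza_file="${NWM_FILE:-/tmp/nwm}"
write_nwm_stanza "$stanza_file"

echo "Stanza written to $stanza_file:"
cat "$stanza_file"

# Step 3, printed rather than executed. Run it as root, then reboot (step 4).
echo "Next, as root: /sbin/sysconfigdb -f $stanza_file -m vm"
```

After the reboot in step 4, the setting can probably be confirmed with a query such as `/sbin/sysconfig -q vm new-wire-method`; that query is not part of the blitz, so check the sysconfig reference page on the system before relying on it.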
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
9526.1 | What are the circumstances?? | DYOSW5::WILDER | Does virtual reality get swapped? | Thu Apr 17 1997 13:58 | 11 |
Any information on what circumstances can cause this data corruption?
Any particular applications? The reason I ask is that I have a customer
running 4.0a and TRC1.4. When taking down one node, it hung and they
got some data corruption. Now, this could be caused by other factors,
but it would be nice to see what engineering knows about this problem
and how it can happen.

Thanks,

/jim
9526.2 | | KITCHE::schott | Eric R. Schott USG Product Management | Thu Apr 17 1997 19:20 | 15 |
>
> Any information on what circumstances can cause this data corruption?
> Any particular applications? The reason I ask is that I have a customer
> running 4.0a and TRC1.4. When taking down one node, it hung and they
> got some data corruption. Now, this could be caused by other factors,
> but it would be nice to see what engineering knows about this problem
> and how it can happen.
>

This happens when doing raw I/O...otherwise it is hard to describe
the circumstances...

You should ensure your system, firmware, and storage are up to date
on patches to avoid possible corruptions.
9526.3 | Here is a little more... | SMURF::KNIGHT | Fred Knight | Tue Apr 22 1997 17:46 | 10 |
It requires both RAW I/O and swapping/paging. If no swapping
or paging is happening, then no corruption will occur.

The title is also inaccurate since it has nothing to do with
shared memory. Any time you do a raw read (into any memory,
shared or not) and the process is swapped or paged, the contents
of the raw I/O buffer are at risk.

Fred
9526.4 | | LEXS01::GINGER | Ron Ginger | Wed Apr 23 1997 12:53 | 5 |
Thanks, Fred, for a simple answer. Why don't we do this in a blitz,
instead of trying to hide the details of the problem? Then customers
can make accurate assessments of their risk and the urgency of taking
this action. It is not always easy to apply changes; even a reboot at
my customer must be scheduled as much as 3 weeks in advance.