
Conference turris::digital_unix

Title:DIGITAL UNIX(FORMERLY KNOWN AS DEC OSF/1)
Notice:Welcome to the Digital UNIX Conference
Moderator:SMURF::DENHAM
Created:Thu Mar 16 1995
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:10068
Total number of notes:35879

9526.0. "Blitz on new-wire-method V4.0*" by KITCHE::schott (Eric R. Schott USG Product Management) Wed Apr 16 1997 20:20

+---------------------------+TM
|   |   |   |   |   |   |   |
| d | i | g | i | t | a | l |      TIME   DEPENDENT   BLITZ
|   |   |   |   |   |   |   |
+---------------------------+


      BLITZ TITLE: DIGITAL UNIX DATA CORRUPTION WITH SHARED MEMORY 
	  	  

                                                DATE: 16 April 1997
      AUTHOR: John Donovan			TD #:
      DTN:    381-1344   
      ENET: guru::donovan                       CROSS REFERENCE #'s:
      DEPARTMENT: UNIX Support Engineering      (PRISM/TIME/CLD#'s)

      INTENDED AUDIENCE: All                    PRIORITY LEVEL: 1
      (U.S./EUROPE/GIA)                         (1=TIME CRITICAL,
                                                 2=NON-TIME CRITICAL)

=====================================================================

      INTRODUCTION: 

	During the course of testing prerelease hardware with Digital UNIX
	Versions 4.0 and later, the Digital UNIX Engineering Group discovered 
	a user application data corruption problem that was not detected by
	the operating system software.

      PROBLEM:  

	A data corruption problem can occur when the parameter new-wire-method
	is turned on. The new-wire-method parameter is only available in V4.0
	and later releases. All versions V4.0 and later ship with the default 
	being new-wire-method enabled.

      RESOLUTION/WORKAROUND:

	The workaround for this problem is as follows:

        The problem can be eliminated by turning off the new-wire-method.

	1) Become the root user.

 	2) Create a new file named /tmp/nwm and insert the following lines:

		  vm:
		     new-wire-method=0

 	3) Execute the sysconfigdb command as follows:

		  # /sbin/sysconfigdb -f /tmp/nwm -m vm

 	4) Reboot the system.

	The new-wire-method option is now disabled.
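
	Steps 2 and 3 above can be sketched as a small script. This is
	only an illustration, not part of the engineering workaround:
	the NWM_FILE variable and the write_nwm_stanza function are
	hypothetical names introduced here, and the sysconfigdb command
	is printed rather than executed so that the sketch makes no
	change to the configuration database.

```shell
#!/bin/sh
# Sketch only. NWM_FILE and write_nwm_stanza are hypothetical names,
# not taken from the blitz. Default path matches step 2.
NWM_FILE="${NWM_FILE:-/tmp/nwm}"

write_nwm_stanza() {
    # Stanza layout as shown in step 2: subsystem name followed by
    # an indented attribute=value line.
    printf 'vm:\n\tnew-wire-method=0\n' > "$1"
}

write_nwm_stanza "$NWM_FILE"

# Show the file so it can be checked by eye before it is merged.
cat "$NWM_FILE"

# Step 3 is echoed, not run: execute it by hand as root, then reboot.
echo "/sbin/sysconfigdb -f $NWM_FILE -m vm"
```

	After the reboot, querying the vm subsystem (for example with
	/sbin/sysconfig -q vm new-wire-method) should show the value 0;
	treat that query as an assumption to confirm against the
	sysconfig reference page for your release.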

	Please note that turning off the new-wire-method should cause
	little or no performance degradation.

        It is the Strong Recommendation of Digital UNIX Engineering that
        this workaround be implemented on all systems running Digital UNIX
        V4.0 and above. Failure to do so can result in undetected data
        corruption.

      ADDITIONAL COMMENTS:

        Digital UNIX Engineering is working at the highest priority on a
        solution that will not require the above workaround. When the
        resultant fix is ready, an advisory blitz will announce its
        availability.

                     *** DIGITAL INTERNAL USE ONLY ***

\\ GRP=TIME_DEPENDENT CAT=HARDWARE DB=CSSE_TIME_CRITICAL
\\ TYPE=KNOWN_PROBLEM TYPE=BLITZ STATUS=CURRENT


T.R     Title     User     Personal Name     Date     Lines
9526.1. "What are the circumstances??" by DYOSW5::WILDER (Does virtual reality get swapped?) Thu Apr 17 1997 13:58 (11 lines)

    Any information on what circumstances can cause this data corruption?
    Any particular applications? The reason I ask is that I have a customer
    running 4.0a and TRC1.4. When taking down one node, it hung and they
    got some data corruption. Now, this could be caused by other factors,
    but it would be nice to see what engineering knows about this problem
    and how it can happen.
    
    Thanks,
    
    /jim
    
9526.2. by KITCHE::schott (Eric R. Schott USG Product Management) Thu Apr 17 1997 19:20 (15 lines)

>
>    Any information on what circumstances can cause this data corruption?
>    Any particular applications? The reason I ask is that I have a customer
>    running 4.0a and TRC1.4. When taking down one node, it hung and they
>    got some data corruption. Now, this could be caused by other factors,
>    but it would be nice to see what engineering knows about this problem
>    and how it can happen.
>    

This happens when doing raw I/O; otherwise it is hard to describe the
circumstances...

You should ensure your system, firmware, and storage are up to date on
patches to avoid possible corruption.

9526.3. "Here is a little more..." by SMURF::KNIGHT (Fred Knight) Tue Apr 22 1997 17:46 (10 lines)

It requires both RAW I/O and swapping/paging.  If no
swapping or paging is happening, then no corruption will
occur.

The title is also inaccurate since it has nothing to do
with shared memory.  Any time you do a raw read (into any
memory, shared or not) and the process is swapped or paged,
the contents of the raw I/O buffer are at risk.

	Fred

9526.4. by LEXS01::GINGER (Ron Ginger) Wed Apr 23 1997 12:53 (5 lines)

    Thanks, Fred, for a simple answer. Why don't we do this in a blitz,
    instead of trying to hide the details of the problem? Then customers
    can make accurate assessments of their risk and the urgency of taking
    this action. It is not always easy to apply changes; even a reboot at
    my customer must be scheduled as much as 3 weeks in advance.