
Conference smurf::ase

Title:ase
Moderator:SMURF::GROSSO
Created:Thu Jul 29 1993
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2114
Total number of notes:7347

2078.0. "drd-data-compare=3 ???" by 22740::TERENCELEUNG () Wed May 21 1997 10:25

    Hello,
    
    We have a situation very similar to note 1991 in this conference.
    It's a dual 2100A DU4.0B, TCR 1.4 system, Oracle OPS, dual HSZ40
    with 6 RAID-5 configured RZ29's.
    
    Problem : When drd-data-compare=3, one system panics immediately when
              a DRD service is accessed remotely by any cluster member.

              For easy reference, one system is called System A and the
              other System B. System B always panics whenever a DRD
              service is accessed remotely by Oracle. For example,
              svrmgrl > startup parallel
              svrmgrl > drop user user_a  (or any operation that accesses
                                           DRD remotely)

              System B panics immediately regardless of which system
              executes the above commands, as long as that system is
              accessing the DRD service remotely. That is,
                if System A owns the DRD service, B panics when B accesses
                the service remotely;
                if System B owns the service, B still panics when A
                accesses the service remotely.
    	      
    Actions taken so far :
    
    - The latest 4.0B patch, DUV40BAS00003-19970425, was applied.
    - The new wire method was turned off with "new-wire-method=0".
    - The "Simport" patch was applied.
    - Swapped all hardware from System A (no panic) to System B (panic),
      including CPU, I/O board, memory module, memory channel, KZPSA, all
      other expansion cards in PCI slots, system disk (together with OS)
      and a local data disk.
    - Replaced the memory channel cable.
    - Turned off the two HSZ40s one at a time.
    - Using the shell script given in 1991.8, dd'ed two 8k files to DRD,
      read them back and compared. There was no comparison error and
      neither system panicked. We tested this on both System A and B,
      with local and remote DRD access.
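
The write, read back and compare test in the last item can be sketched as follows. This is not the script from 1991.8, which is not reproduced in this note; it is a minimal illustration in Python, and the names write_read_compare and demo are hypothetical. The demo runs against a scratch file so it is safe to try anywhere. Pointing it at a real DRD device path would be an assumption about that device and would overwrite data there.

```python
import os
import tempfile

BLOCK_SIZE = 8 * 1024  # 8k, matching the size used in the test described above


def write_read_compare(path, block_count=2, block_size=BLOCK_SIZE):
    """Write random blocks to path, read them back, and return the
    indexes of any blocks that do not match what was written."""
    blocks = [os.urandom(block_size) for _ in range(block_count)]

    # The target must already exist; "r+b" opens it without truncating.
    with open(path, "r+b") as target:
        for block in blocks:
            target.write(block)
        target.flush()
        os.fsync(target.fileno())

    miscompares = []
    with open(path, "rb") as target:
        for index, expected in enumerate(blocks):
            if target.read(block_size) != expected:
                miscompares.append(index)
    return miscompares


def demo():
    """Run the check against a temporary scratch file (hypothetical stand-in
    for a DRD device) and return the list of miscompared block indexes."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        scratch = tmp.name
    try:
        return write_read_compare(scratch)
    finally:
        os.remove(scratch)


if __name__ == "__main__":
    print("miscompared blocks:", demo())
```

A test of this kind only checks that data read back matches data written. It says nothing about the per-request checksum that drd-data-compare controls.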
    
    Questions :
    
    1.  What is "drd-data-compare"? What is its default value? What is
        the significance of setting it to 3? Before setting it to 3, there
        was no panic, but data corruption occurred once every one or two
        weeks. After setting it to 3, System B panics immediately if any
        cluster member accesses DRD remotely.
    
    2.  What is the difference, as far as UNIX and TCR are concerned,
        between remote access to a DRD service by "dd" and by Oracle?
    
    Thanks in advance,
    Terence
      
T.R     Title                                     User                        Personal Name  Date                   Lines
2078.1  drd-data-compare set equal on all hosts?  NNTPD::"pelle@zk3.dec.com"  Pelle          Wed May 21 1997 18:00  37
Be sure to set drd-data-compare to the same value on ALL the hosts. See
note 2065.1.
Here is an excerpt of the man page for drd(8):
  drd-data-compare
            When this attribute is set to 1, 2, or 3, the DRD subsystem
            performs a checksum of the data portion of read and write
            requests.  For proper operation, this attribute must be set to
            the same value on all cluster members.

            When this attribute is 0, no data check summing and comparisons
            are performed.

            When this attribute is 1, the bsc_stats.bsc_read_miscompares stat
            counter is incremented on DRD client read miscompares and the
            bss_stats.bss_write_miscompares stat counter is incremented on
            DRD server write miscompares.

            When this attribute is 2, the stat counters are incremented as
            appropriate and one of the following error messages is written to
            the console and kernel log files:
                 bsc_do_unmap_RM: READ check sum failure server = #  client = #
                 bsc_rm_docopyinout: READ checksum failure server =  #  client #
                 bss_rm_server: WRITE checksum failure client = # server = #

            When this attribute is 3, the stat counters are incremented as
            appropriate, the pertinent messages are written to the log files,
            and the system panics.

            All cluster members must use the same drd-data-compare value.
            Otherwise, some cluster members will not initialize the checksum
            value, causing other members to erroneously report that data
            corruption has occurred.  <tuning not supported>
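
The four levels in the excerpt, and the false miscompare that results when members disagree on the value, can be illustrated with a toy model. This is a sketch in Python, not DRD kernel code. The names Panic, sender_checksum, receive and simulate are hypothetical, zlib.crc32 stands in for the real checksum (the excerpt does not say which algorithm DRD uses), and an uninitialized checksum is modeled as None.

```python
import zlib


class Panic(Exception):
    """Stands in for a system panic in this toy model."""


def sender_checksum(data, level):
    # A member running with level 0 does not initialize the checksum.
    # zlib.crc32 is an assumed stand-in for the real DRD checksum.
    return zlib.crc32(data) if level in (1, 2, 3) else None


def receive(data, sent_checksum, level, stats, log):
    """Apply the receiving member's drd-data-compare level to one write."""
    if level == 0:
        return  # no checksumming or comparison at all
    local = zlib.crc32(data)
    if local == sent_checksum:
        return
    stats["write_miscompares"] += 1  # levels 1, 2 and 3 count the miscompare
    if level >= 2:
        # levels 2 and 3 also log a message
        log.append("WRITE checksum failure client = %s server = %s"
                   % (sent_checksum, local))
    if level == 3:
        raise Panic("checksum miscompare")  # level 3 also panics


def simulate(sender_level, receiver_level, data=b"x" * 8192):
    """Send one intact write between two members; return (stats, log, panicked)."""
    stats = {"write_miscompares": 0}
    log = []
    panicked = False
    try:
        receive(data, sender_checksum(data, sender_level),
                receiver_level, stats, log)
    except Panic:
        panicked = True
    return stats, log, panicked


if __name__ == "__main__":
    print(simulate(3, 3))  # matching values: clean
    print(simulate(0, 3))  # mismatched values: false miscompare and panic
```

In this model the data is never corrupted, yet a sender at 0 paired with a receiver at 3 still produces a miscompare and a panic. That is the erroneous report the man page warns about.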


[Posted by WWW Notes gateway]