Conference smurf::ase

Title:ase
Moderator:SMURF::GROSSO
Created:Thu Jul 29 1993
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2114
Total number of notes:7347

1971.0. "Hunting the elusive SCSI CAM offender..." by NETRIX::"mcdonald@decatl.alf.dec.com" (John McDonald) Thu Mar 27 1997 21:10

I was wondering if anyone could expand a bit on the various SCSI CAM
discussions that have been weaving their way through this conference.
I understand that the SCSI CAM "Unit Reserved" error is caused by an ASE
member on a shared SCSI bus trying to access a device that's reserved
by another machine. What I'm trying to figure out is whether there's any
way to narrow the search on the offending machine down to a specific
process.

The reason I ask is that I have a customer who is seeing about 4000 of
these a day. I had him check for the advfsd and snmp daemons as a possible
source of the attempted accesses, but neither one was configured. He has
a fairly complex mix of processes on both machines in the ASE and wants
to know how to narrow the problem down further.

Here's the hex output from one of the SCSI CAM uerf errors:

----- EVENT INFORMATION -----

EVENT CLASS                             ERROR EVENT
OS EVENT TYPE                  199.     CAM SCSI
SEQUENCE NUMBER              62544.
OPERATING SYSTEM                        DEC OSF/1
OCCURRED/LOGGED ON                      Tue Mar 18 11:44:22 1997
OCCURRED ON SYSTEM                      casper
SYSTEM ID                 x00060009     CPU TYPE:  DEC 2100
SYSTYPE                   x00000000

----- UNIT INFORMATION -----

CLASS                         x0000     DISK
SUBSYSTEM                     x0000     DISK
BUS #                         x0002
                              x0080     LUN x0
                                        TARGET x0

----- CAM STRING -----

ROUTINE NAME                            cdisk_op_spin

----- CAM STRING -----

                                        Unit Reserved

----- CAM STRING -----

ERROR TYPE                              Information Message Detected
                                         _(recovered)

----- CAM STRING -----

DEVICE NAME                             DEC     HSZ4

----- CAM STRING -----

                                        Active CCB at time of error

----- CAM STRING -----

                                        CCB request completed with an error

ERROR - os_std, os_type = 11, std_type = 10


----- ENT_CCB_SCSIIO -----

*MY ADDR                  x1FE30328
CCB LENGTH                    x00C0
FUNC CODE            x01
CAM_STATUS                    x0004     CAM_REQ_CMP_ERR
PATH ID              2.
TARGET ID            0.
TARGET LUN           0.
CAM FLAGS                 x000004C0
                                        CAM_DIR_NONE
                                        CAM_SIM_QFRZDIS
*PDRV_PTR                 x1FE30028
*NEXT_CCB                 x00000000
*REQ_MAP                  x00000000
VOID (*CAM_CBFCNP)()      x00478A50
*DATA_PTR                 x00000000
DXFER_LEN                 x00000000
*SENSE_PTR                x1FE30050
SENSE_LEN            xA0
CDB_LEN              x06
SGLIST_CNT                    x0000
CAM_SCSI_STATUS               x0018     SCSI_STAT_RESERVATION_CONFLICT
SENSE_RESID          x00
RESID                     x00000000
CAM_CDB_IO           x000000000000000000000000
CAM_TIMEOUT               x00000014
MSGB_LEN                      x0000
VU_FLAGS                      x0000
TAG_ACTION           x00

RECORD ENTRY DUMP:

  RECORD HEADER
0000:   F45002B0  00060009  00060501  332EC666        *..P.........f..3*
0010:   70736163  00007265  00000000  00000000        *casper..........*
0020:   00000001  00000000  000000C7  00800002        *................*
0030:   FFFFFFFF  00000000                            *........        *

  RECORD BODY
0038:   000000C7  00000000  00000000  00000000        *................*
0048:   00000007  00000000  A1216F80  FFFFFFFF        *.........o!.....*
0058:   00000005  00000000  00000102  0000000E        *................*
0068:   00000010  00000000  00581128  FFFFFC00        *........(.X.....*
0078:   00000001  00000000  73696463  706F5F6B        *........cdisk_op*
0088:   6970735F  0000006E  00000100  0000000E        *_spin...........*
0098:   00000010  00000000  00581138  FFFFFC00        *........8.X.....*
00A8:   00000001  00000000  74696E55  73655220        *........Unit Res*
00B8:   65767265  00000064  00000106  00000029        *erved.......)...*
00C8:   00000030  00000000  00582AF8  FFFFFC00        *0........*X.....*
00D8:   00000001  00000000  6F666E49  74616D72        *........Informat*
00E8:   206E6F69  7373654D  20656761  65746544        *ion Message Dete*
00F8:   64657463  65722820  65766F63  29646572        *cted (recovered)*
0108:   00000000  00000000  00000101  0000000D        *................*
0118:   00000010  00000000  00592780  FFFFFC00        *.........'Y.....*
0128:   00000001  00000000  20434544  20202020        *........DEC     *
0138:   345A5348  00000000  00000100  0000001C        *HSZ4............*
0148:   00000020  00000000  00582B78  FFFFFC00        * .......x+X.....*
0158:   00000001  00000000  69746341  43206576        *........Active C*
0168:   61204243  69742074  6F20656D  72652066        *CB at time of er*
0178:   00726F72  00000000  00000100  00000024        *ror.........$...*
0188:   00000028  00000000  005806A0  FFFFFC00        *(.........X.....*
0198:   00000001  00000000  20424343  75716572        *........CCB requ*
01A8:   20747365  706D6F63  6574656C  69772064        *est completed wi*
01B8:   61206874  7265206E  00726F72  00000000        *th an error.....*
01C8:   00000001  000000C0  000000C0  00000025        *............%...*
01D8:   1FE30328  FFFFFC00  00000002  00000000        *(...............*
01E8:   1FE30328  FFFFFC00  040100C0  00000200        *(...............*
01F8:   000004C0  00000000  1FE30028  FFFFFC00        *........(.......*
0208:   00000000  00000000  00000000  00000000        *................*
0218:   00478A50  FFFFFC00  00000000  00000000        *P.G.............*
0228:   00000000  00000000  1FE30050  FFFFFC00        *........P.......*
0238:   000006A0  00000000  00000018  00000000        *................*
0248:   00000000  00000000  00000000  00000000        *................*
0258:   00000014  00000000  00000000  00000000        *................*
0268:   00000000  00000000  1FE30018  00000000        *................*
0278:   DEC00DEC  00000000  00000000  00000000        *................*
0288:   00000000  00000000  7F8C0140  00A00001        *........@.......*
0298:   00000000  00000000  00000000  00000000        *................*
02A8:   00000000  5E3C7E25                            *....%~<^        *
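
For anyone reading the RECORD ENTRY DUMP by hand: the right-hand ASCII
column appears to be the same bytes as the hex words, with each 32-bit
word stored little-endian. The following is a minimal Python sketch of
that reading, not part of uerf; the function name decode_row is made up
for illustration, and the little-endian assumption is inferred from the
dump itself (for example, 70736163 00007265 lining up with "casper").

```python
import struct

def decode_row(words):
    """Rebuild the ASCII column from one row of 32-bit hex words.

    Assumption: each word is printed as a number but stored
    little-endian, so its bytes are reversed relative to the text.
    """
    raw = b"".join(struct.pack("<I", int(w, 16)) for w in words)
    # Non-printable bytes are shown as "." just as in the dump.
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)

# Row 0010 (system name) and row 0078 (start of the routine name).
print(decode_row(["70736163", "00007265", "00000000", "00000000"]))
print(decode_row(["00000001", "00000000", "73696463", "706F5F6B"]))
```

Read this way, the record body is just the same CAM strings and CCB
fields that the formatted output shows, so the dump itself does not seem
to carry extra information such as a process ID.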

And here's the -o full output from one:

----- EVENT INFORMATION -----

EVENT CLASS                             ERROR EVENT
OS EVENT TYPE                  199.     CAM SCSI
SEQUENCE NUMBER              20862.
OPERATING SYSTEM                        DEC OSF/1
OCCURRED/LOGGED ON                      Thu Mar 27 14:44:37 1997
OCCURRED ON SYSTEM                      kramer
SYSTEM ID                 x00060009     CPU TYPE:  DEC 2100
SYSTYPE                   x00000000

----- UNIT INFORMATION -----

CLASS                         x0000     DISK
SUBSYSTEM                     x0000     DISK
BUS #                         x0002
                              x0080     LUN x0
                                        TARGET x0

----- CAM STRING -----

ROUTINE NAME                            cdisk_op_spin

----- CAM STRING -----

                                        Unit Reserved

----- CAM STRING -----

ERROR TYPE                              Information Message Detected
                                         _(recovered)

----- CAM STRING -----

DEVICE NAME                             DEC     HSZ4

----- CAM STRING -----

                                        Active CCB at time of error

----- CAM STRING -----

                                        CCB request completed with an error
ERROR - os_std, os_type = 11, std_type = 10


----- ENT_CCB_SCSIIO -----

*MY ADDR                  x1FE19728
CCB LENGTH                    x00C0
FUNC CODE            x01
CAM_STATUS                    x0004     CAM_REQ_CMP_ERR
PATH ID              2.
TARGET ID            0.
TARGET LUN           0.
CAM FLAGS                 x000004C0
                                        CAM_DIR_NONE
                                        CAM_SIM_QFRZDIS
*PDRV_PTR                 x1FE19428
*NEXT_CCB                 x00000000
*REQ_MAP                  x00000000
VOID (*CAM_CBFCNP)()      x00478AB0
*DATA_PTR                 x00000000
DXFER_LEN                 x00000000
*SENSE_PTR                x1FE19450
SENSE_LEN            xA0
CDB_LEN              x06
SGLIST_CNT                    x0000
CAM_SCSI_STATUS               x0018     SCSI_STAT_RESERVATION_CONFLICT
SENSE_RESID          x00
RESID                     x00000000
CAM_CDB_IO           x000000000000000000000000
CAM_TIMEOUT               x00000014
MSGB_LEN                      x0000
VU_FLAGS                      x0000
TAG_ACTION           x00
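
As a reading aid for the ENT_CCB_SCSIIO block above, here is a small
Python sketch that names the SCSI status byte and the first CDB byte.
The status and opcode values are the standard SCSI ones (0x18 is
RESERVATION CONFLICT, opcode 0x00 is TEST UNIT READY); the function
name describe_ccb and the lookup tables are made up for illustration,
and it is only an assumption that the CAM_CDB_IO field in the log
reflects the command exactly as it was issued.

```python
# Standard SCSI status bytes and a few 6-byte command opcodes.
SCSI_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
}
SCSI_OPCODES = {
    0x00: "TEST UNIT READY",
    0x03: "REQUEST SENSE",
    0x12: "INQUIRY",
    0x16: "RESERVE(6)",
    0x17: "RELEASE(6)",
    0x1B: "START STOP UNIT",
}

def describe_ccb(scsi_status_hex, cdb_hex, cdb_len):
    """Return (status name, command name) for values copied from uerf.

    Values are passed as printed, with or without the leading "x".
    """
    status = int(scsi_status_hex.lstrip("x"), 16) & 0xFF
    cdb = bytes.fromhex(cdb_hex.lstrip("x"))[:cdb_len]
    opcode = cdb[0]
    return (
        SCSI_STATUS.get(status, "unknown status 0x%02X" % status),
        SCSI_OPCODES.get(opcode, "unknown opcode 0x%02X" % opcode),
    )

# CAM_SCSI_STATUS, CAM_CDB_IO and CDB_LEN from the entry above.
print(describe_ccb("x0018", "x000000000000000000000000", 0x06))
```

If the logged CDB can be trusted, the command being rejected is a
no-data command (consistent with CAM_DIR_NONE and DXFER_LEN of zero),
which says something about the kind of access but still nothing about
which process triggered it.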

Is there any way for us mere mortals to look at this and be able to tell
a customer which process might be causing the problem?

thanx.

John McDonald
Atlanta CSC


[Posted by WWW Notes gateway]
1971.1. by KITCHE::schott (Eric R. Schott USG Product Management) Sat Mar 29 1997 13:49 (9 lines)

You might run trace against selected processes on the machines to see
whether you can catch who is doing it. If it is happening regularly,
I would guess it is some management application daemon.

You can find trace at:

http://www-unix.zk3.dec.com/tuning/tools/tools.html
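
Before tracing, it may help to check whether the errors really do arrive
on a regular schedule (4000 a day averages out to one roughly every 22
seconds). The Python sketch below is one way to do that from saved uerf
text output; it is not a uerf feature. The function name event_gaps is
made up, and it assumes every entry has an "OCCURRED/LOGGED ON" line
followed by an "OCCURRED ON SYSTEM" line, as in the entries quoted in .0.

```python
from collections import Counter, defaultdict
from datetime import datetime

def event_gaps(report_text):
    """Return the most common gaps (in seconds) between events per system.

    Assumption: report_text is uerf output laid out like the entries in
    this note, with the timestamp line preceding the system-name line.
    """
    times = defaultdict(list)
    stamp = None
    for line in report_text.splitlines():
        if line.startswith("OCCURRED/LOGGED ON"):
            # The last five fields look like: Tue Mar 18 11:44:22 1997
            text = " ".join(line.split()[-5:])
            stamp = datetime.strptime(text, "%a %b %d %H:%M:%S %Y")
        elif line.startswith("OCCURRED ON SYSTEM") and stamp is not None:
            times[line.split()[-1]].append(stamp)
            stamp = None
    gaps = {}
    for host, stamps in times.items():
        stamps.sort()
        deltas = [int((b - a).total_seconds())
                  for a, b in zip(stamps, stamps[1:])]
        # Keep the three most frequent intervals and their counts.
        gaps[host] = Counter(deltas).most_common(3)
    return gaps
```

Feeding it the saved report from each ASE member returns, per system,
the most frequent intervals between events. A single dominant interval
would point toward a polling daemon or cron job with that period, which
narrows down which processes are worth tracing.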