Conference kernel::csguk_systems

Title:CSGUK_SYSTEMS
Notice:No restrictions on keyword creation
Moderator:KERNEL::ADAMS
Created:Wed Mar 01 1989
Last Modified:Thu Nov 28 1996
Last Successful Update:Fri Jun 06 1997
Number of topics:242
Total number of notes:1855

194.0. "Take Control" by KERNEL::PETTET (Norm Pettet CSC Basingstoke) Fri Nov 11 1994 04:09

    Reserved for "Take Control" issues.
T.R     Title     User     Personal Name     Date     Lines
194.1. "What does the panel think?" by KERNEL::PETTET (Norm Pettet CSC Basingstoke) Fri Nov 11 1994 04:11 (24 lines)
Chaps,

	Whilst I believe our group IS in more control than other groups in the 
CSC, there is still room for improvement. A little while ago a lot of effort 
was put in by the CIG on standardising the update templates. Whilst I have
been on nights this week, looking through the deferred requests, it has become
quite evident that some engineers within our group are doing their "own thing"
and not following the agreed guidelines. They are using their own "personalised" 
templates and NOT the ones Ken has built. Also the standard replies, 
e.g. for sysbugchecks, differ between each of us. My own "standard replies" follow 
this reply as an example. If you look at other groups in the CSC, it is quite 
evident that each group is doing its "own thing" also. In that 
respect Jon Morris is correct: we are out of control and inconsistent.

	I propose that a common area is set up on NICES containing standard
messages, templates etc. that not only we in Systems Diagnosis, but all groups
in the CSC can use. 

	What does the panel think? Brian (BRANT) can you discuss this with 
Paul?

	Regards,
		Norm
    
194.2. "Standard reply for SYSBUGS" by KERNEL::PETTET (Norm Pettet CSC Basingstoke) Fri Nov 11 1994 04:11 (46 lines)
Dear Customer,

	I have received this VAXsim generated service request reporting
NON-fatal System bugchecks on node !@#$%&. These errors are normally caused by
a Software problem, and only VERY rarely is a hardware problem involved. They
can be isolated by using the following commands. 

    $ vaxsim/merge/entry=(40,37,2,112)/sin=-7-00:00:00/bin=x.x
    $ anal/error x.x
or

    $ anal/err/include=bug/since="recent date"

	This will display a number of errorlog entries that will include those
reporting this problem. They will have " NON-FATAL BUGCHECK " in the entry. It
is normally possible to identify a common pattern in these errors. If you have
process accounting enabled, then the information stored there can be of use in
identifying what is causing this problem. It may be necessary to:- 

$ SET PROCESS/DUMP

and wait for the error to re-occur, then analyse the process dump to find out 
what is causing the problem. 

	If the bugchecks are caused by a product you have maintained by
Digital, then please let us know the details and we will pass this call on to
the appropriate support group. If not supported by Digital, then contact your
support group for assistance. 

	If the request has been caused by something else, or you would like
more help, we will be pleased to help in any way we can. Please contact the
Support Centre. 

		In the meantime, I will close this call.


		        Regards,

                        Norman Pettet
                        VAX Systems Diagnosis Engineer
                        DIGITAL
                        Customer Support Centre,
                        Basingstoke,
                        Hants

    
194.3. "Standard reply for CRD memory errors" by KERNEL::PETTET (Norm Pettet CSC Basingstoke) Fri Nov 11 1994 04:12 (26 lines)
Dear Customer,


A call has been automatically logged with the Customer Support Centre for
SYSTEM ......, NODE ........ Analysis shows that Memory Array #..
has been logging CORRECTABLE READ DATA errors (CRDs) for Bit #...

The system is designed to expect occasional single bit errors, and the memory
controller will correct the data before allowing it to be used.

The recommended action is to reboot the system when this is convenient. Should
the same errors occur after the reboot, i.e. CRDs with the same Array #/Bit #,
then further action may be required by DIGITAL on your behalf.

       Regards,

                        Norman Pettet
                        VAX Systems Diagnosis Engineer
                        DIGITAL
                        Customer Support Centre,
                        Basingstoke,
                        Hants

 
    
194.4. "Standard reply for operator requested shutdowns" by KERNEL::PETTET (Norm Pettet CSC Basingstoke) Fri Nov 11 1994 04:13 (23 lines)
Dear Customer,

	VAXSIMPLUS has logged a request, with us at Digital, for Operator
Requested Shutdowns on System......... Node....... at the following times :-




	These are not a problem and I will close this request.

		Thank you for using our services,

       Regards,

                        Norman Pettet
                        VAX Systems Diagnosis Engineer
                        DIGITAL
                        Customer Support Centre,
                        Basingstoke,
                        Hants


    
194.5. "so, what does the team think?" by KERNEL::ANTHONY Fri Nov 11 1994 19:58 (9 lines)
    
    	I'm due to talk about this with Paul next Monday..
    
    	In the meantime, has anyone got any views on this?
    
    	We need to decide if standard replies should always be used,
    	or if we can use our own modified ones..
    
    	Brian 
194.6. "A view from outside.." by KERNEL::LOANE (Comfortably numb!!) Sat Nov 12 1994 00:36 (9 lines)
    ....I've  always maintained that triggering a human response to some 
    Memory error is a waste of valuable resources. Wouldn't it  be  nice 
    if  some  pre-processing  entity spotted a Memory error being logged 
    by VAXsimPlus, logged the  call,  extracted  a  Stars  article  (aka 
    standard  template),  sent it as a response and then put the call in 
    a `hold state' (I don't know NICE;  excuse  any  assumptions  here). 
    The  same  could  be  done  for  Operator  requested shutdowns, User 
    Bugchecks and even CLUE-logged calls that  get  a  hit  in  Canasta. 
    Would this reduce the workload?
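
    The pre-processing idea above could be sketched roughly as follows. This
    is only an illustration in Python, not the way VAXsimPlus, Stars or NICE
    actually work: the event names, the Call record, the TEMPLATES table and
    the "HOLD" status are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field

# Hypothetical canned replies keyed by event type. In practice the text
# would come from the agreed standard templates, not from this table.
TEMPLATES = {
    "MEMORY_CRD": (
        "Node {node}: memory array {array} has logged correctable read "
        "data errors for bit {bit}. No action is required at this time."
    ),
    "OPERATOR_SHUTDOWN": (
        "Node {node}: an operator requested shutdown was logged. "
        "This is not a problem."
    ),
}


@dataclass
class Call:
    """Hypothetical record of an automatically logged call."""
    node: str
    event: str
    details: dict = field(default_factory=dict)
    status: str = "OPEN"
    reply: str = ""


def preprocess(call: Call) -> Call:
    """Attach a canned reply and park the call if the event is trivial."""
    template = TEMPLATES.get(call.event)
    if template is None:
        # Unknown event type: leave the call open for an engineer.
        return call
    call.reply = template.format(node=call.node, **call.details)
    call.status = "HOLD"
    return call
```

    Anything the table does not recognise stays open for a person to look
    at, so the automatic path only ever handles the events we have agreed
    are trivial.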
194.7. "yes but beware.." by KERNEL::ANTHONY Sat Nov 12 1994 01:13 (22 lines)
    
    ... yes we should be doing this. Isn't it strange that such a high-tech
    company as ours is so slow to use its own technology?
    
    A word of warning however: most customers like the human touch.  They
    like to be able to 'talk' even if it's via an electronic link.
    
    If you look at other service industries, the emphasis is on the
    customer interface.  Ring up the Gas Board/Electricity Board etc. and you
    get a nice person saying 'Hi I'm Sarah, how can I help you?..'
    You then make an appointment for them to fix your gas boiler and they
    don't turn up.. another story!  Another example is the turnaround at
    British Airways.. in the mid 70s the whole workforce went through
    customer care training; it took 3 years with follow-on courses before
    the benefit was seen.. and now they make 350+M profit. 
    
    Bottom line is we should be using automated responses, but we also need
    to maintain focus on the customers, treating them as individuals whose
    business matters.  
    
    So how can we make an automated response appear to come from Sarah?
    :^)
194.8. "Sometimes "it's good to talk" to customers" by KERNEL::PETTET (Norm Pettet CSC Basingstoke) Sat Nov 12 1994 02:31 (21 lines)
    Chris,
    
    	I agree we should be able to make the technology work for us, but
    likewise, as Brian has stated, some customers prefer the "personal"
    touch. There is perhaps no easy solution: what would be acceptable for
    one customer is unacceptable for another.
    
    	As regards the NICE descriptions and their use, I would like to
    see ALL groups in the CSC follow the same convention that SYSTEMS has
    followed for the last year or so. We took control; perhaps other groups 
    should follow our example. 
    
    	The template issue clearly needs further discussion with other
    groups; as Brian has stated, he will be looking into this very soon.
    
    
    	What do other people think?
    
    Regards,
    
    		Norm
194.9. "So, where to from here??" by KERNEL::LOANE (Comfortably numb!!) Sat Nov 12 1994 13:41 (10 lines)
    OK,  so we're violently agreeing!!! I actually started on a chunk of 
    code that would act as a pre-processor  to  SICL  Call  logging  and 
    spot  if  multiple  calls  were  being logged; then I got busy doing 
    something else!! It wouldn't take much to get it to a working  state 
    and  then  add the bit that spots whether it's a Memory error or User 
    Bugcheck (or whatever)......

    HOWEVER...we need to get some definition of how  this  is  going  to 
    fit  in  with  `the personal touch'. If someone wants to start this, 
    fire ahead; I'm listening.
194.10. "CSC automatic response - GREAT IDEA" by KERNEL::BLAND (Norman Bland 833 3797 CSC, Basingstoke) Sat Nov 12 1994 14:24 (21 lines)
    re .6/.7/.8/.9
    
    A great idea, having an automatic response to minor issues that require
    NO remedial work. As long as the reply is worded in such a way as to
    not encourage customers to start asking questions; otherwise we will be
    back where we started. OK, it is not a perfect world, but as long as
    the majority are happy with the response they get back.
    
    As for the personal touch, I believe customers that have VAXSIM+ and
    AES get plenty of that; removing the personal touch associated with
    OPERATOR REQUESTED SHUTDOWNs and MEMORY CRD calls, will be no big deal.
    
    An issue (or is it); in the past week I have had two customers who said
    'we do not read DSNMAIL replies often' (or words to that effect). Yes,
    I know, you can lead a horse to water but it does not have to drink.
    We could argue, I assume, that if we have sent the appropriate response
    to these TRIVIAL calls, then that should be sufficient.
    
    OK CHRIS, when do we start using it :-)
    
    Norman B
194.11. "A re-think required on TEMPLATES ?" by KERNEL::BLAND (Norman Bland 833 3797 CSC, Basingstoke) Sat Nov 12 1994 14:39 (23 lines)
    re .0
    
    Norman, I feel a CIG meeting coming on. Unless of course we can resolve
    this some other way.
    
    This week, someone who I will refer to as x said 'I use my own
    templates because the templates that Ken has produced do not give me
    what I require'.
    
    I think this clearly shows up the individualistic approach within the
    group. I am not saying that it is wrong but for the sake of
    consistency, we should use common templates (IMHO). When I said 'why
    not approach Ken about getting the templates modified', I cannot
    recall the exact response but I got the impression that it was too much
    trouble.
    
    I do not understand why something as simple as this good idea
    cannot be resolved such that everyone is 90+% happy with the templates
    that are produced. I am not entirely happy with the templates myself
    but use them, as this is what the Systems group agreed (or did they?)
    and guidelines were written up by the CIG.
    
    Norman B
194.12. "let's take action" by KERNEL::ANTHONY Mon Nov 14 1994 23:08 (26 lines)
                                         
    	(I missed Paul today, maybe tomorrow..)
    
    	Looks like two problems here:
    	
    	1. the standard replies for things like sysbugs need updating.
    
    	2. the A, TS, P etc. updates on calls need updating such that
    	   we are all happy, and WILL use them.
    
    	
    	Am I right in assuming that we agree,  we should ALL use the
    	SAME templates?  (for consistency and customer satisfaction)
    
    	I suggest we have a 'CIG'(*) meeting to finalize the templates/
    	updates on 1st December in the late afternoon (say 4-6pm) 
        and we invite Chris to input his thoughts on automating some
    	of the replies.
     
    	(*) all are welcome who can attend.  
    	(no-one is out apart from me on hol, and I'll be here!)
    	
    	between now and DEC 1st we discuss here via notes what the
    	problems are.
    
    	
194.13. "Don't like standardization for its own sake." by COMICS::GLEDHILL Tue Nov 15 1994 00:22 (52 lines)
Think we need to tighten up on how we update calls internally to improve
communication, reduce duplication of effort etc.
This is in note 175; need to check if we are following that and also if the
stuff in there is any use or can be improved. 

Don't think we really need to standardise the text we use to update the
customer; that will just make us seem less human and more mechanical. BUT
we need to establish consistent guidelines on the type of response we give.
E.g. what versions we support, what is/isn't consultancy, particularly in relation
to 3rd party products/hardware. 

Problem is the customer may or may not get support on certain things depending
on who happens to take the call. 

I went through a stage of trying to get purchase orders on some calls where
3rd party priv code was suspected (e.g. demax!), partly to help us raise money
and also to avoid wasting time on crashes that may not be caused by our stuff. 
(As it happens there seemed to be more effort involved in trying to avoid the work
than in just getting on with it!) But if someone else had taken those calls they
might not have asked for a PO.

(eg  sysbug checks, what if the customer already has a call logged about his
sysbugchecks. if every time he has one he gets the same mail telling him some
commands to do he might get fed up!)

(Also in the update Norman put in earlier, don't think set process/dump works
for something in exec mode; pretty sure the process gets blown away straight
off. I would say that if it needs troubleshooting we can force a crash with 
bugcheckfatal.)

Also it is perfectly possible for a cpu/memory error to cause an accvio in
something that is in exec mode -> sysbugcheck. However many sysbugchecks are 
inline ones generated by exec mode software (often rms). (There is an aes 
article that tells how to work out the type of rms bugcheck from the value
in r2.)

So I vote for less standardization of text at the customer end (for the sake of 
standardization) but if it can save us work??

DG.

PS
How much time would it save (to have CL's programs generating standard replies)?
How much of Chris's time would it take to set up?

If you have this, what will it do with the call: will it close it, or will it
still need to be looked at by someone and closed?

If it closes it, this might be dangerous.

If it has to be manually closed, will it save much labour? (The person taking the
call could put their own standard reply on if they want.)
194.14. "PS" by COMICS::GLEDHILL Tue Nov 15 1994 00:35 (15 lines)
I just looked back again at Chris's reply; it says the call would be put in
a hold state.

Another thing: if aes is logging calls on shutdown and we send replies saying
that it is only a shutdown and not to worry about it, would it be better/possible
to tell vaxsim not to log these calls in the first place?

If you DID have software to respond to sysbugcheck/memory errors, could you 
teach it to somehow look at the history and see how many of them you have had?
If there were loads of them, then flag it somehow?

(I could write you some datatrieve that did the equivalent of list history and
could get the problems/solution/lognos of previous aes calls. getting to any 
description would involve getting privilege to 
get to rdb or use the nice online interface (which is probably hard work)).
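
The history check suggested above might look something like this in outline.
It is a Python sketch only, not datatrieve and not the NICE interface: the
(node, symptom) history list and the threshold of 5 are assumptions made for
the illustration.

```python
from collections import Counter


def flag_repeat_offenders(history, threshold=5):
    """Return the (node, symptom) pairs seen at least `threshold` times.

    `history` is assumed to be an iterable of (node, symptom) tuples,
    one per previous automatically logged call.
    """
    counts = Counter(history)
    # Sort so that the flagged list comes out in a stable order.
    return sorted(key for key, n in counts.items() if n >= threshold)
```

A pair that comes back from this would be a candidate for a person to look
at, rather than getting yet another canned reply.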
194.15. "....been there, done that!" by KERNEL::LOANE (Comfortably numb!!) Tue Nov 15 1994 01:28 (13 lines)
>(eg  sysbug checks, what if the customer already has a call logged about his
>sysbugchecks. if every time he has one he gets the same mail telling him some
>commands to do he might get fed up!)

    When  I started this midnight hack, the above concern was my biggest 
    goal; it works!! Basically, I have a 24  hour  window  during  which 
    repeat  calls  (*MY* definition until otherwise over-ruled) only add 
    value and don't `log calls' (the actual logging of calls is  just  a 
    frig to the DSN Pre-processing as I recall).

    ....I  haven't  done  much  on  spotting the type of call, `cos it's 
    probably a LOT simpler than the logic of keeping track  of  multiple 
    occurrences of the same symptom.
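
    For anyone who wants to see the shape of the 24 hour window logic, here
    is a minimal sketch in Python. It is not the actual midnight hack: the
    RepeatFilter class, its offer() method and the choice to measure the
    window from the first call of a run are all assumptions for illustration.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)  # repeat-call window described above


class RepeatFilter:
    """Decide whether an incoming event should log a new call."""

    def __init__(self, window=WINDOW):
        self.window = window
        # (node, symptom) -> [time of first call in the window, count]
        self._seen = {}

    def offer(self, node, symptom, when):
        """Return True if a new call should be logged, False for a repeat."""
        key = (node, symptom)
        entry = self._seen.get(key)
        if entry is not None and when - entry[0] < self.window:
            # Repeat inside the window: add to the existing call only.
            entry[1] += 1
            return False
        # First occurrence, or the window has expired: start a new call.
        self._seen[key] = [when, 1]
        return True

    def count(self, node, symptom):
        """Number of occurrences recorded against the current call."""
        entry = self._seen.get((node, symptom))
        return entry[1] if entry else 0
```

    The repeat count is kept against the open call, so the `added value'
    is not lost even though no new call gets logged.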
194.16. "Some action is needed - NOW !" by KERNEL::ADAMS (Brian Adams CSC-Viables '833-3026) Wed Nov 16 1994 13:28 (30 lines)
    
    I agree that the "canned replies" should be standardised and updated.
    
    The Sysbug one in particular should just state the reason and that no
    action is required from Digital, unless a supported product is
    involved. Let's stop asking customers to go thro' errorlogs, because
    those that bother to read the replies, probably think "Oh no, not that
    again". Let's also stop asking the customers to call us back to agree
    call closure. Most don't bother, and those that do, just say go ahead.
    
    Similarly with Memory CRDs, let's just have one or two lines with the
    "event info" for comparison with any future calls, instead of all the
    theory around ECC memory etc.
    
    BUT.
    
    We have the technology to prevent such calls ever being logged with the
    CSC, and wouldn't it be nice if Vaxsim/SDD just mailed the customer
    with a standard message and stated that there was no need for a call to
    be logged.
    
    
    AND ALSO.
    
    We need some templates to send back to customers who misroute DSN calls
    by just defaulting to DSN%Hardware or DSN%ESR etc. I have a template 
    that gives them certain key addresses, pointing out the advantages to
    them in terms of service delivery.