
Conference csc32::consolemanager

Title:POLYCENTER Console Manager
Notice:Kits, Scans, Docs on CSC32:: as PCM$KITS:,PCM$DOCS:, PCM$SCANS:
Moderator:CSC32::BUTTERWORTH
Created:Thu Aug 06 1992
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:1541
Total number of notes:6564

856.0. "CONSOLE CTRL 01 crashed two times in one day" by BACHUS::WILLEMSG (Geert Willems MCS-Belgium) Thu Jul 06 1995 21:07


	Hi,

	PCM V1.6(SSB kit), OpenVMS AXP V6.1.

	The customer has already had two CONSOLE CTRL 01 crashes in one day (today).
	Currently 38 systems are connected via PCM, with 3 Console Ctrl xx processes.

	Action taken by the customer :
	1) PCM shutdown
	2) PCM startup
	   Result : Some system icons turned black and no connection to
		    those systems was possible anymore.

	   Action : Log out every port that is associated with a black icon
	            (system).
	3) PCM shutdown
	4) PCM startup -->OK

	Is this a known problem with PCM V1.6?
	Restarting PCM wiped all the information in console$tmp.
	I advised the customer to save this info next time.
	Is there anything that I need to check/trace ?

	Do I have to open an IPMT ?
	If yes, I would like to know what I must activate to give you
	the needed info. 
	These are all production systems (AXPs, VAXes, PDPs) in a huge
	banking environment, so we have to be very careful with our
	actions and proposals towards the customer.

	Any help/feedback is welcome.

	Thanks in advance.

	Rgds,

	Geert
T.R  Title  User  Personal Name  Date  Lines
856.1  CSC32::BUTTERWORTH  Gun Control is a steady hand.  Thu Jul 06 1995 23:02  10 lines
    Geert,
      Assuming these crashes are access violations or some such, we would
    want process dumps of the controllers. To do this you are going to have
    to deinstall the images and modify the CONSOLE$STARTUP procedure so
    that it doesn't reinstall CONSOLE$IMAGE:CONSOLE$DAEMON.EXE. Let it
    crash and make sure the customer recovers the dumps and logfiles from
    console$tmp.
    
    Regards,
       Dan
856.2  BACHUS::WILLEMSG  Geert Willems MCS-Belgium  Mon Jul 10 1995 18:58  18 lines
    
    
    	Hi Dan,
    
    	First feedback from the customer.
    	He suspected a PDP that was connected to PCM,
    	so he first disabled the PDP in his PCM database.
    	He then added this PDP to his backup PCM database (on the other
    	management system, which is in fact a backup for the first) and also
    	changed the device_type from LA210 to VT300. So far everything
    	is still running. Can the crash be provoked by the device_type?
    	I don't think so, but ...!
    	We haven't deinstalled the image yet, but he will do so if the
    	problem happens again.
    
    	Rgds,
    
    	Geert
856.3  "yes, it's an accvio"  BACHUS::WILLEMSG  Geert Willems MCS-Belgium  Tue Jul 11 1995 18:02  195 lines
	Hi Dan,Phil,Simon,


	The customer had the CONSOLE CTRL xx crash again.
	This time it was Console Ctrl 03.
	This was found in the CONTROLLER_03.LOG file:


type CONTROLLER_03.LOG;1
-----------------------
$!
$! This command procedure is always run when anybody on the entire system
$! logs in. It is equivalent to LOGIN.COM except that the instructions
$! contained herein are executed everytime anyone on the VMS system
$! logs in to their account.
$!
$! For interactive processes, turn on Control T, and set the terminal type
$!
$ mode = f$mode()
$ tt_devname = f$trnlnm("TT")
$ session_mgr_login = (mode .eqs. "INTERACTIVE") .and.  -
    (f$locate("WSA",tt_devname) .ne. f$len(tt_devname))
$ session_detached_process = (mode .eqs. "INTERACTIVE") .and. -
    (f$locate("MBA",tt_devname) .ne. f$len(tt_devname))
$ unknown_devtyp = (mode .eqs. "INTERACTIVE") .and. -
    (f$getdvi("sys$command","devtype") .eq. 0)
$!
$ if (mode .eqs. "INTERACTIVE") .and. unknown_devtyp .and. .not. -
     (session_mgr_login .or. session_detached_process)
$ endif
$!
$ if (mode .eqs. "INTERACTIVE") .and. .not. -
     (session_mgr_login .or. session_detached_process)
$ endif
$!
$! MicroVAX Support Removed from OpenVMS Alpha
$!
$! Place your site-specific LOGIN commands below
$!
$ !
$ ! Start a Child Controller process, name_num 3, child_num 3
$ !
$ CHILD :== $CONSOLE$IMAGE:CONSOLE$DAEMON.EXE
$ CHILD "child" 3
POLYCENTER Console Manager
Console Controller Daemon Version V1.6-100
Copyright (c) 1995 Digital Equipment Corporation. All Rights Reserved

Read error on Local socket CONSOLE_CTRL_NETBR4
Read error on Local socket CONSOLE_CTRL_NETBR3
%SYSTEM-F-ACCVIO, access violation, reason mask=04, virtual address=00000000,
PC=0005016C, PS=0000001B

  Improperly handled condition, image exit forced.
    Signal arguments:   Number = 00000005
                        Name   = 0000000C
                                 00000004
                                 00000000
                                 0005016C
                                 0000001B

    Register dump:
    R0  = 0000000000000001  R1  = 0000000000000008  R2  = 00000000000124B0
    R3  = 00000000001B0D04  R4  = 00000000001B0354  R5  = 00000000000301D0
    R6  = 00000000001B0354  R7  = 0000000000061ADA  R8  = 0000000000000003
    R9  = 000000007FF9C410  R10 = 000000007FF9D198  R11 = 000000007FFBE3E0
    R12 = 0000000000000000  R13 = FFFFFFFF8083C3A8  R14 = 0000000000000000
    R15 = 0000000500000000  R16 = 0000000000000001  R17 = 0000000000000000
    R18 = 0000000000000000  R19 = 00000000001C2984  R20 = 0000000000012623
    R21 = 000000007FB86828  R22 = 15A8002D40800001  R23 = 001D2170F00D0000
    R24 = 00000000F00D0000  R25 = 0000000000000001  R26 = FFFFFFFF80075DA0
    R27 = FFFFFFFF80838120  R28 = 0000000000050140  R29 = 000000007F959060
    SP  = 000000007F959060  PC  = 000000000005016C  PS  = 200000000000001B
  SYSTEM       job terminated at 10-JUL-1995 20:04:37.76

  Accounting information:
  Buffered I/O count:          198298         Peak working set size:   4512
  Direct I/O count:            103082         Peak page file size:    19520
  Page faults:                   2982         Mounted volumes:            0
  Charged CPU time:           0 00:07:15.04   Elapsed time:     4 04:21:29.38
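
For reading the ACCVIO line in the log above, here is a minimal Python sketch that decodes the low bits of the reason mask. The bit meanings are recalled from the OpenVMS system messages description of ACCVIO and should be treated as an assumption to check against the documentation; `decode_reason_mask` is just an illustrative name, not anything from PCM.

```python
# Low-order reason-mask bits for SYSTEM-F-ACCVIO.
# ASSUMPTION: meanings recalled from the OpenVMS system messages text;
# verify against the documentation before relying on them.
REASON_BITS = {
    0: "length violation",
    1: "page table entry reference",
    2: "write or modify intent",
}


def decode_reason_mask(mask):
    """Return the descriptions of the low bits set in an ACCVIO reason mask."""
    return [text for bit, text in sorted(REASON_BITS.items()) if mask & (1 << bit)]


# mask=04 is the value reported in the CONTROLLER_03.LOG crash above.
print(decode_reason_mask(0x04))
```

Under that reading, mask=04 with virtual address 00000000 looks like an attempted write through a null pointer. A process dump is still needed to say where it came from.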





Status after Console Ctrl 03 crashed:

CONS STAT/ALL
=============

    ------SYSTEM------ ---PID--- STATE -BYTES- -LINES- EVENTS ------USER------
1   ALFA01             000001DB   LYE   105.5K    2.2K      8
2   BCCNO3             000001DB   LYE      419      10    419
3   BDCNCC             000001DB   LYE   138.6K    3.5K     32
4   BOPRO1             000001DB   LYE     9.9K     249     10
5   BRSSAP             000001DB   LYE   199.9K    4.7K      8
6   BRUOA2             000001DB   LYE   281.3K    8.3K     30
7   CDVMV1             000001DB   LYE    6.52M  173.5K   3.5K
8   COV01              000001DB   LYE    91.4K    2.3K      4
9   COV02              000001DB   LYE    9.11M  205.2K   1.6K COV_OPER
10  COV03              000001DB   LYE    3.96M   91.0K   4.7K COV_OPER
11  COV04              000001DB   LYE    8.80M  184.8K    766 COV_OPER
12  COV05              000001DB   LYE     2.0K      39      6 COV_OPER
13  C_BOSNCC           000001DB   LYE    32.9K    1.2K     11
14  C_FNXFE2           000001DB   LYE    39.7K     870     17
15  C_RCTNCC           000001DB   LN-      289       6      1 NCC_OPER
16  DEVNCC             000001DB   LYE    11.0K     367     16
17  EBW                00000000   LN-        0       0      0
18  FE_C               000001DC   LYE    6.02M     208      4
19  FE_G               000001DC   LYE        0       0      0
20  FE_M               000001DC   LYE    4.65M     869      8
21  FE_W               000001DC   LYE    5.29M     178      4
22  FE_Z               000001DC   LYE        0       0      0
23  FNXFE3             000001DC   LYE        0       0      0
24  FNXFE5             000001DC   LYE        0       0      0
25  HSC000             000001DC   LYE      270      11      1
26  HSC001             000001DC   LYE        0       0      0
27  HSJ02              000001DC   LYE        0       0      0
28  HSJ04              000001DC   LYE        0       0      0
29  HSJ07              000001DC   LYE        0       0      0
30  HSJ08              000001DC   LYE        0       0      0
31  INFOSE             000001DC   LYE        0       0      0
32  NETBR1             000001DC   LYE   127.5K    2.2K     28
33  NETBR2             000001DD   LYE   157.7K    2.5K     11
34  NETBR3             000001DD   LYE    85.9K    1.8K     33
35  NETBR4             000001DD   LYE   453.0K   13.2K     51
36  PCMAXA             00000000   LN-        0       0      0
37  PCMAXB             000001DD   PYE    70.0K    2.1K   1.0K
38  R204A              000001DD   LYE        0       0      0
39  SWIFTE             000001DD   LYE    2.98M   92.1K      0
40  SWIFTQ             000001DD   LYE        0       0      0
41  SWIFTR             000001DD   LYE    3.23M   86.2K      2
42  XSERVA             000001DD   LYE     5.8K     137      4
43  XSERVB             00000000   LN-        0       0      0
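
A quick cross-check of the listing above, as a minimal Python sketch. The names, PIDs and states are transcribed from the CONS STAT/ALL output; what the letters in the STATE column mean is not stated in this topic, so the sketch only counts them and makes no claim about their meaning.

```python
from collections import Counter

# System names grouped by the PID column of the CONS STAT/ALL listing.
BY_PID = {
    "000001DB": "ALFA01 BCCNO3 BDCNCC BOPRO1 BRSSAP BRUOA2 CDVMV1 COV01 COV02 "
                "COV03 COV04 COV05 C_BOSNCC C_FNXFE2 C_RCTNCC DEVNCC",
    "000001DC": "FE_C FE_G FE_M FE_W FE_Z FNXFE3 FNXFE5 HSC000 HSC001 HSJ02 "
                "HSJ04 HSJ07 HSJ08 INFOSE NETBR1",
    "000001DD": "NETBR2 NETBR3 NETBR4 PCMAXB R204A SWIFTE SWIFTQ SWIFTR XSERVA",
    "00000000": "EBW PCMAXA XSERVB",
}
# Every row shows STATE "LYE" except these five.
OTHER_STATES = {"C_RCTNCC": "LN-", "EBW": "LN-", "PCMAXA": "LN-",
                "XSERVB": "LN-", "PCMAXB": "PYE"}

rows = [(name, pid, OTHER_STATES.get(name, "LYE"))
        for pid, names in BY_PID.items() for name in names.split()]

per_pid = Counter(pid for _, pid, _ in rows)
first_letter = Counter(state[0] for _, _, state in rows)
second_letter = Counter(state[1] for _, _, state in rows)
pid_of = {name: pid for name, pid, _ in rows}

print(len(rows), dict(per_pid))
print(dict(first_letter), dict(second_letter))
print(pid_of["NETBR3"], pid_of["NETBR4"])
```

The counts line up with the CONS STAT totals further down (43 configured, L:042, P:001, 4 user disabled), and the two sockets named in the read errors, NETBR3 and NETBR4, both sit under PID 000001DD.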

CONS STAT/SYSTEM=NETBR4
=======================

System ............: NETBR4
Enabled ...........: Yes
Line status .......: OK

In use by .........:

Parent pid ........: 000001DD
Child Index .......: 3
Connection type ...: LAT

Logging device ....: 58% Full
Last Archive ......: None Performed

Lines of data .....: 13.2K  Bytes: 453.0K
Event total .......: 51

PCMAXB_SYSTEM> : 0  Min: 3  Warn: 5  Clr: 43  Ind: 0

CONS STAT
=========

                       POLYCENTER Console Manager Summary
                                     Totals


Configured Systems:  43 User disabled:   4
Active Systems    :  39 (D:000 P:001 L:042 T:000)   Unreachable: 000
Active Users      :   1 (Connect/Monitor: 000 C3: 001 Event sources: 003)

CM pid ........: 000001D9 V1.6-100         Uptime:   4 16:32:52
ENS pid .......: 000001DA V1.6-100         Uptime:   4 16:32:52

Total bytes ...: 52.44M       (0)         Ave bps:   129.43
Total lines ...: 881.6K       (0)         Ave lpm:   130.55
Total events ..: 12261        (0)         Ave epm:     1.82
Total actions .: 1500         (0)         Ave aph:    13.33
Active actions : 6                 Failed actions : 19

   Crit: 514  Maj: 662  Min: 7863  Warn: 2437  Clr: 785  Ind: 0
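
The averages in the summary above can be reproduced from the totals and the uptime. This is a minimal Python sketch, assuming "bps" means bytes per second, "lpm" lines per minute, "epm" events per minute and "aph" actions per hour, and that M and K are decimal (10**6 and 10**3). The output does not spell any of that out, but the arithmetic agrees with it.

```python
# Figures transcribed from the CONS STAT summary.
days, hours, minutes, seconds = 4, 16, 32, 52      # "Uptime:   4 16:32:52"
uptime_s = ((days * 24 + hours) * 60 + minutes) * 60 + seconds

total_bytes = 52.44e6      # "52.44M", assuming M means 10**6
total_lines = 881.6e3      # "881.6K", assuming K means 10**3
total_events = 12261
total_actions = 1500

bytes_per_second = total_bytes / uptime_s
lines_per_minute = total_lines / (uptime_s / 60)
events_per_minute = total_events / (uptime_s / 60)
actions_per_hour = total_actions / (uptime_s / 3600)

# Per-severity event counts from the last line of the summary.
severities = {"Crit": 514, "Maj": 662, "Min": 7863,
              "Warn": 2437, "Clr": 785, "Ind": 0}

print(uptime_s)
print(round(bytes_per_second, 2), round(lines_per_minute, 2))
print(round(events_per_minute, 2), round(actions_per_hour, 2))
print(sum(severities.values()) == total_events)
```

The results match the reported 129.43, 130.55, 1.82 and 13.33, and the severity counts add up to the 12261 total events.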



	He forgot to save a copy of the daemon logfile. Sorry.

	We restarted Console Manager without console$daemon.exe
	and console$control.exe installed in memory.
	Where will the dump be created?
	Do I have to open an IPMT (this is a serious problem for the
	customer in his production environment)?
	Is there anything else that you need, such as logfiles?

	Thanks for your help so far.

	Rgds,

	Geert
856.4  ZEDAR::simon  Simon Jackson 830 x3879  Tue Jul 11 1995 21:05  6 lines
Geert,
       please do log an IPMT. We are in the middle of transferring
the support of PCM to a group in Israel, so we need to make sure
problems are tracked.

Cheers Simon...
856.5  CSC32::BUTTERWORTH  Gun Control is a steady hand.  Wed Jul 12 1995 02:09  4 lines
    The dump *should* be in the CONSOLE$TMP directory.
    
    Regs,
       Dan
856.6  "PCM support in Israel ? And engineering ?"  BACHUS::WILLEMSG  Geert Willems MCS-Belgium  Wed Jul 12 1995 11:47  12 lines
    
    
    Hi Simon,
    
    	Israel ????
    
    	Who will do the support ?
    	Do you move to Israel ? What's going on guys ?
    
    Rgds,
    
    Geert
856.7  OPG::PHILIP  And through the square window...  Wed Jul 12 1995 13:25  27 lines
Geert and anybody else that is interested....

  Product support and maintenance is moving to Israel,
  like within the next two weeks.

  Simon and I don't work for Engineering; we work for GPS
  (formerly OMS, IM&T, I.S or whatever). The work we do on PCM
  was funded by NSM Engineering.

  Due to the supposedly imminent release of the PEM product,
  no further major functional enhancements to PCM are planned
  (this may change in the future). Because of this, it is
  cheaper for NSM to have PCM support done from Israel.
  This is a move we fully support as it is more cost-effective
  for Digital.

  As a result of all this, there is a possibility that Simon
  and I will be leaving Digital in the near future, as we really
  don't have anything more to contribute to the company's bottom
  line.

  If you have any questions about this transition or the future
  of PCM, please talk to product management as they will be able
  to align this move with NSM's future strategy.

Cheers,
Phil
856.8  "names of future PCM support people ?"  BACHUS::WILLEMSG  Geert Willems MCS-Belgium  Tue Jul 18 1995 12:07  11 lines
    
    Hi,
    
    Still waiting on a dump from the customer.
    
    Phil, can you give me the names of the PCM support persons
    in Israel ?
    
    Thanks & Rgds,
    
    Geert 
856.9  54625::WILLEMS  Johan Willems @BRO DTN 856-8739  Wed Aug 09 1995 11:50  11 lines
	Dan,

	The controller process crashed again. All information has been saved in
	a saveset. The failing controller process DID NOT create a dump file
	although the daemon image was not installed.

	I will open an IPMT case for Geert (who is ill at the moment).

	Kind regards,

	Johan
856.10  29067::BUTTERWORTH  Gun Control is a steady hand.  Wed Aug 09 1995 17:29  3 lines
    Where is the saveset?
    Regs,
      Dan
856.11  "Saveset location"  54625::WILLEMS  Johan Willems @BRO DTN 856-8739  Thu Aug 10 1995 09:27  9 lines
	Dan,

	The saveset is available at :

		BRSDVP::PCMCRASH090895.BCK

	Kind regards,

	Johan
856.12  29067::BUTTERWORTH  Gun Control is a steady hand.  Fri Aug 11 1995 16:58  4 lines
    I'm copying it now.
    
    Regs,
      Dan
856.13  29067::BUTTERWORTH  Gun Control is a steady hand.  Fri Aug 11 1995 23:13  22 lines
    Johan,
      I have analyzed the available data and the control 1 process had
    socket read errors on each socket and then died trying to remove an 
    entry from an internal queue. I cannot tell what this queue was without
    a process dump. What we need to do is
    
    DEFINE/SYSTEM CONSOLE$DEBUG TRUE
    
    and then comment out the following line in CONSOLE$STARTUP.COM
    
    by placing an ! in front, i.e., make it look like the line below (note
    that I didn't include all of the line):
    
    !Install ADD Console$image:Console$Daemon.exe  .............
    
    
    Now restart the software and we should get a process dump when it
    happens again.
    
    
    Regards,
        dan
856.14  "no dump was created"  54625::WILLEMSG  Geert Willems MCS-Belgium  Wed Aug 16 1995 08:45  21 lines
    
    Hi Dan,
    
    I'm back. I just spoke to the customer. We had already done the
    $!!!Install ADD Console$Image:console$Daemon.exe             /Open/Share/
    But no dump was created in console$tmp! Why?
    
    Do we also need to define the console$debug logical to get this dump
    file? I thought that the $!!!Install ADD Console$Image:console$Daemon.exe
    change was enough.
    
    For the moment they have around 35 to 40 systems connected, and more than
    a month passed between the last two crashes. That is a long time to leave
    console$debug active. What should we look out for, because sometimes there
    is a lot of traffic ...
    We are dealing with a huge banking environment.
    
    
    Thanks & Rgds,
    
    Geert
856.15  29067::BUTTERWORTH  Gun Control is a steady hand.  Thu Aug 17 1995 17:14  16 lines
>    I'm back. I just spoke to the customer. We already did the 
>    $!!!Install ADD Console$Image:console$Daemon.exe             /Open/Share/
>    But no dump was created in the console$tmp ! Why ?
    
    Well, I checked the code and the flag is not set on the $CREPRC call!
    
    What I will have to do is patch it. Even if it's an AXP I can use
    a VAX to patch/absolute the AXP image.
    
    By doing this, we won't have to turn on debug!
    
    I'll patch it, copy it to BACHUS and send mail.
    
    Regs,
      Dan
    
856.16  "another customer has the same problem (VAX)"  54625::WILLEMSG  Geert Willems MCS-Belgium  Fri Aug 18 1995 07:19  82 lines

Hi Dan,

I have another customer who also has the CONSOLE CTRL crash problem.
This time it's on a VAX. I will ask him to install the ECO kit, but we know
that this doesn't solve the problem!

Rgds,

Geert

This is the message I got from the customer:

Today we noticed the disappearance of a console controller process, "Console
Ctrl 01". The process dumped last Friday.

In the logfile CONTROLLER_02.LOG we see:

$ set noverify
POLYCENTER Console Manager
Console Controller Daemon Version V1.6-100
Copyright (c) 1995 Digital Equipment Corporation. All Rights Reserved

Read error on Local socket CONSOLE_CTRL_SUNDEV
Read error on Local socket CONSOLE_CTRL_SUNPR1
Read error on Local socket CONSOLE_CTRL_SUNUK
Read error on Local socket CONSOLE_CTRL_SUNDEV
Read error on Local socket CONSOLE_CTRL_SUNPRO
Read error on Local socket CONSOLE_CTRL_SUNDEV
Read error on Local socket CONSOLE_CTRL_SUNDEV
%SYSTEM-F-ACCVIO, access violation, reason mask=01, virtual address=38313029,
PC=00052C88, PSL=03C00000

  Improperly handled condition, image exit forced.

    Signal arguments          Stack contents

    Number = 00000005         00000000
    Name   = 0000000C         00000000
             00000001         200C0000
             38313029         7FE9BDB4
             00052C88         7FE9BDA0
             03C00000         000B6633
                              00210420
                              00001420
                              00000001
                              38313030

    Register dump

    R0 = 38313020  R1 = 827EDAD0  R2 = 00210420  R3 = 0006F608
    R4 = 7FE9BD00  R5 = 7FFE5EBC  R6 = 00000000  R7 = 00000001
    R8 = 7FFECA48  R9 = 7FFECC50  R10= 7FFED7D4  R11= 7FFE2BDC
    AP = 7FE9BD38  FP = 7FE9BCF8  SP = 7FE9BD74  PC = 00052C88
    PSL= 03C00000

  SYSTEM       job terminated at 11-AUG-1995 20:34:24.90
  Accounting information:
  Buffered I/O count:           18049         Peak working set size:    2562
  Direct I/O count:             11973         Peak page file size:      7314
  Page faults:                  13604         Mounted volumes:             0
  Charged CPU time:           0 00:09:03.48   Elapsed time:     0 04:20:16.83
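
One thing that stands out in the dump above: the faulting virtual address, R0 and the last stack longword look like printable ASCII rather than plausible pointers. A minimal Python sketch of that reading follows; it assumes little-endian byte order (as on a VAX), and `as_ascii` is only an illustrative helper name.

```python
def as_ascii(longword):
    """Render a 32-bit value as the 4 bytes it occupies in little-endian memory."""
    raw = longword.to_bytes(4, "little")
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


FAULT_VA = 0x38313029   # "virtual address" on the ACCVIO line
R0 = 0x38313020         # R0 in the register dump
STACK = 0x38313030      # last longword under "Stack contents"

for name, value in (("fault VA", FAULT_VA), ("R0", R0), ("stack", STACK)):
    print(f"{name:8} {value:08X} -> {as_ascii(value)!r}")

# Offset of the faulting access relative to R0.
print("fault VA - R0 =", FAULT_VA - R0)
```

All three values decode to digit-and-punctuation text, and the faulting address is R0 plus 9. That would be consistent with a pointer having been overwritten by character data, but it is only a hint; a process dump is still needed to confirm anything.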



When restarting PCM completely, some consoles again become
unreachable (6 gray icons in the C3 interface: ITSHAM, ITSHAL, ITSFE1,
ITSFE2, SABLE1 & SABLE2).

In CONTROLLER_02.LOG;1 we see:

$ set noverify
POLYCENTER Console Manager
Console Controller Daemon Version V1.6-100
Copyright (c) 1995 Digital Equipment Corporation. All Rights Reserved

Read error on Local socket CONSOLE_CTRL_SUNCH
Read error on Local socket CONSOLE_CTRL_SUNDEV
Read error on Local socket CONSOLE_CTRL_SUNPRO
Read error on Local socket CONSOLE_CTRL_SUNUK2

856.17  54625::WILLEMSG  Geert Willems MCS-Belgium  Fri Aug 18 1995 13:35  65 lines

	Hi Dan,

	It happened a second time with the other customer (VAX),
	so I will need the image for both AXP and VAX.
	This is the feedback he gave me:

------------------------------------------------------------------------------
When doing a sho sys I only find the following processes ...

000000A4 Console Daemon  HIB      6     5080   0 00:00:06.57      9525    274
000000A7 Console Notify  HIB      6     1099   0 00:00:05.65     15101    164
000000AA Console Ctrl 01 HIB      8    18740   0 00:01:43.80     40241    592

The console$tmp:CONTROLLER_02.LOG;1 file looks like this :

$ set noverify
POLYCENTER Console Manager
Console Controller Daemon Version V1.6-100
Copyright (c) 1995 Digital Equipment Corporation. All Rights Reserved

Read error on Local socket CONSOLE_CTRL_SUNDEV
Read error on Local socket CONSOLE_CTRL_SUNPRO
%SYSTEM-F-ACCVIO, access violation, reason mask=01, virtual address=38313029,
PC=00052C88, PSL=03C00000

  Improperly handled condition, image exit forced.

   Signal arguments          Stack contents

   Number = 00000005         00000000
   Name   = 0000000C         00000000
            00000001         200C0000
            38313029         7FE9BDB4
            00052C88         7FE9BDA0
            03C00000         000B6633
                             00220E08
                             00001420
                             00000001
                             38313030

   Register dump

   R0 = 38313020  R1 = 8287CB90  R2 = 00220E08  R3 = 0006F608
   R4 = 7FE9BD00  R5 = 7FFE5EBC  R6 = 00000000  R7 = 00000001
   R8 = 7FFECA48  R9 = 7FFECC50  R10= 7FFED7D4  R11= 7FFE2BDC
   AP = 7FE9BD38  FP = 7FE9BCF8  SP = 7FE9BD74  PC = 00052C88
   PSL= 03C00000

  SYSTEM       job terminated at 18-AUG-1995 12:25:27.52

  Accounting information:
  Buffered I/O count:            3104         Peak working set size:    2540
  Direct I/O count:              5026         Peak page file size:      7314
  Page faults:                  12556         Mounted volumes:             0
  Charged CPU time:           0 00:03:08.29   Elapsed time:     0 01:50:37.93

 It does not look good, does it ????


P.S. Why do we have those problems (now) ??? Was it perhaps because PCM was 
archiving ??? 
-------------------------------------------------------------------------------

856.18  "are patches images ready ?"  54625::WILLEMSG  Geert Willems MCS-Belgium  Thu Sep 21 1995 08:56  25 lines
    
    
    Hi Dan,
    
    After 5 weeks' absence, I'm back at the office.
    The last time you replied, you were working on the $CREPRC (debug flag).
    Do you have the patched images (for VAX and AXP)?
    Johan Willems did the follow-up in my absence, but he didn't get any
    feedback.
    
    Dan, do you still have the PCMCRASH090895.BCK file that you copied?
    I need it because Johan's copy has already been deleted and PCM
    engineering in Israel needs it to analyze the problem. Can you copy it
    to BRSDVP"":: ?
    Will PCM engineering in Israel find anything in this PCMCRASH090895.BCK
    file, given that it only contains logfiles and no dump file?
    What's your advice?
    
    Thanks for the help.
    
    Rgds,
    
    Geert