
Conference orarep::nomahs::dectrace_v20

Title:DECtrace V2.0 and All-in-1 Perf Rpts conf.
Notice:Kits+Doc, 2 | Patches, 3
Moderator:OMYGOD::LAVASH
Created:Mon Apr 26 1993
Last Modified:Mon Jun 02 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:467
Total number of notes:2058

456.0. "Collect monitor crashes cluster..." by M5::BLITTIN () Thu Mar 27 1997 20:44

    
    Ct running Trace 2.2 on vax/vms 6.1. Two node cluster. Raid array.
    Rdb 6.1. PC runs 'excursions' emulator.
    
    When the ct does a 'collect monitor' against a db data file, after
    about 5-10 minutes the swapper starts consuming about 30% cpu time
    and eventually locks users out.  The ct's terminal hangs.  The first
    time, this lasted about 10-15 min, then apparently cleared up.  The
    ct then retried the same process with the same results, but this
    time it crashed the cluster. 
    
    No dump file that he could find.  I read an earlier note that 
    suggested using the /interval qualifier.  
    
    Things to look at...?
    
    Thank You
T.R  Title  User  Personal Name  Date  Lines
456.1. "" by DUCATI::LASTOVICA (Is it possible to be totally partial?) Thu Mar 27 1997 21:22, 4 lines
>    time it crashed the cluster. 

	I'd suggest calling Digital for some VMS analysis of why the
cluster crashed.  I'd hate to think that it was collect.
456.2. "Active vs Static monitoring?" by M5::BLITTIN () Fri Mar 28 1997 16:29, 6 lines
    re: .1 Ct will contact DEC.
    
    In the meantime, ct reran the monitor against a static file and
    everything seemed to run ok.  Since the problem occurred while the
    collection was active, does the monitor have any problem identifying
    the end of the active collection if/when it hits it?
456.3. "end of file information in a lock value block" by OMYGOD::LAVASH (Same as it ever was...) Fri Mar 28 1997 17:37, 29 lines
If you are monitoring a collection in progress, you should really be using
the /interval qualifier.

If not, you are looking at all kinds of bogus data.  

We have a 32K default cache that gets flushed when full.  If you don't use
a flush interval you can get "old" data on the flush, which makes looking
at it in real time pointless.

The flush interval keeps data flushed to disk at a regular interval, which
keeps it all consistent for the monitor.
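
To illustrate why a flush interval matters for on-line monitoring, here is a toy model in Python. It is only a sketch, not DECtrace code: the class `BufferedCollector`, its methods, and the record sizes are hypothetical, and the only figure taken from this note is the 32K default cache. It compares how stale the unflushed data gets with and without a timed flush.

```python
class BufferedCollector:
    """Toy model of a collector that buffers records before writing them out."""

    def __init__(self, cache_bytes=32 * 1024, flush_interval=None):
        self.cache_bytes = cache_bytes        # 32K default, as mentioned in the note
        self.flush_interval = flush_interval  # seconds; None means "flush only when full"
        self.buffer = []                      # (timestamp, payload) pairs not yet on disk
        self.buffered_bytes = 0
        self.disk = []                        # records a monitor could actually see
        self.last_flush = 0.0

    def flush(self, now):
        self.disk.extend(self.buffer)
        self.buffer.clear()
        self.buffered_bytes = 0
        self.last_flush = now

    def write(self, now, payload):
        self.buffer.append((now, payload))
        self.buffered_bytes += len(payload)
        if self.buffered_bytes >= self.cache_bytes:
            self.flush(now)                   # size-triggered flush

    def tick(self, now):
        """Called periodically; flushes if the interval has elapsed."""
        if self.flush_interval is not None and now - self.last_flush >= self.flush_interval:
            self.flush(now)                   # time-triggered flush


def oldest_unflushed_age(collector, now):
    """Age in seconds of the oldest record the monitor cannot see yet."""
    if not collector.buffer:
        return 0.0
    return now - collector.buffer[0][0]


def simulate(flush_interval, seconds=60, record=b"x" * 100):
    """Write one record per second; return (records on disk, worst staleness)."""
    c = BufferedCollector(flush_interval=flush_interval)
    worst = 0.0
    for t in range(seconds):
        now = float(t)
        c.write(now, record)
        c.tick(now)
        worst = max(worst, oldest_unflushed_age(c, now))
    return len(c.disk), worst
```

In this model, a lightly loaded collector with no interval never fills the cache during the run, so nothing reaches disk and the monitor has nothing current to show. With a 5 second interval the unflushed data is never more than a few seconds old.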

For static data we can pre-sort the records in the file and pick them off
as needed.

Monitor is actually 2 processes.  One, the data channel, tries to stay at the
end of the .dat file, reading records in as fast as possible and updating
global sections that the monitor process reads from.

If the data channel hits end of file, it will check the lock and lock value 
block for the file to see if any new data has come in.  Actually it may issue
a blocking ast to be automatically notified when the file contents have 
changed.  Can't remember exactly, it's been about 5 years...
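
The end-of-file check described above can be sketched roughly as follows. This is a single-process Python illustration under stated assumptions, not the actual implementation: a plain integer counter stands in for the lock value block, a list stands in for the .dat file and the global section, and the names `SharedFile` and `DataChannel` are made up for the example. It models polling only, not the blocking ast variant.

```python
class SharedFile:
    """Stand-in for the .dat file plus a counter playing the role of the
    lock value block (hypothetical; the real mechanism is the VMS lock manager)."""

    def __init__(self):
        self.records = []
        self.version = 0          # bumped by the writer on every flush

    def flush(self, new_records):
        self.records.extend(new_records)
        self.version += 1


class DataChannel:
    """Reader that tries to stay at the end of the file."""

    def __init__(self, shared):
        self.shared = shared
        self.position = 0         # index of the next unread record
        self.seen_version = 0     # writer version last observed
        self.global_section = []  # what the monitor (display) process would read

    def poll(self):
        """Consume any new records; return how many were read."""
        at_eof = self.position >= len(self.shared.records)
        if at_eof and self.shared.version == self.seen_version:
            return 0              # writer has not flushed since we last looked
        self.seen_version = self.shared.version
        new = self.shared.records[self.position:]
        self.global_section.extend(new)
        self.position += len(new)
        return len(new)
```

A short usage example: create a `SharedFile`, attach a `DataChannel`, and call `poll()` after each writer `flush()`. The reader only sees data once the writer has flushed, which is why the collector's flush interval determines how current the monitor display can be.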

Anyway, they should use interval if they are doing on-line monitoring.

If that makes their problem go away then I'd say ignore the other problem.

George
456.4. "/flush=00:00:02" by M5::BLITTIN () Fri Mar 28 1997 17:57, 6 lines
    
    They are using the /flush set to 00:00:02.
    
    I'm having him contact DEC to evaluate the crash dump...
    
    Thank you for the reply...
456.5. "couple things to try" by OMYGOD::LAVASH (Same as it ever was...) Fri Mar 28 1997 19:38, 10 lines
    Then again if it's a heavily loaded system and they are using the 2 second
    interval, perhaps all the concentrated writing is causing the problems...

    Have them change the flush interval to 5, and bump the monitoring interval
    to 5 or 10...
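
    As a rough back-of-the-envelope check on the suggestion above, the number of timer-driven flushes per hour follows directly from the interval. The helper below is a hypothetical illustration in Python; it counts only interval-driven flushes and ignores any extra flushes caused by the cache filling up.

```python
def timed_flushes_per_hour(flush_interval_seconds):
    """Upper bound on interval-driven flushes in one hour."""
    return 3600 // flush_interval_seconds


two_second = timed_flushes_per_hour(2)   # 1800
five_second = timed_flushes_per_hour(5)  # 720
```

    Going from a 2 second to a 5 second interval cuts the timed flushes from 1800 to 720 per hour, which is the reduction in concentrated writing the change is aiming for.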

    See if that helps.  Or possibly they may need to tune some process/system
    parameters...

    George