
Conference noted::hackers_v1

Title:-={ H A C K E R S }=-
Notice:Write locked - see NOTED::HACKERS
Moderator:DIEHRD::MORRIS
Created:Thu Feb 20 1986
Last Modified:Mon Aug 03 1992
Last Successful Update:Fri Jun 06 1997
Number of topics:680
Total number of notes:5456

482.0. "Sequential version > indexed?" by PENNSY::MCMAHON (AARRGGHH! Thank you.) Tue May 26 1987 20:23

If this isn't the appropriate conference for this, let me know and I'll
delete it.
    
    I am converting an indexed file to sequential. The screwy part is
    that the sequential version is about 50% larger than the indexed
    one. In my experience, the sequential version is always noticeably
    smaller than the indexed one. Looking at the FDL and/or the DIR/FULL
    output for the files (below), does anyone have an explanation? Here
    is the com stream I use to do the convert:
    
    CONVERT/SHARE/FAST_LOAD/NOSORT/STATISTICS/FDL=PURORD.FDL -
        PURORD.IDX PURORD.DAT
    
    Thanks.
    
    Pat
    
PURORD.IDX;20                 File ID:  (2546,1,0)         
Size:        52560/52560      Owner:    [MAAPSTRAN]
Created:   4-JAN-1987 21:57   Revised:  21-MAY-1987 16:18 (9592)
Expires:   <None specified>   Backup:    <No backup done>
File organization:  Indexed, Prolog: 3, Using 1 key
File attributes:    Allocation: 52560, Extend: 0, Maximum bucket size: 2, Global buffer count: 0, No version limit
Record format:      Fixed length 192 byte records
Record attributes:  Carriage return carriage control
Journaling enabled: None
File protection:    System:RWED, Owner:RWED, Group:RWED, World:

===============================================================================

SYSTEM
	SOURCE                  VAX/VMS

FILE
	ALLOCATION              20000
	BEST_TRY_CONTIGUOUS     yes
	CLUSTER_SIZE            5
	EXTENSION               0
	GLOBAL_BUFFER_COUNT     0
	ORGANIZATION            sequential
	PROTECTION              (system:RWE, owner:RWED, group:RWED, world:)

RECORD
	BLOCK_SPAN              yes
	CARRIAGE_CONTROL        carriage_return
	FORMAT                  fixed
	SIZE                    192

===============================================================================

PURORD.DAT;1                  File ID:  (1856,18,0)        
Size:        70150/70150      Owner:    [REF_DATA]
Created:  21-MAY-1987 09:28   Revised:  21-MAY-1987 10:35 (1)
Expires:   <None specified>   Backup:    <No backup done>
File organization:  Sequential
File attributes:    Allocation: 70150, Extend: 0, Global buffer count: 0, No version limit
Record format:      Fixed length 192 byte records
Record attributes:  Carriage return carriage control
Journaling enabled: None
File protection:    System:RWE, Owner:RWED, Group:RWED, World:


482.1. "RMS compresses keys and data" by FROST::HARRIMAN (entente, enunciation) Tue May 26 1987 20:48 (19 lines)
        why the CLUSTER_SIZE 5? Normal RMS disks have cluster size of
    3 which defines the amount of blocks which are i'ed or o'ed in a
    single i/o operation. If you have a file defined with a cluster
    size of 5 on a cluster size 3 disk you are wasting blocks (as well
    as being illegal though I've never tried it).
    
    Did you get the FDL from ANALYZE/RMS/FDL? 
    
    Otherwise:
    
       I don't see an ANALYZE/RMS/FDL of the indexed file. RMS does
    some nifty compression on both indices and data records if you have
    it specified (which is the default). This makes for a tad more compute
    time but saves lots of space, especially if your record consists
    of lots of spaces (or zeroes). This is probably the real reason
    here... send an analyze(d)/RMS/FDL of it.
    
    /pjh
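
    For reference, a spelled-out form of that command would look something
    like this (the output file name here is just a placeholder):

        $ ANALYZE/RMS_FILE/FDL/STATISTICS/OUTPUT=PURORD_ANA.FDL PURORD.IDX

    The generated FDL includes ANALYSIS_OF_AREA and ANALYSIS_OF_KEY sections
    with the compression and fill figures; the listing in .7 is exactly such
    an output.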

482.2. "Compression" by BISTRO::HEIN (If We don't Have it,You don't Need it!) Wed May 27 1987 09:39 (24 lines)
.1>        why the CLUSTER_SIZE 5? Normal RMS disks have cluster size of
.1>    3 which defines the amount of blocks which are i'ed or o'ed in a
.1>    single i/o operation. If you have a file defined with a cluster
.1>    size of 5 on a cluster size 3 disk you are wasting blocks (as well
.1>    as being illegal though I've never tried it).

    Nonsense! Clustersize only determines the allocation granularity.
    The only 'waste' can happen at the end of the file. For example, with
    a clustersize of 3 you can only allocate 20001 blocks, not 20000.
    
Re .0
         
    	The compression must have been very effective. This is not
    	a big surprise as fixed length records are involved.
    
    	To find out what really happened, run ANA/RMS/STAT/FDL/OUT
    	on the indexed file. Look at the numbers for compression,
    	total_file_efficiency, count of data blocks, fill_factor.
    
    	The /FAST_LOAD and /NOSORT switches have no meaning for the
    	command you show, but I suppose they won't hurt either.
    
    	What will have hurt you is the lack of a significant (5000+)
    	extension quantity in the FDL for the sequential file.
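
    In terms of the sequential FDL quoted in .0, that last suggestion would
    amount to changing the EXTENSION line, for example (5000 is only the
    order of magnitude suggested above, not a tuned value):

        FILE
        	ALLOCATION              20000
        	EXTENSION               5000
        	ORGANIZATION            sequential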

482.3. "oops, I guess." by FROST::HARRIMAN (entente, enunciation) Wed May 27 1987 13:04 (11 lines)
    
    I sit corrected, Hein. I was under the mistaken impression that
    the cluster size of a disk was equivalent to the cluster_size statement
    on the FDL editor. It certainly sticks the disk's cluster size there.
    I was told at RMS school that the cluster size on a disk is
    (paraphrased, now) the amount of disk blocks transferred in a single
    I/O operation. This is organizational only and the blocks end up
    contiguous. Back to the manuals for me.
    
    /pjh
    
482.4. by CLOUD::SHIRRON (Stephen F. Shirron, 223-3198) Wed May 27 1987 14:21 (13 lines)
    > I was told at RMS school that the cluster size on a disk is
    > (paraphrased, now) the amount of disk blocks transferred in a single
    > I/O operation.
    
    Nonsense.  The cluster size is merely the smallest number of blocks
    which can be allocated -- a cluster size of 3 means that each bit
    in the storage bitmap corresponds to 3 blocks, and requests for
    1, 2, or 3 blocks will all be satisfied with an allocated chunk
    of 3 blocks.  This has nothing to do with I/O sizes; the number
    of blocks requested in an I/O can be smaller or larger than the
    cluster size.
    
    stephen
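
    For what it's worth, the allocation-granularity point can be checked
    with DCL integer arithmetic, using the 20000-block example from .2
    (the symbol names are made up for the illustration):

        $ needed    = 20000
        $ cluster   = 3
        $ allocated = ((needed + cluster - 1) / cluster) * cluster
        $ SHOW SYMBOL allocated    ! ALLOCATED = 20001, the next multiple of 3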

482.5. by ALBANY::KOZAKIEWICZ (You can call me Al...) Wed May 27 1987 17:05 (5 lines)
Data compression is the culprit - I once spent the better part of a day
(I wuz raised on RMS-11, so I didn't know better...) tracking down a
non-problem.  My fixed length sequential file of about 20,000 blocks
shrunk to about 4500 or so. The records contained a key and about 50 bytes
of zeros....

482.6. "learn something over every day" by FROST::HARRIMAN (entente, enunciation) Wed May 27 1987 20:21 (5 lines)
    re: .-2
    
    Ugh. Back to the books. 
    
    
482.7. "Here's the FDL from ANALYZE" by NUHAVN::MCMAHON (AARRGGHH! Thank you.) Wed May 27 1987 20:47 (63 lines)
IDENT	"27-MAY-1987 12:37:40	VAX/VMS ANALYZE/RMS_FILE Utility"

SYSTEM
	SOURCE                  VAX/VMS

FILE
	ALLOCATION              52890
	BEST_TRY_CONTIGUOUS     no
	BUCKET_SIZE             2
	CLUSTER_SIZE            5
	CONTIGUOUS              no
	EXTENSION               0
	GLOBAL_BUFFER_COUNT     0
	NAME                    "PURORD.IDX;20"
	ORGANIZATION            indexed
	OWNER                   [102,60]
	PROTECTION              (system:RWED, owner:RWED, group:RWED, world:)

RECORD
	BLOCK_SPAN              yes
	CARRIAGE_CONTROL        carriage_return
	FORMAT                  fixed
	SIZE                    192

AREA 0
	ALLOCATION              52890
	BUCKET_SIZE             2
	EXTENSION               0

KEY 0
	CHANGES                 no
	DATA_KEY_COMPRESSION    yes
	DATA_RECORD_COMPRESSION yes
	DATA_AREA               0
	DATA_FILL               100
	DUPLICATES              no
	INDEX_AREA              0
	INDEX_COMPRESSION       yes
	INDEX_FILL              100
	LEVEL1_INDEX_AREA       0
	NAME                    "PO-MASTER-KEY"
	NULL_KEY                no
	PROLOG                  3
	SEG0_LENGTH             17
	SEG0_POSITION           0
	TYPE                    string

ANALYSIS_OF_AREA 0
	RECLAIMED_SPACE         0

ANALYSIS_OF_KEY 0
	DATA_FILL               83
	DATA_KEY_COMPRESSION    61
	DATA_RECORD_COMPRESSION 57
	DATA_RECORD_COUNT       189090
	DATA_SPACE_OCCUPIED     41616
	DEPTH                   3
	INDEX_COMPRESSION       38
	INDEX_FILL              71
	INDEX_SPACE_OCCUPIED    692
	LEVEL1_RECORD_COUNT     20808
	MEAN_DATA_LENGTH        192
	MEAN_INDEX_LENGTH       19

482.8. "Huh!" by FROST::HARRIMAN (exclamations...exaggerations) Wed May 27 1987 21:19 (19 lines)
    
    
 >   ANALYSIS_OF_KEY 0
 >	DATA_FILL               83
 >	DATA_KEY_COMPRESSION    61
 >	DATA_RECORD_COMPRESSION 57
 >	DATA_RECORD_COUNT       189090
 >	DATA_SPACE_OCCUPIED     41616
 >	DEPTH                   3
 >	INDEX_COMPRESSION       38
 >	INDEX_FILL              71
 >	INDEX_SPACE_OCCUPIED    692
 >	LEVEL1_RECORD_COUNT     20808
 >	MEAN_DATA_LENGTH        192
 >   	MEAN_INDEX_LENGTH       19
    
    I believe this is called a moot point. 
    

482.9. "Black magic?" by BISTRO::HEIN (If We don't Have it,You don't Need it!) Thu May 28 1987 09:19 (46 lines)
DatFil	DATA_FILL               83             KeySiz = 17
KeyCmp	DATA_KEY_COMPRESSION    61             Bucket = 2
RecCmp	DATA_RECORD_COMPRESSION 57
RecCnt	DATA_RECORD_COUNT       189090
DatBlk	DATA_SPACE_OCCUPIED     41616
	DEPTH                   3
	INDEX_COMPRESSION       38
	INDEX_FILL              71
	INDEX_SPACE_OCCUPIED    692
	LEVEL1_RECORD_COUNT     20808
RecSiz	MEAN_DATA_LENGTH        192
	MEAN_INDEX_LENGTH       19

    Well, that proves it, doesn't it! Oh, and congrats for having shown
    me what must be the worst-designed file of this year. Not surprising
    that RMS gets the blame for poor performance with file designs like this.
    
    The room you would need for a sequential file is:
    
    DATA_RECORD_COUNT * (MEAN_DATA_LENGTH + 2 if variable + 1 if odd) / 512
    
    = 189090 * 192 / 512 = 70908 blocks.
    
    This is confirmed by the compression statistics. According to those,
    working bottom up, the room needed is:
                                                      
    Real_compression  = KeyCmp*KeySiz/RecSiz + (RecSiz-KeySiz)*RecCmp/RecSiz
                      = 61*17/192 + (192-17)*57/192 = 57.4
    Compressed_blocks = ((DatBlk/Bucket) * (Bucket*512-15) * (DatFil/100)
                         - RecCnt*11) / 512 = 29972
    
    Real data blocks  = Compressed_blocks / ( 1 - Real_compression/100 )
                      = 29972 / ( 1 - 0.574 )   =   70359   ! Close enough?
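
    Hein's figures can be re-checked with plain DCL integer arithmetic; the
    symbol names below just mirror his abbreviations:

        $ seq_blocks  = (189090 * 192) / 512                            ! 70908
        $ cmp_blocks  = ((41616/2) * (2*512-15) * 83/100 - 189090*11) / 512
        $ SHOW SYMBOL cmp_blocks                                        ! 29972
        $ real_blocks = (cmp_blocks * 1000) / (1000 - 574)
        $ SHOW SYMBOL real_blocks    ! 70356, within rounding of the 70359 above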
    
    
    As for the file design: here is a small table, calculated using 100%
    fill factors, showing the required number of buckets.
    
    Bucketsize	Data_buckets	Index_L1  Index_L2  Index_L3	Index_blocks
    
    2		17190		325	  7	    1		666
    4		8222		77	  1			316
    40		802		1				40
    
    Hope this helps,
                    Hein van den Heuvel, Valbonne.