
Conference noted::hackers_v1

Title:-={ H A C K E R S }=-
Notice:Write locked - see NOTED::HACKERS
Moderator:DIEHRD::MORRIS
Created:Thu Feb 20 1986
Last Modified:Mon Aug 03 1992
Last Successful Update:Fri Jun 06 1997
Number of topics:680
Total number of notes:5456

482.0. "Sequential version > indexed?" by PENNSY::MCMAHON (AARRGGHH! Thank you.) Tue May 26 1987 20:23

If this isn't the appropriate conference for this, let me know and I'll
delete it.
    
    I am converting an indexed file to sequential. The screwy part is
    that the sequential version is about 50% larger than the indexed
    one. In my experience, the sequential version is always noticeably
    smaller than the indexed one. Looking at the FDL and/or the DIR/FULL
    output for the files (below), does anyone have an explanation? Here
    is the com stream I use to do the convert:
    
    CONVERT/SHARE/FAST_LOAD/NOSORT/STATISTICS/FDL=PURORD.FDL -
        PURORD.IDX PURORD.DAT
    
    Thanks.
    
    Pat
    
PURORD.IDX;20                 File ID:  (2546,1,0)         
Size:        52560/52560      Owner:    [MAAPSTRAN]
Created:   4-JAN-1987 21:57   Revised:  21-MAY-1987 16:18 (9592)
Expires:   <None specified>   Backup:    <No backup done>
File organization:  Indexed, Prolog: 3, Using 1 key
File attributes:    Allocation: 52560, Extend: 0, Maximum bucket size: 2, Global buffer count: 0, No version limit
Record format:      Fixed length 192 byte records
Record attributes:  Carriage return carriage control
Journaling enabled: None
File protection:    System:RWED, Owner:RWED, Group:RWED, World:

===============================================================================

SYSTEM
	SOURCE                  VAX/VMS

FILE
	ALLOCATION              20000
	BEST_TRY_CONTIGUOUS     yes
	CLUSTER_SIZE            5
	EXTENSION               0
	GLOBAL_BUFFER_COUNT     0
	ORGANIZATION            sequential
	PROTECTION              (system:RWE, owner:RWED, group:RWED, world:)

RECORD
	BLOCK_SPAN              yes
	CARRIAGE_CONTROL        carriage_return
	FORMAT                  fixed
	SIZE                    192

===============================================================================

PURORD.DAT;1                  File ID:  (1856,18,0)        
Size:        70150/70150      Owner:    [REF_DATA]
Created:  21-MAY-1987 09:28   Revised:  21-MAY-1987 10:35 (1)
Expires:   <None specified>   Backup:    <No backup done>
File organization:  Sequential
File attributes:    Allocation: 70150, Extend: 0, Global buffer count: 0, No version limit
Record format:      Fixed length 192 byte records
Record attributes:  Carriage return carriage control
Journaling enabled: None
File protection:    System:RWE, Owner:RWED, Group:RWED, World:


482.1. "RMS compresses keys and data" by FROST::HARRIMAN (entente, enunciation) Tue May 26 1987 20:48 (19 lines)
        why the CLUSTER_SIZE 5? Normal RMS disks have cluster size of
    3 which defines the amount of blocks which are i'ed or o'ed in a
    single i/o operation. If you have a file defined with a cluster
    size of 5 on a cluster size 3 disk you are wasting blocks (as well
    as being illegal though I've never tried it).
    
    Did you get the FDL from ANALYZE/RMS/FDL? 
    
    Otherwise:
    
       I don't see an ANALYZE/RMS/FDL of the indexed file. RMS does
    some nifty compression on both indices and data records if you have
    it specified (which is the default). This makes for a tad more compute
    time but saves lots of space, especially if your record consists
    of lots of spaces (or zeroes). This is probably the real reason
    here... send an analyze(d)/RMS/FDL of it.
    
    /pjh
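
    For reference, a spelled-out form of that command would look something
    like this (the output file name here is just a placeholder):

        $ ANALYZE/RMS_FILE/FDL/STATISTICS/OUTPUT=PURORD_ANA.FDL PURORD.IDX

    The generated FDL includes ANALYSIS_OF_AREA and ANALYSIS_OF_KEY sections
    with the compression and fill figures; the listing in .7 is exactly such
    an output.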

482.2. "Compression" by BISTRO::HEIN (If We don't Have it,You don't Need it!) Wed May 27 1987 09:39 (24 lines)
.1>        why the CLUSTER_SIZE 5? Normal RMS disks have cluster size of
.1>    3 which defines the amount of blocks which are i'ed or o'ed in a
.1>    single i/o operation. If you have a file defined with a cluster
.1>    size of 5 on a cluster size 3 disk you are wasting blocks (as well
.1>    as being illegal though I've never tried it).

    Nonsense! Clustersize only determines the allocation granularity.
    The only 'waste' can happen at the end of the file. For example, with
    a clustersize of 3 you can only allocate 20001 blocks, not 20000.
    
Re .0
         
    	The compression must have been very effective. This is not
    	a big surprise as fixed length records are involved.
    
    	To find out what really happened, run ANA/RMS/STAT/FDL/OUT
    	on the indexed file. Look at the numbers for compression,
    	total_file_efficiency, count of data blocks, fill_factor.
    
    	The /FAST_LOAD and /NOSORT switches have no meaning for the
    	command you show, but I suppose they won't hurt either.
    
    	What will have hurt you is the lack of a significant (5000+)
    	extension quantity in the FDL for the sequential file.
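
    In terms of the sequential FDL quoted in .0, that last suggestion would
    amount to changing the EXTENSION line, for example (5000 is only the
    order of magnitude suggested above, not a tuned value):

        FILE
        	ALLOCATION              20000
        	EXTENSION               5000
        	ORGANIZATION            sequential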

482.3. "oops, I guess." by FROST::HARRIMAN (entente, enunciation) Wed May 27 1987 13:04 (11 lines)
    
    I sit corrected, Hein. I was under the mistaken impression that
    the cluster size of a disk was equivalent to the cluster_size statement
    on the FDL editor. It certainly sticks the disk's cluster size there.
    I was told at RMS school that the cluster size on a disk is
    (paraphrased, now) the amount of disk blocks transferred in a single
    I/O operation. This is organizational only and the blocks end up
    contiguous. Back to the manuals for me.
    
    /pjh
    
482.4. by CLOUD::SHIRRON (Stephen F. Shirron, 223-3198) Wed May 27 1987 14:21 (13 lines)
    > I was told at RMS school that the cluster size on a disk is
    > (paraphrased, now) the amount of disk blocks transferred in a single
    > I/O operation.
    
    Nonsense.  The cluster size is merely the smallest number of blocks
    which can be allocated -- a cluster size of 3 means that each bit
    in the storage bitmap corresponds to 3 blocks, and requests for
    1, 2, or 3 blocks will all be satisfied with an allocated chunk
    of 3 blocks.  This has nothing to do with I/O sizes; the number
    of blocks requested in an I/O can be smaller or larger than the
    cluster size.
    
    stephen
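
    For what it's worth, the allocation-granularity point can be checked
    with DCL integer arithmetic, using the 20000-block example from .2
    (the symbol names are made up for the illustration):

        $ needed    = 20000
        $ cluster   = 3
        $ allocated = ((needed + cluster - 1) / cluster) * cluster
        $ SHOW SYMBOL allocated    ! ALLOCATED = 20001, the next multiple of 3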

482.5. by ALBANY::KOZAKIEWICZ (You can call me Al...) Wed May 27 1987 17:05 (5 lines)
Data compression is the culprit - I once spent the better part of a day
(I wuz raised on RMS-11, so I didn't know better...) tracking down a
non-problem.  My fixed length sequential file of about 20,000 blocks
shrunk to about 4500 or so. The records contained a key and about 50 bytes
of zeros....

482.6. "learn something over every day" by FROST::HARRIMAN (entente, enunciation) Wed May 27 1987 20:21 (5 lines)
    re: .-2
    
    Ugh. Back to the books. 
    
    
482.7. "Here's the FDL from ANALYZE" by NUHAVN::MCMAHON (AARRGGHH! Thank you.) Wed May 27 1987 20:47 (63 lines)
IDENT	"27-MAY-1987 12:37:40	VAX/VMS ANALYZE/RMS_FILE Utility"

SYSTEM
	SOURCE                  VAX/VMS

FILE
	ALLOCATION              52890
	BEST_TRY_CONTIGUOUS     no
	BUCKET_SIZE             2
	CLUSTER_SIZE            5
	CONTIGUOUS              no
	EXTENSION               0
	GLOBAL_BUFFER_COUNT     0
	NAME                    "PURORD.IDX;20"
	ORGANIZATION            indexed
	OWNER                   [102,60]
	PROTECTION              (system:RWED, owner:RWED, group:RWED, world:)

RECORD
	BLOCK_SPAN              yes
	CARRIAGE_CONTROL        carriage_return
	FORMAT                  fixed
	SIZE                    192

AREA 0
	ALLOCATION              52890
	BUCKET_SIZE             2
	EXTENSION               0

KEY 0
	CHANGES                 no
	DATA_KEY_COMPRESSION    yes
	DATA_RECORD_COMPRESSION yes
	DATA_AREA               0
	DATA_FILL               100
	DUPLICATES              no
	INDEX_AREA              0
	INDEX_COMPRESSION       yes
	INDEX_FILL              100
	LEVEL1_INDEX_AREA       0
	NAME                    "PO-MASTER-KEY"
	NULL_KEY                no
	PROLOG                  3
	SEG0_LENGTH             17
	SEG0_POSITION           0
	TYPE                    string

ANALYSIS_OF_AREA 0
	RECLAIMED_SPACE         0

ANALYSIS_OF_KEY 0
	DATA_FILL               83
	DATA_KEY_COMPRESSION    61
	DATA_RECORD_COMPRESSION 57
	DATA_RECORD_COUNT       189090
	DATA_SPACE_OCCUPIED     41616
	DEPTH                   3
	INDEX_COMPRESSION       38
	INDEX_FILL              71
	INDEX_SPACE_OCCUPIED    692
	LEVEL1_RECORD_COUNT     20808
	MEAN_DATA_LENGTH        192
	MEAN_INDEX_LENGTH       19

482.8. "Huh!" by FROST::HARRIMAN (exclamations...exaggerations) Wed May 27 1987 21:19 (19 lines)
    
    
 >   ANALYSIS_OF_KEY 0
 >	DATA_FILL               83
 >	DATA_KEY_COMPRESSION    61
 >	DATA_RECORD_COMPRESSION 57
 >	DATA_RECORD_COUNT       189090
 >	DATA_SPACE_OCCUPIED     41616
 >	DEPTH                   3
 >	INDEX_COMPRESSION       38
 >	INDEX_FILL              71
 >	INDEX_SPACE_OCCUPIED    692
 >	LEVEL1_RECORD_COUNT     20808
 >	MEAN_DATA_LENGTH        192
 >   	MEAN_INDEX_LENGTH       19
    
    I believe this is called a moot point. 
    

482.9. "Black magic?" by BISTRO::HEIN (If We don't Have it,You don't Need it!) Thu May 28 1987 09:19 (46 lines)
DatFil	DATA_FILL               83             KeySiz = 17
KeyCmp	DATA_KEY_COMPRESSION    61             Bucket = 2
RecCmp	DATA_RECORD_COMPRESSION 57
RecCnt	DATA_RECORD_COUNT       189090
DatBlk	DATA_SPACE_OCCUPIED     41616
	DEPTH                   3
	INDEX_COMPRESSION       38
	INDEX_FILL              71
	INDEX_SPACE_OCCUPIED    692
	LEVEL1_RECORD_COUNT     20808
RecSiz	MEAN_DATA_LENGTH        192
	MEAN_INDEX_LENGTH       19

    Well, that proves it, doesn't it! Oh, and congrats for having shown
    me what must be the worst-designed file of this year. Not surprising
    that RMS gets the blame for poor performance with file designs like this.
    
    The room you would need for a sequential file is:
    
    DATA_RECORD_COUNT * (MEAN_DATA_LENGTH + 2 if variable + 1 if odd) / 512
    
    = 189090 * 192 / 512 = 70908 blocks.
    
    This is confirmed by the compression statistics. According to those,
    working bottom up, the room needed is:
                                                      
    Real_compression  = KeyCmp*KeySiz/RecSiz + (RecSiz-KeySiz)*RecCmp/RecSiz
                      = 61*17/192 + (192-17)*57/192 = 57.4
    Compressed_blocks = ((DatBlk/Bucket) * (Bucket*512-15) * (DatFil/100)
                         - RecCnt*11) / 512 = 29972
    
    Real data blocks  = Compressed_blocks / ( 1 - Real_compression/100 )
                      = 29972 / ( 1 - 0.574 )   =   70359   ! Close enough?
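
    Hein's figures can be re-checked with plain DCL integer arithmetic; the
    symbol names below just mirror his abbreviations:

        $ seq_blocks  = (189090 * 192) / 512                            ! 70908
        $ cmp_blocks  = ((41616/2) * (2*512-15) * 83/100 - 189090*11) / 512
        $ SHOW SYMBOL cmp_blocks                                        ! 29972
        $ real_blocks = (cmp_blocks * 1000) / (1000 - 574)
        $ SHOW SYMBOL real_blocks    ! 70356, within rounding of the 70359 above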
    
    
    As for the file design: here is a small table, calculated using 100%
    fill factors, showing the required number of buckets.
    
    Bucketsize	Data_buckets	Index_L1  Index_L2  Index_L3	Index_blocks
    
    2		17190		325	  7	    1		666
    4		8222		77	  1			316
    40		802		1				40
    
    Hope this helps,
                    Hein van den Heuvel, Valbonne.