
Conference decwet::hsm-4-unix

Title:HSM for UNIX Platforms
Notice:Kit Info in note 2.1 -- Public Info Pointer in 3.1
Moderator:DECWET::TRESSEL
Created:Fri Jul 08 1994
Last Modified:Wed Jun 04 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:238
Total number of notes:998

231.0. "backup HSM cache ?" by GIDDAY::SCHWARZ () Tue Mar 04 1997 00:00

    G'day,
    
    Please excuse me if this is a really simple question.
    
    I have a customer who is running D.unix 3.2g and HSM 1.2. They have a
    cache of files on an rz28 set up as an efs (extended file system).
    This magnetic disk is reporting bad blocks and we wish to replace it.
    What can we use to back up the data before we replace the disk? Will
    standard vdump work?  The customer is not sure if this will work and is
    suggesting using dd.
    
    Any pointers etc would be greatly appreciated.
    
    Regards
    
    Kym Schwarz
    Unix Support
    CSC Sydney

231.1. "use hmdump / hmrestore" by DECWET::TRESSEL (Pat Tressel) Tue Mar 04 1997 16:03
Kym --

There are two main options provided with HSM.  One would back up the
magnetic disk, so that it could be restored onto a new disk, and the
other would make sure all files were on optical, then rebuild the
filesystem from optical alone.  The former is faster and easier, as
long as the magnetic disk is still working.  The latter is a standard
safeguard against magnetic disk failure, and the customer should be
running the "bulk ager" every night to make sure all their files are
backed to optical.

Since they may be at risk of magnetic failure, the first thing to do
is to make sure they *are* running the bulk ager.  This is section
7.1.1 of the HSM Admin Guide.  The ager settings (which are in efscfg)
that they should use are AGER_BLWM = 0, AGER_MINAGE = 0, AGER_MINSIZE = 0,
and AGER_MAXSIZE = something larger than their biggest file.  Setting
ager parameters is covered in section 6.13.  These settings tell HSM to
shelve all files to optical (with one exception: HSM does not record
empty files on optical).  (Note that shelving a file does *not* remove
it from magnetic -- a file can be in both places.  It's left on magnetic
until HSM is forced to remove it to free up space.  And when HSM
unshelves a file, i.e. puts a copy back on magnetic, it leaves the copy
on optical as well.  So they won't lose any performance by running the
bulk ager.  In fact, it'll increase performance, because the active
ager -- the thing that frees up space on magnetic when it's needed --
won't have to actually copy any files to optical if the bulk ager has
pre-shelved them.)
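
To make this concrete, here is roughly what the pieces look like.  The
efscfg entries below use the NAME = VALUE form quoted above (I'm assuming
that's also the file syntax -- check section 6.13), the trailing comments
are just annotations, the AGER_MAXSIZE value is only a placeholder, and
/theefs stands in for their real mountpoint:

  AGER_BLWM = 0
  AGER_MINAGE = 0
  AGER_MINSIZE = 0
  AGER_MAXSIZE = 2000000000    # anything larger than their biggest file

  ager -b 180 /theefs          # bulk ager run -- see section 7.1.1 and the
                               # ager man page for the exact arguments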

The fastest way to replace the magnetic disk is to use the HSM backup
utility that saves and restores the magnetic part of the efs -- hmdump
and hmrestore.  (These are parts of the full backup procedure, bkup_efs
and restore_efs, but the full backup saves and restores the platters
as well, and that's not necessary here.)  Running hmdump is covered in
section 7.5 and the man page for hmdump, and hmrestore is in section 7.6
and also has a man page.

Running hmdump is straightforward except for one thing:  They should make
sure that no users can access the filesystem during or after the hmdump
run, else files might be created or changed, and these changes would be
lost.  So it is best to umount the filesystem and mount it again read-only
(mount option -r) before starting hmdump.
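
A rough sketch of that sequence -- the mountpoint, domain, and fileset
names are placeholders, and /dev/rmt0h is whatever tape device they
normally use:

  umount /theefs
  mount -r efs_domain#efs_set /theefs    # standard AdvFS mount, read-only
  hmdump -p /dev/rmt0h /theefs           # check section 7.5 / the man page
                                         # for the exact options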

Running hmrestore to do a full restore is a bit trickier, because they
will be creating a new efs with the same fsid (filesystem id, also called
the group id in HSM documentation) as the old one.  There can't be two
filesystems mounted at the same time with the same fsid, because that's
how HSM knows which platters go with which filesystem.  So before creating
the new efs on the new disk, they must umount the old filesystem.

(At this point, the procedure would be a bit different if they had an AdvFS
domain with several disks in it, but it doesn't sound like that's the case.)

After the old filesystem is unmounted, they should create a new AdvFS
domain and fileset on the new disk, then convert the empty filesystem
to an efs, using the same fsid as the old filesystem had.  See section
2.2 in the Admin Guide.
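
With placeholder names (their disk, domain, fileset, and fsid will differ,
and the exact mkefs options should be checked against section 2.2), that
step looks something like:

  mkfdmn /dev/rz5c new_domain         # new AdvFS domain on the new disk
  mkfset new_domain new_set           # fileset in that domain
  mount new_domain#new_set /newefs    # temporary mountpoint -- see below
  mkefs -f <old fsid> /newefs         # convert to an efs, reusing the old fsid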

Once the new efs is created and mounted, hmrestore can be run.  They want
to do a full restore, which would look something like:

  hmrestore -p /dev/rmt0h /newefs

where /dev/rmt0h is the tape device (most likely the same as was used for
hmdump) and /newefs is whatever the mountpoint of the efs is.

Another caution here:  They do *not* want users writing into the filesystem
while the restore is going on.  Since they can't mount the filesystem
read-only this time, an alternative is to *use a different mountpoint*
from what users expect -- make up something obscure just for this occasion. 
Then there won't be any accidental writing into the filesystem, either by
users or by cron jobs.  Even if users are warned not to use the filesystem,
they may not know they're using it, if it's embedded in some script.  I've
had customers who tried several times to do a restore with their filesystem
mounted on its usual mountpoint, and finally gave in and used a different
mountpoint, so please save your customer some grief and convince them to
do this the first time.

A further caution:  Do *not* overwrite or discard the original magnetic
disk until it's been verified that the filesystem is in good working order
after the restore.  A very good way to test the filesystem is:  While the
*original* filesystem is still mounted (for instance, after the bulk ager
run, which I'd still recommend doing asap for safety), get a listing of
files with ls -laiR.  Repeat the ls -laiR after the restore, and make sure
all the files are present.  Then run ssgfsck to make sure the pointers from
magnetic to optical are in good shape.  (Ssgfsck should be run without any
options except -p.  In particular, they should *not* use -f.)
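
One way to capture and compare the listings -- the paths and scratch file
names here are placeholders, and comparing names only is just to sidestep
the fact that inode numbers will change across the restore (expect a little
noise from the per-directory "total" lines):

  cd /theefs && ls -laiR . > /var/tmp/efs.before     # on the original efs
  cd /newefs && ls -laiR . > /var/tmp/efs.after      # on the restored efs
  awk '{print $NF}' /var/tmp/efs.before | sort > /var/tmp/names.before
  awk '{print $NF}' /var/tmp/efs.after  | sort > /var/tmp/names.after
  diff /var/tmp/names.before /var/tmp/names.after    # nothing should be missing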

Both hmdump and hmrestore will run much faster if the bulk ager has
recently been run, since they don't need to copy any file that's also
on optical.  If all files are shelved, then hmdump only has to copy the
directories and AdvFS metadata (file headers) to tape.  This is an
additional incentive to run the bulk ager first.

The overall procedure is:

  -- Run the bulk ager with parameters set so that all files are shelved.

  -- Make a listing of all files.

  -- Re-mount the filesystem read-only.

  -- Run hmdump to copy the magnetic part of the filesystem to tape.

  -- Umount the filesystem.

  -- Make a new AdvFS domain and fileset on the new disk.

  -- Convert it to an efs, with the same fsid as the old filesystem.

  -- Mount the filesystem on some mountpoint that's not where it's usually
     mounted.

  -- Run hmrestore to recreate the magnetic part of the filesystem.

  -- "Test" the filesystem.

     -- Use shelvefind or shelvels to find a file that's on optical.
        (Simplest is to just do shelvels -R on some directory in the efs
        and stop it when a file with state OPTICAL shows up.)  Then read
        that file by doing cat xxx > /dev/null or sum xxx.  Afterward,
        shelvels xxx should show that the file has state BOTH.

     -- List all the files and compare with the original filesystem's
        listing.

     -- If there's time, run ssgfsck (with no options except -p).  This
        takes a while because it has to mount all the platters one at a
        time.

  -- Umount the filesystem.

  -- Mount it again on its real mountpoint.

Please let me know if there are any questions about this.

Some answers to specific questions:

The AdvFS utilities vdump and vrestore won't work (nor will others,
like balancing or removing a volume) -- they're not qualified on an efs,
which has "tertiary" metadata that tells whether a file has parts on
optical.  In fact, removing a volume or balancing the filesystem is
guaranteed to damage the filesystem if it's done.  This is true whether
the stand-alone utilities are used or the AdvFS GUI features.  Adding a
volume is safe -- it's really the only AdvFS utility that is.

Although dd *might* work, hmdump/hmrestore is better -- they do a file-by-
file copy, not a physical copy, and won't try to copy anything not in a
file.  Also, dd will exit the first time it encounters a read error, even
if it's given the option that tells it not to.  (This has been reported.)

And a little proselytizing:

It worries me that the customer isn't familiar with the HSM backup utilities.
They should be doing regular backup, in addition to running the bulk ager
every night.  Section 7.3 covers using bkup_efs to do full backups, and
section 7.7 covers using hb to do incremental backups.  (There is an
alternate procedure if they're using NetWorker to do other backups -- are
they?)

Good luck!

-- Pat Tressel

231.2. "upgrade; documentation" by DECWET::TRESSEL (Pat Tressel) Tue Mar 04 1997 16:24
Kym --

Two things not directly related to the problem:

> HSM 1.2

Are they really running 1.2, not 1.2a?  If so, it would be good for them
to upgrade.  We may just be able to give them the new version -- do you
know whether their license includes upgrades?  And I'll ask our product
manager if it would be ok, independent of what sort of license they have.

> Any pointers etc would be greatly appreciated.

You might find it helpful to have a copy of the HSM documentation, which
you can get via anonymous ftp from ftp.zso.dec.com, in the directory
pub/hsm/documentation/v1.2A.  The Admin Guide is NJ-ADM.PS, the release
notes are NJA121_RELNOTES.PS or NJA121_RELNOTES.TXT, and HSMREF.PS is
a printable copy of the man pages.

The man pages are also in their own, separately-installable, subset, so
you could install them on your own system without installing HSM.  To do
this, get the HSM kit, which is v1.2A.tar in the directory pub/hsm.
After unpacking it, cd to NJA121, do "setld -l .", and ask for only the
man pages (subset name is NJAMAN121) to be installed.  I don't believe
the installation will get confused and offer to rebuild the kernel, as
it would have to were you installing HSM itself, but if it does, just
say no...  ;-)
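
The whole fetch-and-install sequence is short -- something like this, with
the ftp steps typed interactively:

  cd /var/tmp
  ftp ftp.zso.dec.com        # log in as anonymous, then:
                             # cd pub/hsm ; binary ; get v1.2A.tar ; quit
  tar -xf v1.2A.tar
  cd NJA121
  setld -l .                 # pick only the man page subset, NJAMAN121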

If you happen to have an optical jukebox languishing in a corner, you
could also install HSM.  If you want to do this, let me know, and we'll
figure out how to get a license.

-- Pat

231.3. "information from customer" by DECWET::TRESSEL (Pat Tressel) Thu Mar 06 1997 23:28
Date:	 6-MAR-1997 18:32:02.19
From:	GIDDAY::SCHWARZ     
Subj:	RE: HSM-4-UNIX note 231: upgrade; documentation
To:	DECWET::TRESSEL
Pat,

Sorry to hassle you again. I am wondering if you could help he get this customer
off my back.  It sounds like he needs someone to give him some consulting and
I am happy to tell him this.

Anyway here are his comments:

So it looks like I should log a hardware call and get a replacement disk
on site, so I can proceed. It's a pity so many unix utilities give up on
bad blocks. I can understand why dd might do such a thing, but just
occasionally you'd think there could be a switch that allows such things
to be ignored, especially in the case where blocks are not allocated. I
guess these blocks are not allocated. To check that theory I should run
ssgfsck after the bulk ager has updated all files on the optical
platters. I'm pretty sure that during the day the ager is run with a
high and low water mark, but at night the ager runs in bulk mode to clear
everything back to the opticals.

You are right about there being one fileset to a magnetic disk, but we
do have two 2gb disks. We originally had this test fileset which is
allocated one magnetic disk and a number of platters, and the real
fileset which is allocated the other magnetic disk and something like 20
of the 36 platters. The test area really isn't used and it probably
would be nice to have both disks allocated to the real fileset, along
with all the platter sides except say for 1 or 2 spares.

I'm really in the dark as to the best way to have this thing function. The
software seems fairly smart in that it does load sharing of the
information across all available platters. Why it would want to do that
I'm not sure, since the greater the number of platters being used, the
greater the overall latency time to read the information from the
platters. But since it performs some sort of striping of information
across platters (not redundant striping), it's probably difficult to see
how the information is most easily stored.

So the exercise could be increased in size by including within the
repair of the disk with bad spots the change of the configuration to
have both disks support the real data, and reallocate platters to real
dataset.



Once again thanks for your help

Kym

231.4. "they should still use hmdump/hmrestore" by DECWET::TRESSEL (Pat Tressel) Fri Mar 07 1997 00:03
Kym --

Let's deal with the immediate problem, which is replacing the bad disk,
and defer HSM tuning questions.

> I guess these blocks are not allocated.

If the bad blocks are not in use for a file, then hmdump will not touch
them -- it does a file-by-file, not raw, copy.  So if the above is true,
then hmdump is the right choice for copying out the magnetic part of the
filesystem.

But...if the bad blocks are not part of a file, then why are they seeing
errors?  Are the errors happening when the filesystem tries to use those
blocks for a file?  If so, why aren't the bad blocks being revectored
(taken out of service)?  I'm confused...

Anyhow, *if* they're right about this, then they should (please!) go ahead
and run hmdump, and restore it into a new domain with different disks, as
per my previous note.

> You are right about there being one fileset to a magnetic disk, but we
> do have two 2gb disks.

I'm confused again -- what was the context for this?  Do they mean that
their current domain has only one disk in it, but that they have two
disks?  Is the other available to make a new domain on?

(For the record:  A fileset in an AdvFS domain containing multiple disks
is not confined to one disk.  That's part of the point of having big
domains -- filesystems sharing the domain can use whatever space is
available.)

> So the exercise could be increased in size by including within the
> repair of the disk with bad spots the change of the configuration to
> have both disks support the real data, and reallocate platters to real
> dataset.

Does this mean they'd like to have two disks in the new domain?  That's
certainly possible -- I don't know anything that would stop them from
doing the hmrestore into a fileset in a bigger domain.  But it sounds
as though they'd have to wait until they got another disk, if they only
have one spare disk now.

(There are some alternatives:
They could do the hmdump/hmrestore now onto a single-disk domain, just to
get rid of the bad disk.  Then they could either addvol a second disk
which would allow more space for future files...but they couldn't balance
the disks, because the balance feature of AdvFS is not supported for an
HSM filesystem.  In fact, it will do serious damage to the filesystem.
Or, they could get two new disks, make a two-disk domain, and repeat the
hmdump/hmrestore to go from the new single-disk domain to the newer two-
disk domain.  This has the drawback that they'd need to get two new disks.)
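
For reference, adding a volume is just the standard AdvFS addvol command --
something like the following, where the device and domain names are
placeholders:

  addvol /dev/rz4c efs_domain

As noted above, that's as far as it should go: no balance and no rmvol on
an efs.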

> It's a pity so many unix utilities give up on
> bad blocks. I can understand why dd might do such a thing, but just
> occasionally you'd think there could be a switch that allows such things
> to be ignored, especially in the case where blocks are not allocated.

There *is* a dd option for that purpose, namely:

  dd conv=noerror

but it's broken -- dd quits anyway on the bad block.  A qar was reported
against dd last year (it's # 45113 in the Digital Unix qar database), and
the submitter had fixed the problem even before reporting it, but the fix
hasn't yet (that I can see) been incorporated into the released version
of dd.  I've sent a note to the person who has the fix to see if it would
be possible for the customer to use their program.  Please don't mention
this to the customer until we hear back, because this may not be allowed.
If it's not, I might be able to modify the HSM utility copym to work on
regular disks instead of HSM-formatted optical platters.

And besides, they should try hmdump first.

                                 * * *

Well, ok, a few comments on HSM operation.

> The software seems fairly smart in that it does load sharing of the
> information across all available platters. Why it would want to do that
> I'm not sure...

That's not what it's doing...it's not as smart...or as dumb...as that.
When writing a file out to a platter (shelving) it tries to keep all of
the file on one platter side, if possible, and it also tries to put files
on the same platter where it recorded the file's parent directory info.
(It doesn't shelve directories -- it records their names and attributes
so that they'll be available if the filesystem has to be rebuilt from
the platters alone.)  If HSM can't put a file on the same platter as its
parent directory, it picks a platter that has enough free space, that's
already part of the filesystem, always checking through the platters in
the same order (it doesn't have to mount them to do this), so it fills
up one platter before going to the next.

The goal is to keep everything on as few platters as possible, and to not
split files across platters, that is, to minimize platter motion.

But if files are deleted, HSM doesn't move things from one platter to
another to compact the space -- that would be very time-consuming, and
would degrade performance, since it would tie up drives.  So having partly-
full platters is tolerated.

> But since it performs some sort of striping of information across platters

Actually, it's not doing that either.  Breaking files up into pieces is a
last resort.  HSM will only do that if the file won't fit on a platter side,
or if it's told to with the "shelve" command.

> it probably
> would be nice to have both disks allocated to the real fileset, along
> with all the platter sides except say for 1 or 2 spares.

The platters don't have to be allocated to the filesystem.  When a new
platter is put in the jukebox, it's best to set it to owner 5 (which tells
HSM that it can use the platter for shelving), group 0 (which means it's
not yet part of a specific filesystem), and state internal (meaning it's
available for use in general -- it's not about to be taken out of the
jukebox, or formatted, or suchlike).

-- Pat

231.5. "dt, a dd replacement, and then some..." by DECWET::TRESSEL (Pat Tressel) Fri Mar 07 1997 19:32
Kym --

They should still try hmdump/hmrestore first, but if hmdump fails, then
they can copy the disk with "dt".

I got a very quick response from Robin Miller about dt (the utility I
mentioned that could be used in place of dd).  It *is* ok to use it, and a
kit is available for v3.2x systems.  Look at Robin's Web page for dt:

  http://www.zk3.dec.com/~rmiller/dt.html

There is a link there that will let you ftp the v3.2 version.  The kit
includes a manual, but Robin also kindly made up an example showing how to
copy a file.  I'll append the dt output at the end, but the command from
the example is:

  dt if=/dev/rrz0b of=/dev/rrz3b iomode=copy errors=10 limit=5m

This example copies from /dev/rrz0b to /dev/rrz3b, allows 10 errors
maximum, and limit is the number of bytes to transfer.  In this case, the
customer is probably using the whole disk, so they'd want the c partitions,
and the limit should be bigger than the size of the whole disk.  They
probably want to set errors to some big number.  They should be very very
careful to get the right disks as input and output -- don't want to copy
the wrong way!!

So their command might look like:

  dt if=/dev/rrzNc of=/dev/rrzMc iomode=copy errors=500 limit=3g

where /dev/rrzNc is the input disk -- the bad one -- and /dev/rrzMc is
the output disk -- the new one.  (If dt doesn't like 3g, then try 3000m.)

-- Pat

------------------------------------------------------------------------------

Part of Robin's example, showing dt output.  Note that dt verifies the
copy, so there are two passes.

% dt if=/dev/rrz0b of=/dev/rrz3b iomode=copy errors=10 limit=5m
dt: 'read' - I/O error
dt: Relative block number where the error occcured is 428
dt: Error number 1 occurred on Fri Mar  7 10:53:15 1997
Copy Statistics:
    Data operation performed: Copied '/dev/rrz0b' to '/dev/rrz3b'.
     Total records processed: 20478 @ 512 bytes/record (0.500 Kbytes)
     Total bytes transferred: 10484736 (10239.000 Kbytes, 9.999 Mbytes)
      Average transfer rates: 87677 bytes/sec, 85.622 Kbytes/sec
      Total passes completed: 0/1
       Total errors detected: 1/10
          Total elapsed time: 01m59.58s
           Total system time: 00m06.26s
             Total user time: 00m00.35s

dt: 'read' - I/O error
dt: Relative block number where the error occcured is 428
dt: Error number 1 occurred on Fri Mar  7 10:55:11 1997
Verify Statistics:
    Data operation performed: Verified '/dev/rrz0b' with '/dev/rrz3b'.
     Total records processed: 20478 @ 512 bytes/record (0.500 Kbytes)
     Total bytes transferred: 10484736 (10239.000 Kbytes, 9.999 Mbytes)
      Average transfer rates: 359477 bytes/sec, 351.051 Kbytes/sec
      Total passes completed: 1/1
       Total errors detected: 1/10
          Total elapsed time: 00m29.16s
           Total system time: 00m06.68s
             Total user time: 00m01.76s

Total Statistics:
      Input device/file name: /dev/rrz0b (Device: RZ28, type=disk)
     Total records processed: 40956 @ 512 bytes/record (0.500 Kbytes)
     Total bytes transferred: 20969472 (20478.000 Kbytes, 19.998 Mbytes)
      Average transfer rates: 140940 bytes/sec, 137.636 Kbytes/sec
      Total passes completed: 1/1
       Total errors detected: 2/10
          Total elapsed time: 02m28.78s
           Total system time: 00m12.96s
             Total user time: 00m02.13s
               Starting time: Fri Mar  7 10:53:06 1997
                 Ending time: Fri Mar  7 10:55:35 1997

231.6. "still need some help." by GIDDAY::SANKAR () Wed Mar 26 1997 17:39

I am following up on this customer's problem in replacing the disk.

He did everything in the procedure provided in .-x  but did not do the
ssgfsck.

Everything seemed to work OK, but now he is getting some messages when qqr is run.

Can we get some information on what these messages are and what needs to be done?


thanks
srinivasan sankar
csc

mail from customer follows
============================
From:	SMTP%"DennisMacdonell@auslig.gov.au" 26-MAR-1997 15:56:02.99
:	
Subj:	re: Q33652


Here is a list of the commands that were used to create the efs
structure on the new disk -

# remove the nfs share for the <efs mount point>; nb the users access
# the efs magnetic disk via an nfs link, they do not log onto the
# jukebox machine directly
# the parameters in /usr/efs/efscfg were checked.
ager -b 180 <mountpoint>
cd <mountpoint>
shelvels -R > <log file>
mkefs -i <mountpoint> > <logfile>
ssgfsck <mountpoint> > <logfile>
hmdump -p /dev/rmt0h <mountpoint>
# this reported a number of files that it was waiting on to be
# transferred to the opticals
# checking with the shelvels log, these files were listed as MAGNETIC
# even after a number of attempts to copy them to the platters with the
# bulk ager.  While all this activity was going on, the qr and jmd
# (queuer, jukebox manager) were running as normal.
rm <the files that hmdump reported>
hmdump -p <mountpoint>
# hmdump ran to completion without a hitch
rm <link in /sbin/rc3.d for the polycenter startup>
shutdown -h now
# removed the old magnetic disk and installed the new one
boot
# checked that the qr and jmd were not running
mkfdmn -o <old domain name>
mkfset <old domain> <old set name>
mount <mountpoint>    # uses the entry in /etc/fstab to mount the disk
mkefs -f <old efsid> <mountpoint>
hmrestore -v -p /dev/rmt0h
# reestablished the link in /sbin/rc3.d
shutdown -r now
cd <mountpoint>
# checked that the qr and jmd were running
shelvels -R > <logfile>
# picked a file that was OPTICAL and did a cat of the file
shelvels <filename> # indicated the file was now BOTH
#assumed that the whole thing was running.

The users were asked to only do read accesses for the next couple of
days.  The users accessed files for a day and reported no problems.
Overnight the cron runs the bulk ager and then runs shelvels -R piped to a
log file. The day after the upgrade, I tried to clean up some files with
strange names like "lo", "cd", "rcp", ",n", etc. I tried to cat these
files where they had some bytes, but the command never returned, and I
broke out after a couple of hours with a "^C". I ran qqr to find out
what the thing was doing and got miles of messages like -

File Attribute Update
        <fs=1, inode=50263, gen=32769, media=3b, refnum=1624137083>
File Attribute Update
        <fs=1, inode=55144, gen=32769, media=31, refnum=1190255091>
File Attribute Update
        <fs=1, inode=78154, gen=32769, media=24, refnum=2142315080>
File Attribute Update
        <fs=1, inode=78252, gen=32769, media=24, refnum=457469444>
File Attribute Update
        <fs=1, inode=78322, gen=32769, media=24, refnum=1518026589>
File Attribute Update
        <fs=1, inode=13151, gen=32774, media=31, refnum=1782329940>
File Attribute Update
        <fs=1, inode=23130, gen=32771, media=1a, refnum=1615004601>
File Attribute Update
        <fs=1, inode=22620, gen=32771, media=1a, refnum=1826004613>
File Attribute Update
        <fs=1, inode=22612, gen=32771, media=1a, refnum=1772160175>
File Attribute Update
        <fs=1, inode=36709, gen=32772, media=1a, refnum=1775890185>
File Attribute Update
        <fs=1, inode=23407, gen=32771, media=1a, refnum=711916695>
File Attribute Update
        <fs=1, inode=23018, gen=32771, media=1a, refnum=1807212515>

Question is what is this thing up to

Dennis
=====================================

231.7. "Why were files rm'd?" by DECWET::TRESSEL (Pat Tressel) Mon Apr 07 1997 17:40
Srinivasan --

> I am just desperate to get some help and I hope I will get some better
> visibility by creating a new note.

Using a new note won't get a faster response, and it doesn't hurt to
continue using the old one -- we get e-mail when any new note is added.
Please use topic 231 to maintain continuity.

> hmdump -p /dev/rmt0h <mountpoint>
> # this reported a number of files that it was waiting on to be
> transferred to the opticals
> # checking with the shelvels log these files were listed as MAGNETIC
> even after a
> # number of attempts to copy them the platters with the bulk ager.

The bulk ager decides whether to shelve files or not.  Depending on the
ager parameters set in the efscfg or .agerconfig, it may not shelve all
files.  Did you try a shelve command on any of those files?
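
A quick way to check one of the stuck files by hand -- the pathname is
just an example, and the exact shelve options are in its man page:

  shelve /theefs/path/to/stuck-file
  shelvels /theefs/path/to/stuck-file    # should go from MAGNETIC to BOTH

If shelve also refuses or hangs on these files, that is worth knowing.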

Note that hmdump does not require that MAGNETIC files be shelved.  Any
MAGNETIC files that are not "dirty" (in transition between magnetic and
optical) will be copied to tape by hmdump.

> rm <the files that hmdump reported>

Why was this done?  I sure hope there is some other backup of those files!

> # checked that the qr and jmd were not running

Why?  It is best to leave HSM up, in case it needs to do something, unless
there is some specific instruction to the contrary.

> File Attribute Update
>         <fs=1, inode=50263, gen=32769, media=3b, refnum=1624137083>

These are normal requests to HSM to update file attributes (e.g. permission,
ownership, etc.).  They result from a user doing chmod, chown, etc.

Are the requests being serviced?  Are any requests disappearing from the
queue?  If the requests in the queue stay the same, then we may need to
set the qr log level to 6 for a while, to see what is happening.  It is
possible they are being serviced from the tail of the queue, which won't
be shown if the queue is long enough.

Are there any errors reported?  Check e-mail to root to see if HSM is
reporting difficulty getting a platter.  Look for messages with ERROR or
WARNING in /usr/efs/qrlog or /usr/efs/jlog.
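
A quick way to scan both logs at once:

  egrep 'ERROR|WARNING' /usr/efs/qrlog /usr/efs/jlog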

> The day after the upgrade, I tried to clean up some files with
> strange names like "lo", "cd", "rcp", ",n", etc.

Those look like pieces of commands.  I can't guess whether they might be
meaningful files or not -- only the customer would know.  What are their
file attributes?  (Do "ls -l" on them.)  If they are garbage files,
possibly due to the bad spot on the disk, then they'll probably have very
odd attributes -- they might look like device special files, or have strange
permissions or sizes.

-- Pat

231.8. "Sorry about the delay" by DECWET::TRESSEL (Pat Tressel) Mon Apr 07 1997 19:11
Srinivasan --

I just found your previous posting in my e-mail inbox -- sorry about the
delay in replying!

-- Pat

231.9. "questions from customer re. accidental rm -rf and stopping HSM from processing the deletes" by DECWET::TRESSEL (Pat Tressel) Thu Apr 10 1997 23:56
From:	GIDDAY::SANKAR "10-Apr-1997 1950 +1000" 10-APR-1997 02:56:09.34
To:	DECWET::TRESSEL
CC:	SANKAR
Subj:	HSM-4-UNIX note 231 and replies...



Pat,

Thank you for the mail and replies to note 231 in the HSM-4-UNIX conference.
I am mailing this to you thinking that it may be better to do this way.

I am sorry to keep bothering you with more and more questions from this customer.
For me to gain some experience, I am trying to setup a system with HSM here and 
I have already managed to get hold of a cd drive (writable) and will start 
setting it up tomorrow. At present we do not have anyone with much exposure 
to this product.

If it will help you to justify the time you spend on supporting us, please
tell me and I can IPMT this case.

The customer has sent me two mails which I am attaching to this.
He has partially answered some of the questions.
(by simply asking more questions)

Please see if his replies make sense

I sincerely appreciate your help.

Thanks
sankar  




++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
From:	SMTP%"DennisMacdonell@auslig.gov.au"  9-APR-1997 17:17:23.62
To:	Sankar <sankar@stl.dec.com>
CC:	
Subj:	re: polycenter HSM



Hi Sankar,

One thing I neglected to mention was that when I was having problems
accessing files, because the system was updating attributes, possibly as
a result of a chgrp/chmod command, I rebooted the system. Given that
there were a number of things that the qr had queued for processing,
what happens when a reboot is initiated
(a) is the queue of requests lost or are they saved for processing when
the machine is rebooted.
(b) if an accidental command were issued say a remove command (rm -r *)
in the wrong directory, could the machine be closed down to stop the
propagation to the opticals.
(c) having interrupted the aging of an rm command to the opticals, how
would you recover as much information as was still on the opticals when
the machine is rebooted.

I'm a little unsure as to what happens with queued requests, since the
first thing that happens when the system is booted is all the optical
disks are catalogued (ie it looks at each platter side and relates it to
a physical location in the jukebox, each platter side has a unique
label, the platter label is used with commands like the mountm command).
This catalogue thing may be just a single request to the jukebox, as it
loads the platters in physical location order. During a shutdown/reboot
the jukebox remains up, so I assume that the cataloging process is for
the purposes of the software running on the host and that software
translates accesses to a platter, to a physical location.

I've been mindful to shutdown the system when the platters are not being
rattled, so generally after the catalogue operation the jukebox is
dormant until a user accesses a file. I assume that if requests are
queued across a shutdown/reboot, then the information must be stored in
a file somewhere, in which case that file would need to be removed and
maybe the system rebooted again before it finished cataloging.


Any clues.


The other thing with respect to the jukebox is I think it runs slow
because of the speed of the optical drives, perhaps they are just single
speed devices. Is it possible to upgrade the reader/writer?
Dennis

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

From:	SMTP%"DennisMacdonell@auslig.gov.au"  9-APR-1997 13:11:58.51
To:	"sankar@ripper.stl.dec.com" <sankar@ripper.stl.dec.com>
CC:	
Subj:	Re: re: Q33652


Hi Sankar,

> hmdump -p /dev/rmt0h <mountpoint>
> # this reported a number of files that it was waiting on to be
> transferred to the opticals
> # checking with the shelvels log these files were listed as MAGNETIC
> even after a
> # number of attempts to copy them the platters with the bulk ager.

The bulk ager decides whether to shelve files or not.  Depending on the
ager parameters set in the efscfg or .agerconfig, it may not shelve all
files.  Did you try a shelve command on any of those files?

>>>
*****************************************************************
 The bulk ager low water mark was set to 0 as indicated by some email I
got from Kym Schwarz. So there was reason to think that all files
would be aged by the bulk ager. It just so happens that the files that
the ager couldn't age and the files that hmdump stalled on were
just about the same.
*****************************************************************

Note that hmdump does not require that MAGNETIC files be shelved.  Any
MAGNETIC files that are not "dirty" (in transition between magnetic and
optical) will be copied to tape by hmdump.

> rm <the files that hmdump reported>

Why was this done?  I sure hope there is some other backup of those
files!

>>>
*****************************************************************
hmdump was stalling on certain files. If I hadn't removed the files 
then we would still be waiting for hmdump to finish.
******************************************************************

> # checked that the qr and jmd were not running

Why?  It is best to leave HSM up, in case it needs to do something,
unless there is some specific instruction to the contrary.

>>>
*****************************************************************
I can't see the logic of having qr and jmd running while one is trying
to rebuild the magnetic disk.  What are qr and jmd supposed to be
managing while you're trying to run hmrestore to recover the data that
should be on the magnetic disk?
*****************************************************************

> File Attribute Update
>         <fs=1, inode=50263, gen=32769, media=3b, refnum=1624137083>

These are normal requests to HSM to update file attributes (e.g.
permission,
ownership, etc.).  They result from a user doing chmod, chown, etc.

>>>
*****************************************************************
 The users were asked not to do any updates to the files; this included
chmod and chown. I had issued some chmod/chown/chgrp commands but I'm not
sure whether I did that before or after the installation of the new
disk. In any case those commands appeared to have completed -- that is,
the command had executed at least on the information stored on the
magnetic disk. I guess what was happening was that the qr was queueing
the changes to the opticals, which was blocking the queueing of file
accesses.  There seem to be a couple of problems with the software:
(a) there are no tools to show exactly what is going on; shelvels didn't
indicate that there were a whole lot of files to be updated (ie MAGNETIC
rather than OPTICAL), and qqr only shows the current batch of requests --
there may be heaps more to be processed before it gets around to
servicing my file access.
(b) certain commands should have priority over other commands, ie aging
information generated by a chmod/chgrp should not get in the way of a
straight access request unless there is some problem with the config
file. Now I haven't changed the config file, so I believe it contains
the values as set by DEC personnel.  I did check to see that the
appropriate values were set for the bulk ager, but as it turned out the
existing values were fine.
*****************************************************************

Are the requests being serviced?  Are any requests disappearing from the
queue?  If the requests in the queue stay the same, then we may need to
set the qr log level to 6 for a while, to see what is happening.  It is
possible they are being serviced from the tail of the queue, which won't
be shown if the queue is long enough.

Are there any errors reported?  Check e-mail to root to see if HSM is
reporting difficulty getting a platter.  Look for messages with ERROR or
WARNING in /usr/efs/qrlog or /usr/efs/jlog.
>>>
customer has not mentioned anything about the email to root and the log file 
entries.
=========================================
 
231.10. "reply to .9" by DECWET::TRESSEL (Pat Tressel) Fri Apr 11 1997 00:00
From:	DECWET::TRESSEL      "Pat Tressel" 10-APR-1997 19:52:27.11
To:	GIDDAY::SANKAR
CC:	tressel
Subj:	RE: HSM-4-UNIX note 231 and replies...

Sankar --

> For me to gain some experience, I am trying to setup a system with HSM here

Good idea!

> I have already managed to get hold of a cd drive (writable)

HSM doesn't support CD-R, and it also wants a jukebox.  Look at the SPD for a
list of supported devices.  Basically, it supports the RW5xx series of
magneto-optical (MO) jukeboxes.  The early ones in this series are obsolete,
so you may be able to get one that's been traded in.  I've heard that DIAL is
the place to look for returned equipment. I haven't used it myself, but my
boss is quite adept at finding things... In any case, don't get something
expensive.  We might be able to help locate something cheap.  Best thing is
to ask locally and see if anyone's got a jukebox sitting in a corner gathering
dust.

An alternative is to use a simulated jukebox, using partitions on magnetic
disks as fake platters (which is an undocumented "feature" of HSM used only
for some specific testing long ago).  This isn't very realistic (for one
thing, the timing is very different, without platter mounting), but it's
useful for trying things out.  It isn't useful for learning normal HSM
jukebox configuration procedures, because that part is very different.

Did you see our Web page (internal to DEC only)?  You can get the kit and
docs from there:

http://slugbt.zso.dec.com/Products.d/hsm.d/hsm.html

If you get a real jukebox, you'll also need the CLC kit (changer and optical
drivers) -- the CLC Web page is:

http://slugbt.zso.dec.com/Products.d/scsicam.d/scsicam.html

Let me know if there are any broken links -- our Web server just got switched
to a new machine.

You'll also need licence paks -- I can send you some for internal use, as
long as the system isn't a production system, i.e. if it's just for practice
and testing.  Otherwise DEC needs to pay a royalty.

Note that HSM is not supported beyond v3.2whatever-the-last-one-was.  So
you'll need a system at v3.2something.  A good choice is whatever your
customer happens to be running.

                              * * *

> One thing I neglected to mention was that when I was having problems
> accessing files, because the system was updating attributes, possibly as
> a result of a chgrp/chmod command, I rebooted the system. Given that
> there were a number of things that the qr had queued for processing,
> what happens when a reboot is initiated
> (a) is the queue of requests lost or are they saved for processing when
> the machine is rebooted.

Requests are written to a checkpoint file periodically.  How often this is
done depends on two parameters in the /usr/efs/efscfg file:  If there are
Q_CKLEVEL or more requests in the queue, then HSM does checkpointing -- below
that, it doesn't bother.  While it's checkpointing, it updates the checkpoint
file after each Q_INTERVAL requests come in.  Defaults are Q_CKLEVEL = 10
and Q_INTERVAL = 3.  (This is section 6.6.1 in the HSM admin guide, but
there's a typo in that paragraph.)
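
In other words, the defaults amount to entries like these in /usr/efs/efscfg
(I'm assuming the same NAME = VALUE form as the ager parameters in .1; the
trailing comments are just annotations):

  Q_CKLEVEL = 10     # start checkpointing once 10 or more requests are queued
  Q_INTERVAL = 3     # then rewrite the checkpoint file every 3 requests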

If the system crashes before HSM gets a chance to checkpoint some requests,
then those requests will be lost.  Most lost requests do not affect normal
filesystem operation -- the only HSM operations that affect normal operation
are shelving and unshelving, and losing these is not typically a problem.
For example, if the system crashes while a user was reading a file (causing
it to be unshelved if it was on optical), the user is going to
have to re-do their command to read the file anyway, which would issue an
unshelve command again (if needed).  And anything that causes shelving would
also be started again (e.g. running the bulk ager).  All other requests are
for purposes of recording information that would be useful for rebuilding
the filesystem from optical alone (e.g. recording directories and file
attributes) -- losing these will cause problems during a rebuild (e.g. files
will show up with wrong protections or in the wrong place, or directories
might be missing, so files in them won't have any place to go, and will all
end up in / with temporary names), which is not good(!!) -- but it won't
affect a running filesystem.

> (b) if an accidental command were issued say a remove command (rm -r *)
> in the wrong directory, could the machine be closed down to stop the
> propagation to the opticals.
> (c) having interrupted the aging of an rm command to the opticals, how
> would you recover as much information as was still on the opticals when
> the machine is rebooted.

If someone were *very* fast, they might be able to stop *some* of the
"truncate" requests from getting through.  See below for a way to get
(some of) the files back.

The point is, *never* depend on this.  HSM is *not* backup -- a *real*
backup should be done regularly.  How is backup being done now?

Dare I ask...   Did this problem with rm happen?

> (c) having interrupted the aging of an rm command to the opticals, how
> would you recover as much information as was still on the opticals when
> the machine is rebooted.

> I assume that if requests are
> queued across a shutdown/reboot, then the information must be stored in
> a file somewhere, in which case that file would need to be removed and
> maybe the system rebooted again before it finished cataloging.

Close.  A better way to stop HSM from servicing requests is to shut *it*
(not the system) down -- that way one can still do things afterward.  Use
the shutdown script /usr/efs/PolycenterHSM.shutdown.
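
A minimal sequence, using only commands mentioned elsewhere in this topic:

  qqr                                  # see what requests are outstanding
  /usr/efs/PolycenterHSM.shutdown      # stop HSM (qr, jmd) without taking
                                       # the whole system down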

There are very bad consequences of deleting requests, however -- throwing
away requests is the cause of 99.999% of rebuild problems later.  There is
no way to separate out the unwanted "truncate" requests from any other
activity, like creating directories or changing file attributes or mv'ing
files -- *all* requests will be lost.  And it's very difficult to actually
get rid of requests -- it's not just HSM that's saving them -- AdvFS is
holding on to any requests that HSM hasn't asked for yet.  I know of no way
to tell AdvFS not to send them.

So rather than describe any dangerous and uncertain ways of deleting
requests, let me mention an alternative:  After stopping HSM, umount the
filesystem, then rebuild a *new*, temporary, filesystem from the *same*
optical platters, onto an *entirely new* disk (or set of disks).  Any files
that hadn't yet been deleted will show up in this temporary filesystem, and
they can be copied out to yet another filesystem.  Then umount the temporary
filesystem, mount the real one again, ***let HSM finish doing the rm's***
so that the filesystem is in a consistent state between magnetic and optical,
and *later*, copy the files back in.

Two warnings:

Do not attempt to mount both the original and rebuilt filesystems
at the same time -- they must both have the same "filesystem id" in order
to use the same platters, meaning, to HSM, they *are* the same filesystem.

Be very careful to avoid doing anything to the real filesystem's magnetic
disks, because the temporary filesystem won't be usable for anything other
than getting back some of the rm'd files -- it won't contain any files that
hadn't yet been shelved, so it will likely be missing lots of stuff.

That's just an outline of the process -- I haven't put in the details.

> I'm a little unsure as to what happens with queued requests, since the
> first thing that happens when the system is booted is all the optical
> disks are catalogued

You're right -- HSM will not process requests until the jukebox inventory
is done.  But I believe HSM does not even start asking AdvFS for requests
until after it's ready to process them.  So there isn't really a window
during which requests could be gotten rid of -- by the time they show up
in the HSM queues, HSM is already working on them.

> During a shutdown/reboot
> the jukebox remains up, so I assume that the cataloging process is for
> the purposes of the software running on the host and that software
> translates accesses to a platter, to a physical location.

HSM keeps a database of what platter is where, and what their states are,
so if nothing is done to the jukebox while HSM isn't running, it doesn't
strictly *need* to do the inventory.  What it's doing is making sure that
the platters are where they're supposed to be, and checking that their
filesystems are mountable.  (Each platter in use by HSM has a UFS -- Unix
file system -- on it.)  So mainly, it does the inventory to make sure no-one
messed around with the jukebox while it was out of HSM's hands...

> I've been mindful to shutdown the system when the platters are not being
> rattled

The "qqr" command will show if there are any active requests.

If the startup and shutdown links for HSM were put in during HSM installation
(this is optional), then doing a "shutdown -h" will run all the shutdown
commands including HSM shutdown.  Once HSM is shut down, there's no worry
about losing requests.  (Note that just plain "shutdown", "shutdown -r",
and "reboot" do not run the shutdown procedures.)

> The other thing with respect to the jukebox is I think it runs slow
> because of the speed of the optical drives, perhaps they are just single
> speed devices. Is it possible to upgrade the reader/writer?

The supported drives are the rwz51 (single density) and rwz52 (supports
single and double density).  A patch would be needed to use rwz53 (single,
double, and quad density) drives.  I don't know right off if there's any
speed difference.  This should be in the product description for the drives,
which should be available on the web site http://www.digital.com , or I can
have a hunt for the product descriptions.

Usually drive speed is not a problem.  I've done multiple cp -r's of large
filesystems into an HSM filesystem, and the ager keeps up, because the cp's
are forced to wait.

If shelve requests are not being processed quickly, another possibility is
that the AdvFS metadata (on magnetic) may be getting fragmented, which would
make it take a long time for AdvFS to process responses from HSM, after it
has written a file out to optical, to record the optical location in that
file's (magnetic) metadata.   To find out if this is the case, do (as root):

  cd /xxx/.tags
  /usr/sbin/showfile -x M-6

where /xxx is the filesystem's mountpoint.  This shows how big all the extents
are in the magnetic metadata on the first volume in the domain.  If there are
more volumes, then repeat the showfile command on their metadata, which will
be M-n where n is 6 times the volume number, e.g. for volume 3, do:

  /usr/sbin/showfile -x M-18

If there are several hundred extents, or the extents are very small (e.g. if
some have sizes smaller than, say, 32), then the metadata is badly fragmented.
The procedure for "fixing" this involves creating a new domain on another
(set of) disk(s) with larger parameters to the mkfdmn and addvol commands
to control the metadata extent size and initial allocation, and backing up
the original magnetic part of the filesystem using hmdump, then restoring it
into the new domain with hmrestore.  As with rebuilding, the old and new
filesystems can't be mounted at the same time.  Again, this is only a sketch
of the process.

-- Pat