
Conference decwet::hsm-4-unix

Title:HSM for UNIX Platforms
Notice:Kit Info in note 2.1 -- Public Info Pointer in 3.1
Moderator:DECWET::TRESSEL
Created:Fri Jul 08 1994
Last Modified:Wed Jun 04 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:238
Total number of notes:998

231.0. "backup HSM cache ?" by GIDDAY::SCHWARZ () Tue Mar 04 1997 00:00

    G'day,
    
    Please excuse me if this is a really simple question.
    
    I have a customer who is running D.unix 3.2g and HSM 1.2. They have a
    cache of files on an rz28 set up as an efs (extended file system).
    This magnetic disk is reporting bad blocks and we wish to replace it.
    What can we use to back up the data before we replace the disk? Will
    standard vdump work?  The customer is not sure if this will work and is
    suggesting using dd.
    
    Any pointers etc would be greatly appreciated.
    
    Regards
    
    Kym Schwarz
    Unix Support
    CSC Sydney

231.1. "use hmdump / hmrestore" by DECWET::TRESSEL (Pat Tressel) Tue Mar 04 1997 16:03
Kym --

There are two main options provided with HSM.  One would back up the
magnetic disk, so that it could be restored onto a new disk, and the
other would make sure all files were on optical, then rebuild the
filesystem from optical alone.  The former is faster and easier, as
long as the magnetic disk is still working.  The latter is a standard
safeguard against magnetic disk failure, and the customer should be
running the "bulk ager" every night to make sure all their files are
backed to optical.

Since they may be at risk of magnetic failure, the first thing to do
is to make sure they *are* running the bulk ager.  This is section
7.1.1 of the HSM Admin Guide.  The ager settings (which are in efscfg)
that they should use are AGER_BLWM = 0, AGER_MINAGE = 0, AGER_MINSIZE = 0,
and AGER_MAXSIZE = something larger than their biggest file.  Setting
ager parameters is covered in section 6.13.  These settings tell HSM to
shelve all files to optical (with one exception: HSM does not record
empty files on optical).  (Note that shelving a file does *not* remove
it from magnetic -- a file can be in both places.  It's left on magnetic
until HSM is forced to remove it to free up space.  And when HSM
unshelves a file, i.e. puts a copy back on magnetic, it leaves the copy
on optical as well.  So they won't lose any performance by running the
bulk ager.  In fact, it'll increase performance, because the active
ager -- the thing that frees up space on magnetic when it's needed --
won't have to actually copy any files to optical if the bulk ager has
pre-shelved them.)
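
To make this concrete, here is roughly what the pieces look like.  The
efscfg entries below use the NAME = VALUE form quoted above (I'm assuming
that's also the file syntax -- check section 6.13), the trailing comments
are just annotations, the AGER_MAXSIZE value is only a placeholder, and
/theefs stands in for their real mountpoint:

  AGER_BLWM = 0
  AGER_MINAGE = 0
  AGER_MINSIZE = 0
  AGER_MAXSIZE = 2000000000    # anything larger than their biggest file

  ager -b 180 /theefs          # bulk ager run -- see section 7.1.1 and the
                               # ager man page for the exact arguments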

The fastest way to replace the magnetic disk is to use the HSM backup
utility that saves and restores the magnetic part of the efs -- hmdump
and hmrestore.  (These are parts of the full backup procedure, bkup_efs
and restore_efs, but the full backup saves and restores the platters
as well, and that's not necessary here.)  Running hmdump is covered in
section 7.5 and the man page for hmdump, and hmrestore is in section 7.6
and also has a man page.

Running hmdump is straightforward except for one thing:  They should make
sure that no users can access the filesystem during or after the hmdump
run, else files might be created or changed, and these changes would be
lost.  So it is best to umount the filesystem and mount it again read-only
(mount option -r) before starting hmdump.
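
A rough sketch of that sequence -- the mountpoint, domain, and fileset
names are placeholders, and /dev/rmt0h is whatever tape device they
normally use:

  umount /theefs
  mount -r efs_domain#efs_set /theefs    # standard AdvFS mount, read-only
  hmdump -p /dev/rmt0h /theefs           # check section 7.5 / the man page
                                         # for the exact options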

Running hmrestore to do a full restore is a bit trickier, because they
will be creating a new efs with the same fsid (filesystem id, also called
the group id in HSM documentation) as the old one.  There can't be two
filesystems mounted at the same time with the same fsid, because that's
how HSM knows which platters go with which filesystem.  So before creating
the new efs on the new disk, they must umount the old filesystem.

(At this point, the procedure would be a bit different if they had an AdvFS
domain with several disks in it, but it doesn't sound like that's the case.)

After the old filesystem is unmounted, they should create a new AdvFS
domain and fileset on the new disk, then convert the empty filesystem
to an efs, using the same fsid as the old filesystem had.  See section
2.2 in the Admin Guide.
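
With placeholder names (their disk, domain, fileset, and fsid will differ,
and the exact mkefs options should be checked against section 2.2), that
step looks something like:

  mkfdmn /dev/rz5c new_domain         # new AdvFS domain on the new disk
  mkfset new_domain new_set           # fileset in that domain
  mount new_domain#new_set /newefs    # temporary mountpoint -- see below
  mkefs -f <old fsid> /newefs         # convert to an efs, reusing the old fsid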

Once the new efs is created and mounted, hmrestore can be run.  They want
to do a full restore, which would look something like:

  hmrestore -p /dev/rmt0h /newefs

where /dev/rmt0h is the tape device (most likely the same as was used for
hmdump) and /newefs is whatever the mountpoint of the efs is.

Another caution here:  They do *not* want users writing into the filesystem
while the restore is going on.  Since they can't mount the filesystem
read-only this time, an alternative is to *use a different mountpoint*
from what users expect -- make up something obscure just for this occasion. 
Then there won't be any accidental writing into the filesystem, either by
users or by cron jobs.  Even if users are warned not to use the filesystem,
they may not know they're using it, if it's embedded in some script.  I've
had customers who tried several times to do a restore with their filesystem
mounted on its usual mountpoint, and finally gave in and used a different
mountpoint, so please save your customer some grief and convince them to
do this the first time.

A further caution:  Do *not* overwrite or discard the original magnetic
disk until it's been verified that the filesystem is in good working order
after the restore.  A very good way to test the filesystem is:  While the
*original* filesystem is still mounted (for instance, after the bulk ager
run, which I'd still recommend doing asap for safety), get a listing of
files with ls -laiR.  Repeat the ls -laiR after the restore, and make sure
all the files are present.  Then run ssgfsck to make sure the pointers from
magnetic to optical are in good shape.  (Ssgfsck should be run without any
options except -p.  In particular, they should *not* use -f.)
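
One way to capture and compare the listings -- the paths and scratch file
names here are placeholders, and comparing names only is just to sidestep
the fact that inode numbers will change across the restore (expect a little
noise from the per-directory "total" lines):

  cd /theefs && ls -laiR . > /var/tmp/efs.before     # on the original efs
  cd /newefs && ls -laiR . > /var/tmp/efs.after      # on the restored efs
  awk '{print $NF}' /var/tmp/efs.before | sort > /var/tmp/names.before
  awk '{print $NF}' /var/tmp/efs.after  | sort > /var/tmp/names.after
  diff /var/tmp/names.before /var/tmp/names.after    # nothing should be missing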

Both hmdump and hmrestore will run much faster if the bulk ager has
recently been run, since they don't need to copy any file that's also
on optical.  If all files are shelved, then hmdump only has to copy the
directories and AdvFS metadata (file headers) to tape.  This is an
additional incentive to run the bulk ager first.

The overall procedure is:

  -- Run the bulk ager with parameters set so that all files are shelved.

  -- Make a listing of all files.

  -- Re-mount the filesystem read-only.

  -- Run hmdump to copy the magnetic part of the filesystem to tape.

  -- Umount the filesystem.

  -- Make a new AdvFS domain and fileset on the new disk.

  -- Convert it to an efs, with the same fsid as the old filesystem.

  -- Mount the filesystem on some mountpoint that's not where it's usually
     mounted.

  -- Run hmrestore to recreate the magnetic part of the filesystem.

  -- "Test" the filesystem.

     -- Use shelvefind or shelvels to find a file that's on optical.
        (Simplest is to just do shelvels -R on some directory in the efs
        and stop it when a file with state OPTICAL shows up.)  Then read
        that file by doing cat xxx > /dev/null or sum xxx.  Afterward,
        shelvels xxx should show that the file has state BOTH.

     -- List all the files and compare with the original filesystem's
        listing.

     -- If there's time, run ssgfsck (with no options except -p).  This
        takes a while because it has to mount all the platters one at a
        time.

  -- Umount the filesystem.

  -- Mount it again on its real mountpoint.

Please let me know if there are any questions about this.

Some answers to specific questions:

The AdvFS utilities vdump and vrestore won't work (nor will others,
like balancing or removing a volume) -- they're not qualified on an efs,
which has "tertiary" metadata that tells whether a file has parts on
optical.  In fact, removing a volume or balancing the filesystem is
guaranteed to damage the filesystem if it's done.  This is true whether
the stand-alone utilities are used or the AdvFS GUI features.  Adding a
volume is safe -- it's really the only AdvFS utility that is.

Although dd *might* work, hmdump/hmrestore is better -- they do a file-by-
file copy, not a physical copy, and won't try to copy anything not in a
file.  Also, dd will exit the first time it encounters a read error, even
if it's given the option that tells it not to.  (This has been reported.)

And a little proselytizing:

It worries me that the customer isn't familiar with the HSM backup utilities.
They should be doing regular backup, in addition to running the bulk ager
every night.  Section 7.3 covers using bkup_efs to do full backups, and
section 7.7 covers using hb to do incremental backups.  (There is an
alternate procedure if they're using NetWorker to do other backups -- are
they?)

Good luck!

-- Pat Tressel

231.2. "upgrade; documentation" by DECWET::TRESSEL (Pat Tressel) Tue Mar 04 1997 16:24
Kym --

Two things not directly related to the problem:

> HSM 1.2

Are they really running 1.2, not 1.2a?  If so, it would be good for them
to upgrade.  We may just be able to give them the new version -- do you
know whether their license includes upgrades?  And I'll ask our product
manager if it would be ok, independent of what sort of license they have.

> Any pointers etc would be greatly appreciated.

You might find it helpful to have a copy of the HSM documentation, which
you can get via anonymous ftp from ftp.zso.dec.com, in the directory
pub/hsm/documentation/v1.2A.  The Admin Guide is NJ-ADM.PS, the release
notes are NJA121_RELNOTES.PS or NJA121_RELNOTES.TXT, and HSMREF.PS is
a printable copy of the man pages.

The man pages are also in their own, separately-installable, subset, so
you could install them on your own system without installing HSM.  To do
this, get the HSM kit, which is v1.2A.tar in the directory pub/hsm.
After unpacking it, cd to NJA121, do "setld -l .", and ask for only the
man pages (subset name is NJAMAN121) to be installed.  I don't believe
the installation will get confused and offer to rebuild the kernel, as
it would have to were you installing HSM itself, but if it does, just
say no...  ;-)
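
The whole fetch-and-install sequence is short -- something like this, with
the ftp steps typed interactively:

  cd /var/tmp
  ftp ftp.zso.dec.com        # log in as anonymous, then:
                             # cd pub/hsm ; binary ; get v1.2A.tar ; quit
  tar -xf v1.2A.tar
  cd NJA121
  setld -l .                 # pick only the man page subset, NJAMAN121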

If you happen to have an optical jukebox languishing in a corner, you
could also install HSM.  If you want to do this, let me know, and we'll
figure out how to get a license.

-- Pat

231.3. "information from customer" by DECWET::TRESSEL (Pat Tressel) Thu Mar 06 1997 23:28
Date:	 6-MAR-1997 18:32:02.19
From:	GIDDAY::SCHWARZ     
Subj:	RE: HSM-4-UNIX note 231: upgrade; documentation
To:	DECWET::TRESSEL
Pat,

Sorry to hassle you again. I am wondering if you could help he get this customer
off my back.  It sounds like he needs someone to give him some consulting and
I am happy to tell him this.

Anyway here are his comments:

So it looks like I should log a hardware call and get a replacement disk
on site, so I can proceed. It's a pity so many unix utilities give up on
bad blocks. I can understand why dd might do such a thing, but just
occasionally you'd think there could be a switch that allows such things
to be ignored, especially in the case where blocks are not allocated. I
guess these blocks are not allocated. To check that theory I should run
ssgfsck after the bulk ager has updated all files on the optical
platters. I'm pretty sure that during the day the ager is run with a
high and low water mark, but at night the ager runs in bulk mode to clear
everything back to the opticals.

You are right about there being one fileset to a magnetic disk, but we
do have two 2gb disks. We originally had this test fileset which is
allocated one magnetic disk and a number of platters, and the real
fileset which is allocated the other magnetic disk and something like 20
of the 36 platters. The test area really isn't used and it probably
would be nice to have both disks allocated to the real fileset, along
with all the platter sides except say for 1 or 2 spares.

I'm really in the dark as to the best way to have this thing function. The
software seems fairly smart in that it does load sharing of the
information across all available platters. Why it would want to do that
I'm not sure, since the greater the number of platters being used, the
greater the overall latency time to read the information from the
platters. But since it performs some sort of striping of information
across platters (not redundant striping), it's probably difficult to see
how the information is most easily stored.

So the exercise could be increased in size by including within the
repair of the disk with bad spots the change of the configuration to
have both disks support the real data, and reallocate platters to real
dataset.



Once again thanks for your help

Kym

231.4. "they should still use hmdump/hmrestore" by DECWET::TRESSEL (Pat Tressel) Fri Mar 07 1997 00:03
Kym --

Let's deal with the immediate problem, which is replacing the bad disk,
and defer HSM tuning questions.

> I guess these blocks are not allocated.

If the bad blocks are not in use for a file, then hmdump will not touch
them -- it does a file-by-file, not raw, copy.  So if the above is true,
then hmdump is the right choice for copying out the magnetic part of the
filesystem.

But...if the bad blocks are not part of a file, then why are they seeing
errors?  Are the errors happening when the filesystem tries to use those
blocks for a file?  If so, why aren't the bad blocks being revectored
(taken out of service)?  I'm confused...

Anyhow, *if* they're right about this, then they should (please!) go ahead
and run hmdump, and restore it into a new domain with different disks, as
per my previous note.

> You are right about there being one fileset to a magnetic disk, but we
> do have two 2gb disks.

I'm confused again -- what was the context for this?  Do they mean that
their current domain has only one disk in it, but that they have two
disks?  Is the other available to make a new domain on?

(For the record:  A fileset in an AdvFS domain containing multiple disks
is not confined to one disk.  That's part of the point of having big
domains -- filesystems sharing the domain can use whatever space is
available.)

> So the exercise could be increased in size by including within the
> repair of the disk with bad spots the change of the configuration to
> have both disks support the real data, and reallocate platters to real
> dataset.

Does this mean they'd like to have two disks in the new domain?  That's
certainly possible -- I don't know anything that would stop them from
doing the hmrestore into a fileset in a bigger domain.  But it sounds
as though they'd have to wait until they got another disk, if they only
have one spare disk now.

(There are some alternatives:
They could do the hmdump/hmrestore now onto a single-disk domain, just to
get rid of the bad disk.  Then they could either addvol a second disk
which would allow more space for future files...but they couldn't balance
the disks, because the balance feature of AdvFS is not supported for an
HSM filesystem.  In fact, it will do serious damage to the filesystem.
Or, they could get two new disks, make a two-disk domain, and repeat the
hmdump/hmrestore to go from the new single-disk domain to the newer two-
disk domain.  This has the drawback that they'd need to get two new disks.)
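
For reference, adding a volume is just the standard AdvFS addvol command --
something like the following, where the device and domain names are
placeholders:

  addvol /dev/rz4c efs_domain

As noted above, that's as far as it should go: no balance and no rmvol on
an efs.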

> It's a pity so many unix utilities give up on
> bad blocks. I can understand why dd might do such a thing, but just
> occasionally you'd think there could be a switch that allows such things
> to be ignored, especially in the case where blocks are not allocated.

There *is* a dd option for that purpose, namely:

  dd conv=noerror

but it's broken -- dd quits anyway on the bad block.  A qar was reported
against dd last year (it's # 45113 in the Digital Unix qar database), and
the submitter had fixed the problem even before reporting it, but the fix
hasn't yet (that I can see) been incorporated into the released version
of dd.  I've sent a note to the person who has the fix to see if it would
be possible for the customer to use their program.  Please don't mention
this to the customer until we hear back, because this may not be allowed.
If it's not, I might be able to modify the HSM utility copym to work on
regular disks instead of HSM-formatted optical platters.

And besides, they should try hmdump first.

                                 * * *

Well, ok, a few comments on HSM operation.

> The software seems fairly smart in that it does load sharing of the
> information across all available platters. Why it would want to do that
> I'm not sure...

That's not what it's doing...it's not as smart...or as dumb...as that.
When writing a file out to a platter (shelving) it tries to keep all of
the file on one platter side, if possible, and it also tries to put files
on the same platter where it recorded the file's parent directory info.
(It doesn't shelve directories -- it records their names and attributes
so that they'll be available if the filesystem has to be rebuilt from
the platters alone.)  If HSM can't put a file on the same platter as its
parent directory, it picks a platter that has enough free space, that's
already part of the filesystem, always checking through the platters in
the same order (it doesn't have to mount them to do this), so it fills
up one platter before going to the next.

The goal is to keep everything on as few platters as possible, and to not
split files across platters, that is, to minimize platter motion.

But if files are deleted, HSM doesn't move things from one platter to
another to compact the space -- that would be very time-consuming, and
would degrade performance, since it would tie up drives.  So having partly-
full platters is tolerated.

> But since it performs some sort of striping of information across platters

Actually, it's not doing that either.  Breaking files up into pieces is a
last resort.  HSM will only do that if the file won't fit on a platter side,
or if it's told to with the "shelve" command.

> it probably
> would be nice to have both disks allocated to the real fileset, along
> with all the platter sides except say for 1 or 2 spares.

The platters don't have to be allocated to the filesystem.  When a new
platter is put in the jukebox, it's best to set it to owner 5 (which tells
HSM that it can use the platter for shelving), group 0 (which means it's
not yet part of a specific filesystem), and state internal (meaning it's
available for use in general -- it's not about to be taken out of the
jukebox, or formatted, or suchlike).

-- Pat

231.5. "dt, a dd replacement, and then some..." by DECWET::TRESSEL (Pat Tressel) Fri Mar 07 1997 19:32
Kym --

They should still try hmdump/hmrestore first, but if hmdump fails, then
they can copy the disk with "dt".

I got a very quick response from Robin Miller about dt (the utility I
mentioned that could be used in place of dd).  It *is* ok to use it, and a
kit is available for v3.2x systems.  Look at Robin's Web page for dt:

  http://www.zk3.dec.com/~rmiller/dt.html

There is a link there that will let you ftp the v3.2 version.  The kit
includes a manual, but Robin also kindly made up an example showing how to
copy a file.  I'll append the dt output at the end, but the command from
the example is:

  dt if=/dev/rrz0b of=/dev/rrz3b iomode=copy errors=10 limit=5m

This example copies from /dev/rrz0b to /dev/rrz3b, allows 10 errors
maximum, and limit is the number of bytes to transfer.  In this case, the
customer is probably using the whole disk, so they'd want the c partitions,
and the limit should be bigger than the size of the whole disk.  They
probably want to set errors to some big number.  They should be very very
careful to get the right disks as input and output -- don't want to copy
the wrong way!!

So their command might look like:

  dt if=/dev/rrzNc of=/dev/rrzMc iomode=copy errors=500 limit=3g

where /dev/rrzNc is the input disk -- the bad one -- and /dev/rrzMc is
the output disk -- the new one.  (If dt doesn't like 3g, then try 3000m.)

-- Pat

------------------------------------------------------------------------------

Part of Robin's example, showing dt output.  Note that dt verifies the
copy, so there are two passes.

% dt if=/dev/rrz0b of=/dev/rrz3b iomode=copy errors=10 limit=5m
dt: 'read' - I/O error
dt: Relative block number where the error occcured is 428
dt: Error number 1 occurred on Fri Mar  7 10:53:15 1997
Copy Statistics:
    Data operation performed: Copied '/dev/rrz0b' to '/dev/rrz3b'.
     Total records processed: 20478 @ 512 bytes/record (0.500 Kbytes)
     Total bytes transferred: 10484736 (10239.000 Kbytes, 9.999 Mbytes)
      Average transfer rates: 87677 bytes/sec, 85.622 Kbytes/sec
      Total passes completed: 0/1
       Total errors detected: 1/10
          Total elapsed time: 01m59.58s
           Total system time: 00m06.26s
             Total user time: 00m00.35s

dt: 'read' - I/O error
dt: Relative block number where the error occcured is 428
dt: Error number 1 occurred on Fri Mar  7 10:55:11 1997
Verify Statistics:
    Data operation performed: Verified '/dev/rrz0b' with '/dev/rrz3b'.
     Total records processed: 20478 @ 512 bytes/record (0.500 Kbytes)
     Total bytes transferred: 10484736 (10239.000 Kbytes, 9.999 Mbytes)
      Average transfer rates: 359477 bytes/sec, 351.051 Kbytes/sec
      Total passes completed: 1/1
       Total errors detected: 1/10
          Total elapsed time: 00m29.16s
           Total system time: 00m06.68s
             Total user time: 00m01.76s

Total Statistics:
      Input device/file name: /dev/rrz0b (Device: RZ28, type=disk)
     Total records processed: 40956 @ 512 bytes/record (0.500 Kbytes)
     Total bytes transferred: 20969472 (20478.000 Kbytes, 19.998 Mbytes)
      Average transfer rates: 140940 bytes/sec, 137.636 Kbytes/sec
      Total passes completed: 1/1
       Total errors detected: 2/10
          Total elapsed time: 02m28.78s
           Total system time: 00m12.96s
             Total user time: 00m02.13s
               Starting time: Fri Mar  7 10:53:06 1997
                 Ending time: Fri Mar  7 10:55:35 1997

231.6. "still need some help." by GIDDAY::SANKAR () Wed Mar 26 1997 17:39

I am following up on this customer's problem in replacing the disk.

He did everything in the procedure provided in .-x  but did not do the
ssgfsck.

Everything seemed to work OK, but now he is getting some messages when qqr is run.

Can we get some information on what these messages are and what needs to be done?


thanks
srinivasan sankar
csc

mail from customer follows
============================
From:	SMTP%"DennisMacdonell@auslig.gov.au" 26-MAR-1997 15:56:02.99
:	
Subj:	re: Q33652


Here is a list of the commands that were used to create the efs
structure on the new disk -

# remove the nfs share for the <efs mount point>; nb the users access
# the efs magnetic disk via an nfs link, they do not log onto the
# jukebox machine directly
# the parameters in /usr/efs/efscfg were checked.
ager -b 180 <mountpoint>
cd <mountpoint>
shelvels -R > <log file>
mkefs -i <mountpoint> > <logfile>
ssgfsck <mountpoint> > <logfile>
hmdump -p /dev/rmt0h <mountpoint>
# this reported a number of files that it was waiting on to be
# transferred to the opticals
# checking with the shelvels log, these files were listed as MAGNETIC
# even after a number of attempts to copy them to the platters with the
# bulk ager.  While all this activity was going on, the qr and jmd
# (queuer, jukebox manager) were running as normal.
rm <the files that hmdump reported>
hmdump -p <mountpoint>
# hmdump ran to completion without a hitch
rm <link in /sbin/rc3.d for the polycenter startup>
shutdown -h now
# removed the old magnetic disk and installed the new one
boot
# checked that the qr and jmd were not running
mkfdmn -o <old domain name>
mkfset <old domain> <old set name>
mount <mountpoint>    # uses the entry in /etc/fstab to mount the disk
mkefs -f <old efsid> <mountpoint>
hmrestore -v -p /dev/rmt0h
# reestablished the link in /sbin/rc3.d
shutdown -r now
cd <mountpoint>
# checked that the qr and jmd were running
shelvels -R > <logfile>
# picked a file that was OPTICAL and did a cat of the file
shelvels <filename> # indicated the file was now BOTH
#assumed that the whole thing was running.

The users were asked to only do read accesses for the next couple of
days.  The users accessed files for a day and reported no problems.
Overnight the cron runs the bulk ager and then runs shelvels -R piped to a
log file. The day after the upgrade, I tried to clean up some files with
strange names like "lo", "cd", "rcp", ",n", etc. I tried to cat these
files where they had some bytes, but the command never returned, and I
broke out after a couple of hours with a "^C". I ran qqr to find out
what the thing was doing and got miles of messages like -

File Attribute Update
        <fs=1, inode=50263, gen=32769, media=3b, refnum=1624137083>
File Attribute Update
        <fs=1, inode=55144, gen=32769, media=31, refnum=1190255091>
File Attribute Update
        <fs=1, inode=78154, gen=32769, media=24, refnum=2142315080>
File Attribute Update
        <fs=1, inode=78252, gen=32769, media=24, refnum=457469444>
File Attribute Update
        <fs=1, inode=78322, gen=32769, media=24, refnum=1518026589>
File Attribute Update
        <fs=1, inode=13151, gen=32774, media=31, refnum=1782329940>
File Attribute Update
        <fs=1, inode=23130, gen=32771, media=1a, refnum=1615004601>
File Attribute Update
        <fs=1, inode=22620, gen=32771, media=1a, refnum=1826004613>
File Attribute Update
        <fs=1, inode=22612, gen=32771, media=1a, refnum=1772160175>
File Attribute Update
        <fs=1, inode=36709, gen=32772, media=1a, refnum=1775890185>
File Attribute Update
        <fs=1, inode=23407, gen=32771, media=1a, refnum=711916695>
File Attribute Update
        <fs=1, inode=23018, gen=32771, media=1a, refnum=1807212515>

Question is what is this thing up to

Dennis
=====================================

231.7. "Why were files rm'd?" by DECWET::TRESSEL (Pat Tressel) Mon Apr 07 1997 17:40
Srinivasan --

> I am just desperate to get some help and I hope I will get some better
> visibility by creating a new note.

Using a new note won't get a faster response, and it doesn't hurt to
continue using the old one -- we get e-mail when any new note is added.
Please use topic 231 to maintain continuity.

> hmdump -p /dev/rmt0h <mountpoint>
> # this reported a number of files that it was waiting on to be
> transferred to the opticals
> # checking with the shelvels log these files were listed as MAGNETIC
> even after a
> # number of attempts to copy them the platters with the bulk ager.

The bulk ager decides whether to shelve files or not.  Depending on the
ager parameters set in the efscfg or .agerconfig, it may not shelve all
files.  Did you try a shelve command on any of those files?
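
A quick way to check one of the stuck files by hand -- the pathname is
just an example, and the exact shelve options are in its man page:

  shelve /theefs/path/to/stuck-file
  shelvels /theefs/path/to/stuck-file    # should go from MAGNETIC to BOTH

If shelve also refuses or hangs on these files, that is worth knowing.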

Note that hmdump does not require that MAGNETIC files be shelved.  Any
MAGNETIC files that are not "dirty" (in transition between magnetic and
optical) will be copied to tape by hmdump.

> rm <the files that hmdump reported>

Why was this done?  I sure hope there is some other backup of those files!

> # checked that the qr and jmd were not running

Why?  It is best to leave HSM up, in case it needs to do something, unless
there is some specific instruction to the contrary.

> File Attribute Update
>         <fs=1, inode=50263, gen=32769, media=3b, refnum=1624137083>

These are normal requests to HSM to update file attributes (e.g. permission,
ownership, etc.).  They result from a user doing chmod, chown, etc.

Are the requests being serviced?  Are any requests disappearing from the
queue?  If the requests in the queue stay the same, then we may need to
set the qr log level to 6 for a while, to see what is happening.  It is
possible they are being serviced from the tail of the queue, which won't
be shown if the queue is long enough.

Are there any errors reported?  Check e-mail to root to see if HSM is
reporting difficulty getting a platter.  Look for messages with ERROR or
WARNING in /usr/efs/qrlog or /usr/efs/jlog.
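
A quick way to scan both logs at once:

  egrep 'ERROR|WARNING' /usr/efs/qrlog /usr/efs/jlog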

> The day after the upgrade, I tried to clean up some files with
> strange names like "lo", "cd", "rcp", ",n", etc.

Those look like pieces of commands.  I can't guess whether they might be
meaningful files or not -- only the customer would know.  What are their
file attributes?  (Do "ls -l" on them.)  If they are garbage files,
possibly due to the bad spot on the disk, then they'll probably have very
odd attributes -- they might look like device special files, or have strange
permissions or sizes.

-- Pat

231.8. "Sorry about the delay" by DECWET::TRESSEL (Pat Tressel) Mon Apr 07 1997 19:11
Srinivasan --

I just found your previous posting in my e-mail inbox -- sorry about the
delay in replying!

-- Pat

231.9. "questions from customer re. accidental rm -rf and stopping HSM from processing the deletes" by DECWET::TRESSEL (Pat Tressel) Thu Apr 10 1997 23:56
From:	GIDDAY::SANKAR "10-Apr-1997 1950 +1000" 10-APR-1997 02:56:09.34
To:	DECWET::TRESSEL
CC:	SANKAR
Subj:	HSM-4-UNIX note 231 and replies...



Pat,

Thank you for the mail and replies to note 231 in the HSM-4-UNIX conference.
I am mailing this to you thinking that it may be better to do this way.

I am sorry to keep bothering you with more and more questions from this customer.
For me to gain some experience, I am trying to setup a system with HSM here and 
I have already managed to get hold of a cd drive (writable) and will start 
setting it up tomorrow. At present we do not have anyone with much exposure 
to this product.

If it will help you to justify the time you spend on supporting us, please
tell me and I can IPMT this case.

The customer has sent me two mails which I am attaching to this.
He has partially answered some of the questions.
(by simply asking more questions)

Please see if his replies make sense

I sincerely appreciate your help.

Thanks
sankar  




++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
From:	SMTP%"DennisMacdonell@auslig.gov.au"  9-APR-1997 17:17:23.62
To:	Sankar <sankar@stl.dec.com>
CC:	
Subj:	re: polycenter HSM



Hi Sankar,

One thing I neglected to mention was that when I was having problems
accessing files, because the system was updating attributes, possibly as
a result of a chgrp/chmod command, I rebooted the system. Given that
there were a number of things that the qr had queued for processing,
what happens when a reboot is initiated
(a) is the queue of requests lost or are they saved for processing when
the machine is rebooted.
(b) if an accidental command were issued say a remove command (rm -r *)
in the wrong directory, could the machine be closed down to stop the
propagation to the opticals.
(c) having interrupted the aging of an rm command to the opticals, how
would you recover as much information as was still on the opticals when
the machine is rebooted.

I'm a little unsure as to what happens with queued requests, since the
first thing that happens when the system is booted is all the optical
disks are catalogued (ie it looks at each platter side and relates it to
a physical location in the jukebox, each platter side has a unique
label, the platter label is used with commands like the mountm command).
This catalogue thing may be just a single request to the jukebox, as it
loads the platters in physical location order. During a shutdown/reboot
the jukebox remains up, so I assume that the cataloging process is for
the purposes of the software running on the host and that software
translates accesses to a platter, to a physical location.

I've been mindful to shutdown the system when the platters are not being
rattled, so generally after the catalogue operation the jukebox is
dormant until a user accesses a file. I assume that if requests are
queued across a shutdown/reboot, then the information must be stored in
a file somewhere, in which case that file would need to be removed and
maybe the system rebooted again before it finished cataloging.


Any clues.


The other thing with respect to the jukebox is I think it runs slow
because of the speed of the optical drives, perhaps they are just single
speed devices. Is it possible to upgrade the reader/writer?
Dennis

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

From:	SMTP%"DennisMacdonell@auslig.gov.au"  9-APR-1997 13:11:58.51
To:	"sankar@ripper.stl.dec.com" <sankar@ripper.stl.dec.com>
CC:	
Subj:	Re: re: Q33652


Hi Sankar,

> hmdump -p /dev/rmt0h <mountpoint>
> # this reported a number of files that it was waiting on to be
> transferred to the opticals
> # checking with the shelvels log these files were listed as MAGNETIC
> even after a
> # number of attempts to copy them the platters with the bulk ager.

The bulk ager decides whether to shelve files or not.  Depending on the
ager parameters set in the efscfg or .agerconfig, it may not shelve all
files.  Did you try a shelve command on any of those files?

>>>
*****************************************************************
 The bulk ager low water mark was set to 0 as indicated by some email I
got from Kym Schwarz. So there was reason to think that all files
would be aged by the bulk ager. It just so happens that the files that
the ager couldn't age and the files that hmdump stalled on were
just about the same.
*****************************************************************

Note that hmdump does not require that MAGNETIC files be shelved.  Any
MAGNETIC files that are not "dirty" (in transition between magnetic and
optical) will be copied to tape by hmdump.

> rm <the files that hmdump reported>

Why was this done?  I sure hope there is some other backup of those
files!

>>>
*****************************************************************
hmdump was stalling on certain files. If I hadn't removed the files 
then we would still be waiting for hmdump to finish.
******************************************************************

> # checked that the qr and jmd were not running

Why?  It is best to leave HSM up, in case it needs to do something,
unless there is some specific instruction to the contrary.

>>>
*****************************************************************
I can't see the logic of having qr and jmd running while one is trying
to rebuild the magnetic disk.  What are qr and jmd supposed to be
managing while you're trying to run hmrestore to recover the data that
should be on the magnetic disk?
*****************************************************************

> File Attribute Update
>         <fs=1, inode=50263, gen=32769, media=3b, refnum=1624137083>

These are normal requests to HSM to update file attributes (e.g.
permission,
ownership, etc.).  They result from a user doing chmod, chown, etc.

>>>
*****************************************************************
 The users were asked not to do any updates to the files; this included
chmod and chown. I had issued some chmod/chown/chgrp commands but I'm not
sure whether I did that before or after the installation of the new
disk. In any case those commands appeared to have completed -- that is,
the command had executed at least on the information stored on the
magnetic disk. I guess what was happening was that the qr was queueing
the changes to the opticals, which was blocking the queueing of file
accesses.  There seem to be a couple of problems with the software:
(a) there are no tools to show exactly what is going on; shelvels didn't
indicate that there were a whole lot of files to be updated (ie MAGNETIC
rather than OPTICAL), and qqr only shows the current batch of requests --
there may be heaps more to be processed before it gets around to
servicing my file access.
(b) certain commands should have priority over other commands, ie aging
information generated by a chmod/chgrp should not get in the way of a
straight access request unless there is some problem with the config
file. Now I haven't changed the config file, so I believe it contains
the values as set by DEC personnel.  I did check to see that the
appropriate values were set for the bulk ager, but as it turned out the
existing values were fine.
*****************************************************************

Are the requests being serviced?  Are any requests disappearing from the
queue?  If the requests in the queue stay the same, then we may need to
set the qr log level to 6 for a while, to see what is happening.  It is
possible they are being serviced from the tail of the queue, which won't
be shown if the queue is long enough.

Are there any errors reported?  Check e-mail to root to see if HSM is
reporting difficulty getting a platter.  Look for messages with ERROR or
WARNING in /usr/efs/qrlog or /usr/efs/jlog.
>>>
customer has not mentioned anything about the email to root and the log file 
entries.
=========================================
 
231.10. "reply to .9" by DECWET::TRESSEL (Pat Tressel) Fri Apr 11 1997 00:00
From:	DECWET::TRESSEL      "Pat Tressel" 10-APR-1997 19:52:27.11
To:	GIDDAY::SANKAR
CC:	tressel
Subj:	RE: HSM-4-UNIX note 231 and replies...

Sankar --

> For me to gain some experience, I am trying to setup a system with HSM here

Good idea!

> I have already managed to get hold of a cd drive (writable)

HSM doesn't support CD-R, and it also wants a jukebox.  Look at the SPD for a
list of supported devices.  Basically, it supports the RW5xx series of
magneto-optical (MO) jukeboxes.  The early ones in this series are obsolete,
so you may be able to get one that's been traded in.  I've heard that DIAL is
the place to look for returned equipment. I haven't used it myself, but my
boss is quite adept at finding things... In any case, don't get something
expensive.  We might be able to help locate something cheap.  Best thing is
to ask locally and see if anyone's got a jukebox sitting in a corner gathering
dust.

An alternative is to use a simulated jukebox, using partitions on magnetic
disks as fake platters (which is an undocumented "feature" of HSM used only
for some specific testing long ago).  This isn't very realistic (for one
thing, the timing is very different, without platter mounting), but it's
useful for trying things out.  It isn't useful for learning normal HSM
jukebox configuration procedures, because that part is very different.

Did you see our Web page (internal to DEC only)?  You can get the kit and
docs from there:

http://slugbt.zso.dec.com/Products.d/hsm.d/hsm.html

If you get a real jukebox, you'll also need the CLC kit (changer and optical
drivers) -- the CLC Web page is:

http://slugbt.zso.dec.com/Products.d/scsicam.d/scsicam.html

Let me know if there are any broken links -- our Web server just got switched
to a new machine.

You'll also need licence paks -- I can send you some for internal use, as
long as the system isn't a production system, i.e. if it's just for practice
and testing.  Otherwise DEC needs to pay a royalty.

Note that HSM is not supported beyond v3.2whatever-the-last-one-was.  So
you'll need a system at v3.2something.  A good choice is whatever your
customer happens to be running.

                              * * *

> One thing I neglected to mention was that when I was having problems
> accessing files, because the system was updating attributes, possibly as
> a result of a chgrp/chmod command, I rebooted the system. Given that
> there were a number of things that the qr had queued for processing,
> what happens when a reboot is initiated
> (a) is the queue of requests lost or are they saved for processing when
> the machine is rebooted.

Requests are written to a checkpoint file periodically.  How often this is
done depends on two parameters in the /usr/efs/efscfg file:  If there are
Q_CKLEVEL or more requests in the queue, then HSM does checkpointing -- below
that, it doesn't bother.  While it's checkpointing, it updates the checkpoint
file after each Q_INTERVAL requests come in.  Defaults are Q_CKLEVEL = 10
and Q_INTERVAL = 3.  (This is section 6.6.1 in the HSM admin guide, but
there's a typo in that paragraph.)
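
In other words, the defaults amount to entries like these in /usr/efs/efscfg
(I'm assuming the same NAME = VALUE form as the ager parameters in .1; the
trailing comments are just annotations):

  Q_CKLEVEL = 10     # start checkpointing once 10 or more requests are queued
  Q_INTERVAL = 3     # then rewrite the checkpoint file every 3 requests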

If the system crashes before HSM gets a chance to checkpoint some requests,
then those requests will be lost.  Most lost requests do not affect normal
filesystem operation -- the only HSM operations that affect normal operation
are shelving and unshelving, and losing these is not typically a problem.
For example, if the system crashes while a user was reading a file (causing
it to be unshelved if it was on optical), the user is going to
have to re-do their command to read the file anyway, which would issue an
unshelve command again (if needed).  And anything that causes shelving would
also be started again (e.g. running the bulk ager).  All other requests are
for purposes of recording information that would be useful for rebuilding
the filesystem from optical alone (e.g. recording directories and file
attributes) -- losing these will cause problems during a rebuild (e.g. files
will show up with wrong protections or in the wrong place, or directories
might be missing, so files in them won't have any place to go, and will all
end up in / with temporary names), which is not good(!!) -- but it won't
affect a running filesystem.

> (b) if an accidental command were issued say a remove command (rm -r *)
> in the wrong directory, could the machine be closed down to stop the
> propagation to the opticals.
> (c) having interrupted the aging of an rm command to the opticals, how
> would you recover as much information as was still on the opticals when
> the machine is rebooted.

If someone were *very* fast, they might be able to stop *some* of the
"truncate" requests from getting through.  See below for a way to get
(some of) the files back.

The point is, *never* depend on this.  HSM is *not* backup -- a *real*
backup should be done regularly.  How is backup being done now?

Dare I ask...   Did this problem with rm happen?

> (c) having interrupted the aging of an rm command to the opticals, how
> would you recover as much information as was still on the opticals when
> the machine is rebooted.

> I assume that if requests are
> queued across a shutdown/reboot, then the information must be stored in
> a file somewhere, in which case that file would need to be removed and
> maybe the system rebooted again before it finished cataloging.

Close.  A better way to stop HSM from servicing requests is to shut *it*
(not the system) down -- that way one can still do things afterward.  Use
the shutdown script /usr/efs/PolycenterHSM.shutdown.
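
A minimal sequence, using only commands mentioned elsewhere in this topic:

  qqr                                  # see what requests are outstanding
  /usr/efs/PolycenterHSM.shutdown      # stop HSM (qr, jmd) without taking
                                       # the whole system down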

There are very bad consequences of deleting requests, however -- throwing
away requests is the cause of 99.999% of rebuild problems later.  There is
no way to separate out the unwanted "truncate" requests from any other
activity, like creating directories or changing file attributes or mv'ing
files -- *all* requests will be lost.  And it's very difficult to actually
get rid of requests -- it's not just HSM that's saving them -- AdvFS is
holding on to any requests that HSM hasn't asked for yet.  I know of no way
to tell AdvFS not to send them.

So rather than describe any dangerous and uncertain ways of deleting
requests, let me mention an alternative:  After stopping HSM, umount the
filesystem, then rebuild a *new*, temporary, filesystem from the *same*
optical platters, onto an *entirely new* disk (or set of disks).  Any files
that hadn't yet been deleted will show up in this temporary filesystem, and
they can be copied out to yet another filesystem.  Then umount the temporary
filesystem, mount the real one again, ***let HSM finish doing the rm's***
so that the filesystem is in a consistent state between magnetic and optical,
and *later*, copy the files back in.

Two warnings:

Do not attempt to mount both the original and rebuilt filesystems
at the same time -- they must both have the same "filesystem id" in order
to use the same platters, meaning, to HSM, they *are* the same filesystem.

Be very careful to avoid doing anything to the real filesystem's magnetic
disks, because the temporary filesystem won't be usable for anything other
than getting back some of the rm'd files -- it won't contain any files that
hadn't yet been shelved, so it will likely be missing lots of stuff.

That's just an outline of the process -- I haven't put in the details.

> I'm a little unsure as to what happens with queued requests, since the
> first thing that happens when the system is booted is all the optical
> disks are catalogued

You're right -- HSM will not process requests until the jukebox inventory
is done.  But I believe HSM does not even start asking AdvFS for requests
until after it's ready to process them.  So there isn't really a window
during which requests could be gotten rid of -- by the time they show up
in the HSM queues, HSM is already working on them.

> During a shutdown/reboot
> the jukebox remains up, so I assume that the cataloging process is for
> the purposes of the software running on the host and that software
> translates accesses to a platter, to a physical location.

HSM keeps a database of what platter is where, and what their states are,
so if nothing is done to the jukebox while HSM isn't running, it doesn't
strictly *need* to do the inventory.  What it's doing is making sure that
the platters are where they're supposed to be, and checking that their
filesystems are mountable.  (Each platter in use by HSM has a UFS -- Unix
file system -- on it.)  So mainly, it does the inventory to make sure no-one
messed around with the jukebox while it was out of HSM's hands...

> I've been mindful to shutdown the system when the platters are not being
> rattled

The "qqr" command will show if there are any active requests.

If the startup and shutdown links for HSM were put in during HSM installation
(this is optional), then doing a "shutdown -h" will run all the shutdown
commands including HSM shutdown.  Once HSM is shut down, there's no worry
about losing requests.  (Note that just plain "shutdown", "shutdown -r",
and "reboot" do not run the shutdown procedures.)

> The other thing with respect to the jukebox is I think it runs slow
> because of the speed of the optical drives, perhaps they are just single
> speed devices. Is it possible to upgrade the reader/writer?

The supported drives are the rwz51 (single density) and rwz52 (supports
single and double density).  A patch would be needed to use rwz53 (single,
double, and quad density) drives.  I don't know right off if there's any
speed difference.  This should be in the product description for the drives,
which should be available on the web site http://www.digital.com , or I can
have a hunt for the product descriptions.

Usually drive speed is not a problem.  I've done multiple cp -r's of large
filesystems into an HSM filesystem, and the ager keeps up, because the cp's
are forced to wait.

If shelve requests are not being processed quickly, another possibility is
that the AdvFS metadata (on magnetic) may be getting fragmented, which would
make it take a long time for AdvFS to process responses from HSM, after it
has written a file out to optical, to record the optical location in that
file's (magnetic) metadata.   To find out if this is the case, do (as root):

  cd /xxx/.tags
  /usr/sbin/showfile -x M-6

where /xxx is the filesystem's mountpoint.  This shows how big all the extents
are in the magnetic metadata on the first volume in the domain.  If there are
more volumes, then repeat the showfile command on their metadata, which will
be M-n where n is 6 times the volume number, e.g. for volume 3, do:

  /usr/sbin/showfile -x M-18

If there are several hundred extents, or the extents are very small (e.g. if
some have sizes smaller than, say, 32), then the metadata is badly fragmented.
The procedure for "fixing" this involves creating a new domain on another
(set of) disk(s) with larger parameters to the mkfdmn and addvol commands
to control the metadata extent size and initial allocation, and backing up
the original magnetic part of the filesystem using hmdump, then restoring it
into the new domain with hmrestore.  As with rebuilding, the old and new
filesystems can't be mounted at the same time.  Again, this is only a sketch
of the process.

-- Pat