
Conference smurf::buildhelp

Title:USG buildhelp questions/answers
Moderator:SMURF::FILTER
Created:Mon Apr 26 1993
Last Modified:Mon Jan 20 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2763
Total number of notes:5802

1521.0. "pt nightly standby sup question - how standby works" by AOSG::FILTER (Automatic Posting Software - mail to flume::puck) Tue Jun 06 1995 22:41

Date Of Receipt: 	 6-JUN-1995 18:21:38.21
From: 	SMURF::FLUME::jmf "Joshua M. Friedman OSF/UNIX SDE  06-Jun-1995 1814"
To: 	decwet::daddamio
CC: 	decwet::anderson, odehelp@DEC:.zko.flume, mwarren@DEC:.zko.flume,
	tresvik@DEC:.zko.flume
Subj: 	pt nightly "standby" sup question - how standby works

Jan,

John Flanagan stopped by to let me know that you thought you wanted to
sup the ptos.nightly.standby tree.  I gather from John and from our
1/2-minute phone call that people "got burnt" when nightly updated
recently, because the build was good, but the kernels didn't boot.

I don't believe supping the standby tree addresses this.  Here's a
description of how the nightly tree toggle works, and a suggestion I
have of how you might try to address the problem.

The main purpose of our standby tree is to provide a way for the nightly
tree to be updated more-or-less instantaneously, so that users' sandboxes
and builds in progress are not broken for a long time during the update.
It does _not_ address trying to keep an older known-good build around.

We have two nightly trees for the ptlite and pt os streams, really
called ptos.nightly0 and ptos.nightly1 for pt.  They are created and
marked in rcs using these 0/1 names, but there are symbolic links for
user access called ptos.nightly and ptos.nightly.standby.  Also, the
ode "sandbox_base" variable is set to use these symbolic names as well.

Each night after the build completes, *whether it fails or succeeds*, it
updates the "standby" area, copying from the build tree (ptos.bld).
This takes about an hour.  Once the update is completed, then, if the
build was "good" (see below), it toggles the nightly and
nightly.standby links, so that nightly appears to update immediately.
If the build was not successful, it leaves nightly & standby alone.

So, if the build was good:
	ptos.nightly contains this night's good build
	ptos.nightly.standby contains the previous good build

So, if the build failed:
	ptos.nightly still contains the previous good build
	ptos.nightly.standby contains this night's bad build

Note that "good" here means all the kernels and libc built.  It means
nothing about whether the kernels booted or not, or if there are broken
commands.  That is our only success criterion.
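
To make the toggle concrete, here is a minimal Python sketch of the
nightly step described above.  It is not the real build script: the
function name nightly_update, the pool argument and the ".new" temporary
link name are assumptions made for illustration.  Only the tree and
link names (ptos.nightly0/1, ptos.nightly, ptos.nightly.standby,
ptos.bld) come from the description.

```python
import os
import shutil


def nightly_update(pool, build_tree, build_ok):
    """Refresh the standby tree, then swap the links only if the build was good.

    pool       -- directory holding ptos.nightly0, ptos.nightly1 and both links
    build_tree -- the freshly built tree (ptos.bld in the description)
    build_ok   -- True if all the kernels and libc built
    """
    nightly = os.path.join(pool, "ptos.nightly")
    standby = os.path.join(pool, "ptos.nightly.standby")

    # Find out which real tree (nightly0 or nightly1) each link names.
    live_target = os.readlink(nightly)
    standby_target = os.readlink(standby)

    # Slow step: standby is always refreshed, whether the build passed or not.
    standby_real = os.path.join(pool, standby_target)
    shutil.rmtree(standby_real)
    shutil.copytree(build_tree, standby_real)

    if not build_ok:
        return  # failed build: leave nightly and standby links alone

    # Fast step: swap the two links.  Each link is replaced by renaming a
    # temporary link over it, so users never see a missing ptos.nightly.
    for link, target in ((nightly, standby_target), (standby, live_target)):
        tmp = link + ".new"
        os.symlink(target, tmp)
        os.replace(tmp, link)
```

The copy is the hour-long part and only ever touches the tree behind
the standby link, so the tree users see through ptos.nightly changes
only at the link swap.  Between the two renames both links briefly name
the same tree; that is harmless in this sketch.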


In the case of the situation in pt recently, from what I understand,
this was the sequence of builds (or something like this):

    good build
    merge was done from ptlite bl2 to bl4
    bad build
    good build, but kernel failed to boot
    good build, but kernel failed to boot
    good build, kernel booted ok

In this case, the first bad build after the merge would have updated
standby.  Nightly still represented the pre-merge bits.

The first good build would have updated standby again, then toggled.
Now nightly has that good build (whose kernel didn't boot), and standby
has the old pre-merge bits.

The next good build would have updated standby again, and then toggled,
leaving a kernel that built but didn't boot in both nightly and standby.

Finally, the good build with the kernel that booted would have ended up
in nightly, and the previous good build with the kernel that didn't
boot would have ended up in standby.
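
The walk-through above can be replayed with a small in-memory model.
This is only a sketch: the replay function and the build labels are
made up for illustration, and the starting content of standby ("older
tree") is an assumption, since the sequence doesn't say what it held.

```python
def replay(builds, nightly="pre-merge good build", standby="older tree"):
    """Return the (nightly, standby) contents after each night's build.

    builds is a list of (label, built_ok) pairs; built_ok only says that
    the kernels and libc built, not that anything boots.
    """
    history = []
    for label, built_ok in builds:
        standby = label  # standby is refreshed every night, pass or fail
        if built_ok:
            nightly, standby = standby, nightly  # toggle the two links
        history.append((nightly, standby))
    return history


builds = [
    ("bad build", False),
    ("good build 1 (no boot)", True),
    ("good build 2 (no boot)", True),
    ("good build 3 (boots)", True),
]

for (label, _), (nightly, standby) in zip(builds, replay(builds)):
    print(f"after {label}: nightly={nightly}; standby={standby}")
```

After the second good build both trees hold kernels that don't boot,
which is why supping standby would not have provided a fallback.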

It would never have been particularly useful to sup standby, unless you
want a copy of the bad build to help debug.  


Here's something that can be done, and I believe is or was done at unx
and palo alto, however it still may not completely address the
problem.  Sup nightly into a different namespace (basedir=...), and do
your own toggling locally at zso, so that you can revert to the
previous day's sup'd nightly, if needed.  The pool that's sup'd can be
called something other than nightly, and you could manually determine
when to use the new and when to throw away the old nightly tree, after
manual kernel boot tests perhaps.  This is fairly hands-on, however.
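
A local toggle along these lines might look like the Python sketch
below.  Everything named here is hypothetical: the slot directories
slot0/slot1, the link name nightly.local and the helper functions are
illustrations, not an existing tool.  The sup invocation itself is left
out; the assumption is that its basedir gets pointed at whatever
inactive_slot() returns.

```python
import os

SLOTS = ("slot0", "slot1")  # hypothetical local tree names
LINK = "nightly.local"      # hypothetical link that local users build against


def active_slot(pool):
    """Name of the slot that the local link currently points at."""
    return os.readlink(os.path.join(pool, LINK))


def inactive_slot(pool):
    """Name of the other slot, i.e. where the next sup should land."""
    return SLOTS[1] if active_slot(pool) == SLOTS[0] else SLOTS[0]


def flip(pool):
    """Point the local link at the other slot and return that slot's name.

    Call this after a manual boot test passes to start using the new
    tree, or call it again to revert to the previous day's tree.
    """
    target = inactive_slot(pool)
    link = os.path.join(pool, LINK)
    tmp = link + ".new"
    os.symlink(target, tmp)
    os.replace(tmp, link)  # rename over the old link in one step
    return target
```

The daily routine would be: sup into the inactive slot, boot-test the
kernels from it, and flip only if they come up.  Reverting works only
until the next sup overwrites the older slot.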


I'd be happy to discuss this further if you like...
	
		-josh
