T.R | Title | User | Personal Name | Date | Lines |
---|
2692.3 | .PST ??? | GIDDAY::BURT | Chele Burt - CSC Sydney, DTN 7355693 | Wed May 12 1993 06:16 | 16 |
| Hello again,
As I mentioned in the base note, deleting & recreating username.PST for the
user, then doing the run, works, but then the problem re-occurs.
This leads me to think (yep, I do that sometimes) that something is breaking,
or at least damaging, the .PST
What is a likely culprit, bearing in mind that this user may create/receive
UP TO 1 msg a week (& do nothing else in ALL-IN-1 - not a "power user" :^) )
Thanks & regards,
Chele B
|
2692.4 | What symbols are set? | AIMTEC::WICKS_A | on the Streets of San Francisco | Wed May 12 1993 07:30 | 8 |
| Chele,
What's in the .PST at the time?
I still think it's an MJU problem though.
Andrew.D.Wicks
|
2692.5 | dry reading | GIDDAY::BURT | Chele Burt - CSC Sydney, DTN 7355693 | Thu May 13 1993 05:53 | 58 |
| PST
$MAIL_LAST
$OA_VERSION ALL-IN-1 V2.4
CAL_REM_NUMBER 1
CAL_TODO_NUMBER 1
FULLNAME Ray Beattie
NIFULL
NINAME
PARSED_USER_NAME AABEAR
PARSE_USER_NAME AABEAR
RESULT AABEAR
TM_ACTITEM_CONVERT 1
USERNAME AABEAR
WPC_READ_OFFSET 02
WPS_READ_SYMBOL 02
WPDOC WP 000003
(The 000003 is actually at char 61)
................................................................................
Extract from the log:
.
.
.
A total of 10 OUTBOX document(s) refiled for AABEAC
Improperly handled condition, image exit forced.
Signal Arguments Stack Contents
Number = 00000005 00000000
Name = 0000000C 20FC0000
00000000 7FE87374
00000074 7FE87320
00136166 00139F0D
03C00004 086B8103
001595C7
00A073D4
00000006
00089F50
Register Dump
R0 = 00000000 R1 = 00327ACA R2 = 00000000 R3 = 00089F50
R4 = 00A073D4 R5 = 0008F0F8 R6 = 00189EFC R7 = 00089D74
R8 = 001845CB R9 = 001847B5 R10= 0008F5A4 R11= 0008F424
AP = 7FE9729C FP = 7FE8725C SP = 7FE872D8 PC = 00136166
PSL= 03C00004
%SM_MJU-E unexpected error condition
_SYSTEM_F_ACCVIO, access violation, reason mask =!XB, virtual
address=!XL, PC=!XL, PSL=!XL
%SM_MJU_I_DONE processing complete for %SM_MJU
.
|
2692.6 | Well, that narrows it down to about 10k possibles | IOSG::CHINNICK | gone walkabout | Mon May 17 1993 18:59 | 33 |
| Hi Chele...
I have a couple of questions regarding this problem...
- Does ALL-IN-1 get run interactively or in batch when this happens?
- Are the documents in WPC format being touched by this process or only
the file cabinet entries??
- Can you check that ALL-IN-1 was linked properly - no linker errors
like unresolved symbols?
- Can you work out from a map or a debug image (@A1LNKDRV OPTIONS D)
where the ACCVIO PC is (and more importantly, whose code??)
- Are any patches installed on the customer system??
- Are any users getting such stack dumps during their normal operation
and activities?
This could be any of a number of problems:
- link error
- ASSET problem
- WPC DSAB problem
- File Cabinet corruption
- etc etc etc.
If you can answer some of these questions, we might at least be able to
have a stab as to which it is.
And just don't let Andy hassle you - he's just jealous of us Aussies! :-)
Paul.
|
2692.7 | more stuff | GIDDAY::BURT | Chele Burt - CSC Sydney, DTN 7355693 | Tue May 18 1993 09:48 | 32 |
| re <<< Note 2692.6 by IOSG::CHINNICK
Customer running ALL-IN-1 V2.4 with patches K601 & K602. WordPerfect.
In addition to MJU, also has SFCP (not used by the problem bod).
MJU hitting READ and OUTBOX only (run in batch)
VMS 5.4-2 & hasn't linked since 2.4 was installed - ie 13-July-1991 (she can't
recall if there were any problems with symbols at link time)
No-one else seems to have reported accvios during "normal" operations.
Customer doesn't seem to be doing much in the way of housekeeping - EW, REO &
RSF only. (no TRM or TRU)
I have requested that the customer do a TRU hitting the prob-bod only.
Still to be answered:
> - Can you check that ALL-IN-1 was linked properly - no linker errors
> like unresolved symbols?
> - Can you work out from a map or a debug image (@A1LNKDRV OPTIONS D)
> where the ACCVIO PC is (and more importantly, whose code??)
Is it worthwhile re-linking?
Thanks'n'regards,
Chele
RE:
> And just don't let Andy hassle you - he's just jealous of us Aussies! :-)
Hey, he's Welsh and so is Fireman Sam. Fireman Sam, I have it on very good
authority, is DEFINITELY one of the Good Guys.
|
2692.8 | ASSETS and Patches - bad news... | IOSG::CHINNICK | gone walkabout | Tue May 18 1993 14:57 | 44 |
|
Chele,
Well... K602 has some improvements to the CAB code to make it more
robust, so it shouldn't (in principle) be a corrupt file cabinet.
Having said that though - I'm not sure about the compatibility of SFCP
with those patches and/or the other ASSETS package installed. Normally
it is fairly safe to assume gross incompatibility! I'd assume the worst
- that all of those things conflict. You can check by looking at the
kits and seeing what modules they supply, but almost certainly there is
some problem around this. It may be that the ASSETS work but that their
installation has disabled (i.e. replaced) parts of the patches.
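The kit comparison suggested above can be sketched roughly as follows. This is only an illustration: the module names are hypothetical placeholders, and the real lists would have to be read from the kit listings for the patches and the ASSETS packages.

```python
def overlapping_modules(kits):
    """Map each module name to the kits supplying it, keeping only clashes."""
    suppliers = {}
    for kit, modules in kits.items():
        # Use a set so a module listed twice in one kit is counted once.
        for module in {name.upper() for name in modules}:
            suppliers.setdefault(module, []).append(kit)
    return {m: sorted(k) for m, k in suppliers.items() if len(k) > 1}


# Hypothetical module lists; the names below are NOT taken from the real kits.
kits = {
    "K602": ["MODULE_ONE.OBJ", "MODULE_TWO.OBJ"],
    "SFCP": ["MODULE_TWO.OBJ", "MODULE_THREE.OBJ"],
    "MJU": ["MODULE_FOUR.OBJ"],
}

clashes = overlapping_modules(kits)
for module, owners in sorted(clashes.items()):
    print(f"{module} is supplied by: {', '.join(owners)}")
```

Any module reported by more than one kit is a candidate for "last one installed wins", which is the case worth checking by hand.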
It might be worth running a check on the file cabinet (PDAF and
SDAFs/PENDING) to see if there is anything wrong there too. It may be
that the K602 code is not active (because of the aforementioned
incompatibility) and hence is not protecting against cabinet
corruptions.
Since a certain editor - which shall remain nameless, but which the
customer has - is known to cause memory corruption in some versions, it
might be a reasonable guess that there has been some DAF corruption.
Also, these things have a habit of hanging around (from pre-V2.3 even
in some instances).
Beyond that, a re-link is worthwhile - maybe not as much as downing a
couple of cold beers - but nevertheless worthwhile. You might just try
a debug link to see if it hangs together OK and then try to track down
that PC value.
Housekeeping might help any cabinet structure problems - although
again, it helps a lot to have the 'right' version of the TRM/TRU code.
That is something I try not to get too embroiled in - there is quite
enough else to worry about. "Andrew.D.Wicks", the world-famous
Welshman, may be able to comment on which patch for FCVR is
appropriate and works best since I expect he gets asked almost daily?
Again, there is precious little to go on, but I'd focus on the
possibility of cabinet problems for the present.
Good Luck!
Paul.
|
2692.10 | SFCP is innocent ok? | AIMTEC::WICKS_A | Alphatraz - Coming Summer 93 | Tue May 18 1993 19:38 | 6 |
| SFCP and all ALL-IN-1 patches up to and including K605 were certified
as living happily together before the Charlotte group was disbanded.
Regards,
Andrew.D.Wicks
|
2692.11 | | GIDDAY::BURT | Chele Burt - CSC Sydney, DTN 7355693 | Wed May 19 1993 05:52 | 16 |
| Hello again all,
TRU for the problem bod ran without any errors. The user has only been on the
system for a couple of months, and he is the only one causing any problems.
The Customer has been running MJU, with SFCP & the "W"editor successfully for
YEARS.
Is it at all possible it's one of those mysterious ALL-IN-1 biases ?
(what IS the plural of bias?)
eg "I don't like the length/flavour/spelling of this username" therefore
something is going to get trashed
I'll try & get a re-link & FCVR (maybe PT instead) scheduled.
Thanks again,
Chele
|
2692.12 | I've drawn a blank... need ACCVIO details | IOSG::CHINNICK | gone walkabout | Wed May 19 1993 14:51 | 54 |
|
Re: .11
Hi Chele,
OK... thanks for running the checks...
Just one point about TRU... It won't tell you about some types of errors
- particularly some forms of DAF corruption. It just tries to fix it
and keeps quiet. In some cases, I'm not even convinced it fixes such
problems! TRU (aka FCVR) really targets the high-level structure of the
file cabinet to maintain DOCDB/DAF relationships and usage counts and
it does do quite a good job on this level.
Going back to the ACCVIO... I really think you are going to have to
give at least the message you get (reason, VM, PC) and decode this PC
so we have some clue as to what might be happening. If you are really
serious about tracking this, it might be worth getting hold of a
process dump and a debug image so that more detailed analysis can be
done.
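On the message details (reason, virtual address, PC): the formatted text in the log still shows the unfilled !XB/!XL directives, but the signal arguments in the dump from .5 appear to carry the same values. A minimal sketch of pulling them out, assuming the usual VAX layout for an ACCVIO signal array (argument count, condition value, reason mask, virtual address, PC, PSL). The reason-mask bit meanings are taken from the VAX architecture as I understand it, not from anything in this thread.

```python
# VAX/VMS condition value for an access violation (SS$_ACCVIO).
SS_ACCVIO = 0x0000000C

# Reason-mask bits as assumed from the VAX architecture documentation.
REASON_BITS = {
    0: "length violation",
    1: "page table entry not accessible",
    2: "write or modify access attempted",
}


def decode_accvio(signal_args):
    """Decode a signal array given as hex strings, argument count first."""
    values = [int(word, 16) for word in signal_args]
    count, args = values[0], values[1:]
    if count != len(args):
        raise ValueError("argument count does not match the array length")
    if count != 5 or args[0] != SS_ACCVIO:
        raise ValueError("not a five-argument ACCVIO signal array")
    _, mask, address, pc, psl = args
    reasons = [text for bit, text in REASON_BITS.items() if mask & (1 << bit)]
    return {
        "reason_mask": mask,
        # A zero mask is read here as a read access that failed protection.
        "reasons": reasons or ["read access, protection violation"],
        "virtual_address": address,
        "pc": pc,
        "psl": psl,
    }


# Signal arguments column copied from the log extract in .5.
dump = ["00000005", "0000000C", "00000000", "00000074", "00136166", "03C00004"]
decoded = decode_accvio(dump)
for name in ("virtual_address", "pc", "psl"):
    print(f"{name} = {decoded[name]:08X}")
print("reasons:", ", ".join(decoded["reasons"]))
```

If that reading is right, the failing reference was to virtual address 00000074, which would be consistent with an offset from a zero pointer, though that is only a guess until the PC is decoded.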
It could be some mysterious ALL-IN-1 bias - but normally these are
triggered by things like quote (') or dot (.) in the username or very
long usernames etc. Of course, it could be that there is another user
on their system who has this type of problem but which only shows up in
a larger run??
The real problem is that for me at least, MJU is a black box - I don't
know at all what is in it so I can only work it like any other ALL-IN-1
ACCVIO. And the first place to look is the PC where it crashes.
Re: .10
Andy,
this might sound strange, but I'd like to know where you got your
information about ASSETS and patch compatibility... I've been away from
ALL-IN-1 support for about 3 years [not long enough some might say
:-)], but I have asked the relevant people here in support who should
know and they haven't heard any such thing.
Soooo... perhaps nobody has bothered to tell us.
Of course, there may be some difference between 'co-exists' and
'compatible'. If you install an asset module after a patch, you are
most likely to just lose the patch changes.
I don't have much difficulty in believing that SFCP is ok up to K605
because it is quite a heavily used package. MJU on the other hand is
not even on the cross-reference listing I have!
Paul.
|
2692.13 | more stuff | GIDDAY::BURT | Chele Burt - CSC Sydney, DTN 7355693 | Thu May 20 1993 07:44 | 76 |
| >> Going back to the ACCVIO... I really think you are going to have to
>> give at least the message you get (reason, VM, PC) and decode this PC
By your command!
Improperly handled condition, image exit forced.
Signal arguments Stack contents
Number = 00000005 00000000
Name = 0000000C 20FC0000
00000000 7FE87374
00000074 7FE87320
00136166 00139F0D
03C00004 086B8103
001595C7
00325E04
00000006
00089F50
Register dump
R0 = 00000000 R1 = 00A0C022 R2 = 00000000 R3 = 00089F50
R4 = 00325E04 R5 = 0008F0F8 R6 = 00189EFC R7 = 00089D74
R8 = 001845CB R9 = 001847B5 R10= 0008F5A4 R11= 0008F424
AP = 7FE8729C FP = 7FE8725C SP = 7FE872D8 PC = 00136166
PSL= 03C00004
%SM_MJU-E unexpected error condition
-SYSTEM-F-ACCVIO,
access violation, reason mask=!XB, virtual address=!XL, PC=!XL, PSL=!XL
%SM_MJU-I-DONE processing complete for %SM_MJU
%OA-I-LASTLINE,
%OA-I-LASTLINE,
%END_BATCH-I Deleting old log files "OA$LOG:MJU_SM.LOG*;*" before date
"18-NOV-
%END_BATCH-I Deleting old log files "OA$LOG:MJU_SA.LOG*;*" before date
"18-NOV-
%END_BATCH-I performing %END_BATCH exit and cleanup processing
%END_BATCH-I processing completed ok for %END_BATCH
%SMJACKET-I %SMJACKET facility exiting due to error
<CR><LF><CR><LF><CR><LF><CR><LF><CR><LF> MANAGER finished using
ALL-IN-1
%SMJACKET-I performing %SMJACKET exit and cleanup processing
%SMJACKET-I close lockfiles
%SMJACKET-E processing completed with an error for %SMJACKET
ALLIN1 job terminated at 17-MAY-1993 00:43:32.61
> so we have some clue as to what might be happening. If you are really
> serious about tracking this, it might be worth getting hold of a
> process dump and a debug image so that more detailed analysis can be
> done.
How?
> The real problem is that for me at least, MJU is a black box - I don't
You are not alone!
According to the customer, the PT only took 9 minutes - for over 900 users!
We then found that the customer was running an old version of FCVR exe....
(it was found first in the search list)
customer to rename .exe & re-run the PT tonight, after correcting the problem
with "Error detected with account BSG ...
"Couldn't open BSG's DOCDB -- ?Bad directory for device"
(delete profile for non-existent user)
The re-link mostly went fine - with one exception.
When trying to READ the WordPerfect portion of a document, gets an error
"optional software product WPcorp OA not installed"
Thanks'n'regards,
Chele
|
2692.15 | Do you think they'd fly me out to help? ;-) | IOSG::CHINNICK | gone walkabout | Thu May 20 1993 15:44 | 96 |
|
The saga continues... maybe I'll write a book about all of this stuff
one day...
Anyway... Chele,
I suggest that you may have some form of link problem by virtue of the
WPCORP error on the READ.
That aside, here are some instructions for how to decode that PC value:
Either:
- Check a link map (OA$BUILD_SHARE:OA$MAIN.MAP) and look up the
module and psect list to find the address closest to but less than
the PC (=136166).
- Note the module name and its base address so we can get a module
and code offset
Or:
- Link a debug version of OA$MAIN - which means one which is
identical to what is being run for MJU:
$ SET DEFAULT OA$BUILD_SHARE:
$ @OA$BUILD:A1LNKDRV OPTIONS D
- Then $ RUN/DEBUG OA$BUILD_SHARE:OA$MAIN
DBG> SET MODULE/ALL
DBG> EXAM/INSTR 136166 ! or whatever PC value is required
- DEBUG should give the module/routine/line information and tell
us the instruction which it died on.
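For the map option, the lookup itself is just "highest base address not above the PC". A rough sketch, assuming the module base addresses have already been copied out of the link map by hand; the module names and addresses below are hypothetical, not taken from a real OA$MAIN.MAP.

```python
from bisect import bisect_right


def locate_pc(pc, map_entries):
    """Return (module, base, offset) for the highest base address <= pc."""
    entries = sorted(map_entries)            # order by base address
    bases = [base for base, _ in entries]
    index = bisect_right(bases, pc) - 1      # last base not greater than pc
    if index < 0:
        raise ValueError("PC lies below every module in the map")
    base, module = entries[index]
    return module, base, pc - base


# Hypothetical (base address, module name) pairs; real ones come from the map.
example_map = [
    (0x00134000, "MODULE_A"),
    (0x00136000, "MODULE_B"),
    (0x00138000, "MODULE_C"),
]

module, base, offset = locate_pc(0x00136166, example_map)
print(f"{module} base={base:08X} offset={offset:X}")
```

The module name and offset are what is worth posting back here. The offset only means something against the same link as the one that crashed.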
Now, in conjunction with such a debug image, it can be useful for some
problems to set up for a process dump. Assuming that the OA$MAIN image
is giving the ACCVIO (check the MJU procedure!) and assuming it is
crashing out all the way to DCL (which I think it is), you should be
able to get a crash dump just by putting:
$ SET PROCESS/DUMP
into the command sequence (e.g. in the MJU command file). Then when it
crashes, it will write a process dump file.
This lets those with a masochistic streak (read "Support Specialist")
get a dump of memory as a file OA$MAIN.DMP which can be used with a
debug version of OA$MAIN to see what was going on when the crash
occurred. A bit like SDA really, but for images not VMS.
Armed with a PC value - and if you are really keen, a process dump
file, those with the patience then return to the source code to try to
figure out what could have possibly gone wrong.
In your case, it could be an ASSETS problem, which might be a bit
sticky 'cause I don't know who owns these things any more. Andrew may
care to comment since he keeps his ear to the ground on such matters.
Unfortunately, there is a great shortage of people who have both intimate
VMS and ALL-IN-1 knowledge in the world so it isn't uncommon to find
that nobody knows all of this stuff about how to investigate ACCVIO
type problems. [Maybe I'll retire to Jurassic Park. :-)]
The ACCVIO by itself is not a lot of help - we can simply suggest that
it could be an unresolved WEAK symbol reference for some data structure
or some parameter passing problem or... ? The good news is that it
ought to be easy to track down if we have all the required information.
The bad news is that we probably won't have all of that since we don't
have the MJU sources (and probably don't want them if we can avoid it)!
Before we descend permanently into this rat-hole of trying to track
this one down it may be wise to ask a few questions about ASSETS
support. I don't know who used to support ASSETS but I assume it was
Charlotte which has now 'gone west'. We (with official IOSG hat on)
don't support ASSETS here. It is rumoured that some are supported by a
group in Europe but I don't know who or if MJU is amongst the packages
supported.
I think I'd be getting a little worried about this if I were you, and
pushing it the way of your management before the brown stuff hits the
fan? This might become a support contract issue. How much support do
they have and how does that balance against your time spent.
Besides that caveat, we surely can't support them unless they can get
their system re-linked and working that way. Other than that, you could
try looking into what MJU does - does it constitute much code-level
integration or is it scripts? If the latter, then turn on trace so we
can see what it does. But we are rapidly exhausting the approaches which
can be pursued to solve this.
More questions than answers...
Paul.
|
2692.18 | MJU, a potted history | SIOG::T_REDMOND | Thoughts of an Idle Mind | Mon May 24 1993 14:51 | 23 |
| The "Old" MJU is a hack (in the kindest sense of the word). It's a
very complicated script that does all sorts of weird and wonderful
loops through documents in cabinets on behalf of some or all users.
End of recall. I last looked at the old MJU 18 months ago when we were
wondering what to do for V3.0. At that time the thought of having to
upgrade the convolutions of MJU V1.2 to handle drawers assumed a
horrible life of its own and reduced grown men to boys (again).
Can you run MJU for this single user (on his own).
Can you run MJU with TRACE enabled so that some sort of clue as to what
function is provoking the ACCVIO is detected?
Is the user's DOCDB.DAT and DAF.DAT in good health? (ANAL/RMS_FILE)
Maybe some judicious use of the DUMP_CACHE function might help. Just to
make sure that everything is flushed to disk.
Many ASSETS, as always, are supported through the good nature of people
who either originally provided them to the general ALL-IN-1 community
or who have battled with them since.
Tony
|
2692.19 | | GIDDAY::BURT | Chele Burt - CSC Sydney, DTN 7355693 | Wed May 26 1993 06:19 | 27 |
| Hello again,
> Can you run MJU for this single user (on his own).
Yes. When this has been done, it has worked successfully. When he is included
in a complete run, MJU seems to work on alternate runs.
The user has VERY little in ALL-IN-1, a couple of read mail msgs only (I'm not
entirely sure why they even bothered giving him access)
> Can you run MJU with TRACE enabled so that some sort of clue as to what
> function is provoking the ACCVIO is detected?
it can't be guaranteed to fail - and how do you set a TRACE on for a batch
run?
> Is the user's DOCDB.DAT and DAF.DAT in good health? (ANAL/RMS_FILE)
Did this earlier on, and all was OK.
> Maybe some judicous use of the DUMP_CACHE function might help.
How?
I _do_ appreciate everyone's assistance with this. I appreciate that there is
no commitment to help from anyone, and that this is not an official support
channel.
Thankyou all,
Chele
|
2692.20 | 2395 | AIMTEC::BUTLER_T | | Wed May 26 1993 17:27 | 10 |
| Chele,
<<it can't be guaranteed to fail - and how do you set a TRACE on for a
<<batch run?
See note 2395 in this conference.
Tim
|
2692.22 | You could still try to get a process dump? | IOSG::CHINNICK | gone walkabout | Tue Jun 01 1993 19:31 | 40 |
| Hello 'Chele...
Back from my week in Norfolk... I see not much progress on this one.
Customers tend to be a little difficult sometimes - like wanting you to
solve their problems without even logging on to their system!
I suggest that if they don't want to run with trace, then your only
alternative is going to be to get an image dump which can be examined
to try to work out what is happening.
I covered this before, but just to recap:
1. Include the command $ SET PROCESS/DUMP somewhere in the command file
for MJU or login sequence for the account. Then when ALL-IN-1
crashes out (back to DCL) there will be a large (10-20k blocks)
OA$MAIN.DMP file written to the ALL-IN-1 subdirectory.
[ You can redirect this by defining SYS$PROCDMP logical].
2. Re-link a DEBUG image using $ @A1LNKDRV OPTIONS D
This can be used to help with analysis of the dump file.
Now - I understand that part of the problem here is that the re-link
like this doesn't produce exactly the same image layout as they are
running, but it is all that we can try. If you can get this
information, then it may be possible to deduce what is happening.
Other than that you could try running CABFIX against their system [you
should be able to get hold of this since you are in the CSC] to see if
it throws up any DAF type corruptions. I gather you haven't had FCVR up
and running yet?
As said before - possibilities beyond this are few - unless you are
really desperate.
Sorry we can't do more.
Paul.
|
2692.24 | end of saga | GIDDAY::BURT | Chele Burt - CSC Sydney, DTN 7355693 | Mon Aug 02 1993 10:31 | 13 |
| Hello Again,
The customer is back from hols, and after ages of playing telephone tag has
decided not to play anymore. No time for tests, they'll just use the
workaround of doing 2 runs - one of everyone bar Mr. Problem, and one with
just Mr. Problem.
I hate missing the last page in a saga! :)
Thanks for all your help!
Chele
|