
Conference mvblab::sable

Title:	SABLE SYSTEM PUBLIC DISCUSSION

Moderator:	COSMIC::PETERSON

Created:	Mon Jan 11 1993
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	2614
Total number of notes:	10244

2561.0. "AS 2100 4/233 fails to boot, hangs in "test"" by TIMABS::FREPPEL (Mosquito ergo summm...) Mon Mar 24 1997 17:26

    An attempt to boot an AlphaServer 2100 4/233 repeatedly fails in the 
    following way and a "test" hangs forever...
    
    P00>>>b
    (boot dua2243.0.0.6.0 -flags 3,0)
    block 0 of dua2243.0.0.6.0 is a valid boot block
    reading 1013 blocks from dua2243.0.0.6.0
    bootstrap code read in
    base = 200000, image_start = 0, image_bytes = 7ea00
    initializing HWRPB at 2000
    initializing page table at fff0000
    initializing machine state
    setting affinity to the primary CPU
    jumping to bootstrap code
    
    halted CPU 0
    
    halt code = 5
    HALT instruction executed
    PC = 20000000
    P00>>>
    
    P00>>>init
    
    VMS PALcode V5.56-6, OSF PALcode X1.45-12
    
    starting console on CPU 0
    probing hose 0, PCI
    probing PCI-to-EISA bridge, bus 1
    bus 0, slot 0 -- ewa -- DECchip 21040-AA
    bus 0, slot 1 -- pka -- NCR 53C810
    bus 1, slot 2 -- vga -- ISA VGA
    bus 0, slot 6 -- pua -- CIPCA
    bus 0, slot 7 -- pub -- CIPCA
    bus 0, slot 8 -- fwa -- DEC PCI FDDI
    Memory Testing and Configuration Status
    Module   Size    Base Addr   Intlv Mode  Intlv Unit  Status
    ------   -----   ---------   ----------  ----------  ------
    0      128MB   00000000      2-Way         0       Passed
    1      128MB   00000000      2-Way         1       Passed
    Total Bad Pages 0
    Testing the System
    Testing the Disks (read only)
    Testing the Network
    AlphaServer 2100 Console V4.7-143, built on Nov 20 1996 at
    
    
    P00>>>test
    Testing the Memory
    
    Now the system hangs (uninterruptible).
    
    Any hints what could cause this?
    Any ideas what to check?
    
    Thanks
    Raymond.
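
A side note on the boot display above: the block count and the image_bytes value can be cross-checked against each other. The sketch below assumes 512-byte disk blocks and that image_bytes is printed in hexadecimal; the helper names are made up for illustration and are not part of any console tool.

```python
BLOCK_SIZE = 512  # bytes per disk block (assumption: standard 512-byte blocks)


def image_bytes_from_blocks(blocks: int) -> int:
    """Number of bytes read when the console loads `blocks` disk blocks."""
    return blocks * BLOCK_SIZE


def display_is_consistent(blocks: int, image_bytes_hex: str) -> bool:
    """True if 'reading N blocks' agrees with 'image_bytes = ...' (hex)."""
    return image_bytes_from_blocks(blocks) == int(image_bytes_hex, 16)


# Values taken from the failing boot display: 1013 blocks, image_bytes = 7ea00
print(hex(image_bytes_from_blocks(1013)))      # 0x7ea00
print(display_is_consistent(1013, "7ea00"))    # True
```

So the console did read the amount of data the boot block asked for. That says nothing about whether the data itself was good.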

T.R	Title	User	Personal Name	Date	Lines
2561.1	heap_expand?	AFW3::MAZUR		`Tue Mar 25 1997 00:53`	120
	Raymond,

	Is this a new install? If not, what is different about this system?
	Has the configuration changed? Did this system boot in the past?
	Did you add memory? Or CIPCAs?

	It appears as if the memory where the boot block was loaded is
	corrupted, broken, or misaddressed. The code at virtual memory
	20000000 was a zero. And from my log below compared to yours, it
	looks as if the page table address is in error:

	Your system:
	    initializing page table at fff0000

	My system:
	    initializing page table at 1f2000

	I am going to take a guess that you need to increase the heap_expand
	environment variable. Try this, and report the results:

	>>> show heap_expand
	>>> b dua2243.0.0.6.0 -flags 3,0 -h
	>>> e pmem:fff0000 -n f -l
	>>> e vmem:20000000 -n f -d
	>>> e pmem:200000 -n f -d
	>>> dyn
	>>> set heap_expand 128K
	>>> init
	>>> show heap_expand
	>>> b dua2243.0.0.6.0 -flags 3,0 -h
	>>> e pmem:fff0000 -n f -l
	>>> e vmem:20000000 -n f -d
	>>> e pmem:200000 -n f -d
	>>> dyn
	>>> cont

	The results should look something like:

	P00>>>show heap_expand
	heap_expand             NONE
	P00>>>b -h dka0
	(boot dka0.0.0.1.0 -flags 0)
	block 0 of dka0.0.0.1.0 is a valid boot block
	reading 16 blocks from dka0.0.0.1.0
	bootstrap code read in
	base = 200000, image_start = 0, image_bytes = 2000
	initializing HWRPB at 2000
	initializing page table at 1f2000
	initializing machine state
	setting affinity to the primary CPU
	P00>>>e pmem:1f2000 -n f -l
	pmem:   1F2000 00001101
	pmem:   1F2004 000000FA
	pmem:   1F2008 00001101
	pmem:   1F200C 000000F9
	pmem:   1F2010 00000000
	pmem:   1F2014 00000000
	pmem:   1F2018 00000000
	pmem:   1F201C 00000000
	pmem:   1F2020 00000000
	pmem:   1F2024 00000000
	pmem:   1F2028 00000000
	pmem:   1F202C 00000000
	pmem:   1F2030 00000000
	pmem:   1F2034 00000000
	pmem:   1F2038 00000000
	pmem:   1F203C 00000000
	P00>>>e vmem:20000000 -d -n f
	vmem: 20000000 47FF041F    NOP
	vmem: 20000004 C0200002    BR      R1,000002
	vmem: 20000008 200099A0    LDA     R0,-6660(R0)
	vmem: 2000000C 00000000    HALT
	vmem: 20000010 A7A10000    LDQ     R29,(R1)
	vmem: 20000014 D3400007    BSR     R26,000007
	vmem: 20000018 43C4153E    SUBQ    SP,#20,SP
	vmem: 2000001C 47FF0410    CLR     R16
	vmem: 20000020 B7FE0018    STQ     R31,0018(SP)
	vmem: 20000024 47FF0411    CLR     R17
	vmem: 20000028 47FF0412    CLR     R18
	vmem: 2000002C D3400110    BSR     R26,000110
	vmem: 20000030 00000000    HALT
	vmem: 20000034 243D0000    LDAH    R1,(R29)
	vmem: 20000038 20218070    LDA     R1,-7F90(R1)
	vmem: 2000003C 245D0000    LDAH    R2,(R29)
	P00>>>e pmem:200000 -d -n f
	pmem:   200000 47FF041F    NOP
	pmem:   200004 C0200002    BR      R1,000002
	pmem:   200008 200099A0    LDA     R0,-6660(R0)
	pmem:   20000C 00000000    HALT
	pmem:   200010 A7A10000    LDQ     R29,(R1)
	pmem:   200014 D3400007    BSR     R26,000007
	pmem:   200018 43C4153E    SUBQ    SP,#20,SP
	pmem:   20001C 47FF0410    CLR     R16
	pmem:   200020 B7FE0018    STQ     R31,0018(SP)
	pmem:   200024 47FF0411    CLR     R17
	pmem:   200028 47FF0412    CLR     R18
	pmem:   20002C D3400110    BSR     R26,000110
	pmem:   200030 00000000    HALT
	pmem:   200034 243D0000    LDAH    R1,(R29)
	pmem:   200038 20218070    LDA     R1,-7F90(R1)
	pmem:   20003C 245D0000    LDAH    R2,(R29)
	P00>>>dyn
	zone     zone       used    used       free    free       utili-  high
	address  size       blocks  bytes      blocks  bytes      zation  water
	-------- ---------- ------- ---------- ------- ---------- ------- ----------
	000309A0 849088     375     306336     29      542784      36 %   407584
	P00>>>
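
The reply above notes that the code at virtual address 20000000 was a zero, while a healthy image starts with 47FF041F (NOP). A rough sketch of why an all-zero longword halts the CPU: on Alpha the opcode is the top six bits of the instruction word, opcode 0 is CALL_PAL, and CALL_PAL function 0 is HALT. The `decode` helper below is hypothetical and only covers the two cases that matter here; it is not a full disassembler.

```python
def decode(word: int) -> str:
    """Decode just enough of a 32-bit Alpha instruction word to tell
    HALT and NOP apart (illustrative helper, not a real disassembler)."""
    opcode = (word >> 26) & 0x3F           # bits <31:26>
    if opcode == 0x00:                      # CALL_PAL
        func = word & 0x03FFFFFF            # bits <25:0>
        return "HALT" if func == 0 else f"CALL_PAL {func:#x}"
    if opcode == 0x11 and not (word >> 12) & 1:   # logical op, register form
        ra = (word >> 21) & 0x1F
        rb = (word >> 16) & 0x1F
        func = (word >> 5) & 0x7F
        rc = word & 0x1F
        if func == 0x20:                    # BIS (logical OR)
            if ra == rb == rc == 31:
                return "NOP"                # BIS R31,R31,R31
            return f"BIS R{ra},R{rb},R{rc}"
    return f"opcode {opcode:#04x}"


print(decode(0x00000000))   # HALT
print(decode(0x47FF041F))   # NOP
```

Zeroed memory at the entry point therefore matches "halt code = 5, HALT instruction executed": the CPU executed exactly what it found there.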
2561.2		AFW3::MAZUR		`Tue Mar 25 1997 13:04`	21
	>
	> Your system:
	>     initializing page table at fff0000
	>
	> My system:
	>     initializing page table at 1f2000
	>

	Correcting myself, there is no problem here. Recent console changes
	have moved the page table into high memory, and that is reflected
	correctly in the boot display you have.

	Still, try increasing heap_expand. I think the failure mode would
	have been different if there was a heap problem, but I cannot say
	that for sure.
2561.3	writeboot	TIMABS::FREPPEL	Mosquito ergo summm...	`Wed Mar 26 1997 11:46`	55
	Hi Dennis,

	thanks for the quick answer. The system is booted now.

	After discarding a couple of theories, we ended up doubting the
	integrity of the system disk we were attempting to boot from (a
	shadow set member; we currently have a lot of trouble with
	shadowing in this cluster, so the doubts seem reasonable).

	What we actually did was:

	- mount the system disk as a data disk on a different system
	- do a WRITEBOOT to it
	- dismount the disk
	- boot the sable

	And - the system booted.

	So, did the system lie to me when it said ------------+
	P00>>>b -fl 3,1                                       |
	(boot dua2243.0.0.6.0 -flags 3,1)                     |
	block 0 of dua2243.0.0.6.0 is a valid boot block  <---+
	reading 1013 blocks from dua2243.0.0.6.0
	bootstrap code read in
	base = 200000, image_start = 0, image_bytes = 7ea00
	initializing HWRPB at 2000
	initializing page table at fff0000
	initializing machine state
	setting affinity to the primary CPU
	jumping to bootstrap code

	halted CPU 0

	halt code = 5
	HALT instruction executed
	PC = 20000000

	Thanks for your help.
	Raymond.

	PS: We also tried heap_expand (but to no avail):

	P00>>>sho heap*
	heap_expand             NONE
	P00>>>set heap_expand 1024
	P00>>>sho heap*
	heap_expand             1024K

	In the display of a "cat el" command we saw:

	Starting Memory Diagnostics
	***Error - Corrupt IIC error log on module 1
	Resetting the error log number to zero

	Could this cause any trouble?
2561.4		CLOUD::SHIRRON	Stephen F. Shirron, 223-3198	`Wed Mar 26 1997 14:05`	6
	The console prints "... is a valid boot block" if the boot block
	(LBN 0) passes several validity tests (is the checksum okay, are
	certain reserved fields zero like they should be, etc.). The image
	that the boot block points to could be corrupted, and the console
	would not know.

	stephen
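
To make the checksum part of that validity test concrete, here is a minimal sketch. It assumes the commonly described SRM boot block layout: a 512-byte block treated as 64 little-endian quadwords, with the last quadword holding the sum of the first 63, modulo 2^64. That layout is an assumption on my part and not something stated in this thread; the function names are invented for illustration, and the reserved-field checks are not modeled.

```python
import struct

BOOT_BLOCK_SIZE = 512          # LBN 0 is one 512-byte block
MASK64 = 0xFFFFFFFFFFFFFFFF    # keep sums within 64 bits


def boot_block_checksum(block: bytes) -> int:
    """Sum of the first 63 quadwords of the block, modulo 2**64
    (assumed checksum rule, see the note above)."""
    if len(block) != BOOT_BLOCK_SIZE:
        raise ValueError("boot block must be exactly 512 bytes")
    quads = struct.unpack("<64Q", block)
    return sum(quads[:63]) & MASK64


def checksum_ok(block: bytes) -> bool:
    """Compare the computed sum with the stored last quadword."""
    stored = struct.unpack_from("<Q", block, BOOT_BLOCK_SIZE - 8)[0]
    return boot_block_checksum(block) == stored
```

Note what such a test covers: only the 512 bytes of LBN 0. The blocks it points to are never summed, which fits the point above that the image can be corrupt while the boot block still reports as valid.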
2561.5		AFW3::MAZUR		`Wed Mar 26 1997 14:09`	10
	> ***Error - Corrupt IIC error log on module 1
	> Resetting the error log number to zero
	>
	> Could this cause any trouble?
	>

	I do not know the answer to this. I have done more TurboLaser and am
	unfamiliar with the error modes on this platform. It does look like
	it was self correcting though.
2561.6		TIMABS::FREPPEL	Mosquito ergo summm...	`Thu Mar 27 1997 06:26`	6
	re .4 & .5: Thanks a lot for your fast clarifications. I appreciate your help. Raymond.