
Conference mvblab::sable

Title:SABLE SYSTEM PUBLIC DISCUSSION
Moderator:COSMIC::PETERSON
Created:Mon Jan 11 1993
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2614
Total number of notes:10244

2561.0. "AS 2100 4/233 fails to boot, hangs in "test"" by TIMABS::FREPPEL (Mosquito ergo summm...) Mon Mar 24 1997 17:26

    An attempt to boot an AlphaServer 2100 4/233 repeatedly fails in the 
    following way, and a subsequent "test" command hangs forever...
    
    P00>>>b
    (boot dua2243.0.0.6.0 -flags 3,0)
    block 0 of dua2243.0.0.6.0 is a valid boot block
    reading 1013 blocks from dua2243.0.0.6.0
    bootstrap code read in
    base = 200000, image_start = 0, image_bytes = 7ea00
    initializing HWRPB at 2000
    initializing page table at fff0000
    initializing machine state
    setting affinity to the primary CPU
    jumping to bootstrap code
    
    halted CPU 0
    
    halt code = 5
    HALT instruction executed
    PC = 20000000
    P00>>>
    
    P00>>>init
    
    VMS PALcode V5.56-6, OSF PALcode X1.45-12
    
    starting console on CPU 0
    probing hose 0, PCI
    probing PCI-to-EISA bridge, bus 1
    bus 0, slot 0 -- ewa -- DECchip 21040-AA
    bus 0, slot 1 -- pka -- NCR 53C810
    bus 1, slot 2 -- vga -- ISA VGA
    bus 0, slot 6 -- pua -- CIPCA
    bus 0, slot 7 -- pub -- CIPCA
    bus 0, slot 8 -- fwa -- DEC PCI FDDI
    Memory Testing and Configuration Status
    Module   Size    Base Addr   Intlv Mode  Intlv Unit  Status
    ------   -----   ---------   ----------  ----------  ------
    0      128MB   00000000      2-Way         0       Passed
    1      128MB   00000000      2-Way         1       Passed
    Total Bad Pages 0
    Testing the System
    Testing the Disks (read only)
    Testing the Network
    AlphaServer 2100 Console V4.7-143, built on Nov 20 1996 at
    
    
    P00>>>test
    Testing the Memory
    
    Now the system hangs (uninterruptible).
    
    Any hints what could cause this?
    Any ideas what to check?
    
    Thanks
    Raymond.
2561.1. "heap_expand?" by AFW3::MAZUR Tue Mar 25 1997 00:53
Raymond,

  Is this a new install?  If not, what is different about this system?
Has the configuration changed?  Did this system boot in the past?
Did you add memory?  Or CIPCAs?  

  It appears as if the memory where the bootstrap code was loaded is
corrupted, broken, or misaddressed.  The instruction at virtual address
20000000 was zero, which decodes as HALT.  And from my log below
compared to yours, it looks as if the page table address is in error:

  Your system:
    initializing page table at fff0000

  My system:
    initializing page table at 1f2000

I am going to take a guess that you need to increase the heap_expand
environment variable.


  Try this, and report the results:

  >>> show heap_expand
  >>> b dua2243.0.0.6.0 -flags 3,0 -h
  >>> e pmem:fff0000 -n f -l
  >>> e vmem:20000000 -n f -d
  >>> e pmem:200000 -n f -d
  >>> dyn
  >>> set heap_expand 128K
  >>> init
  >>> show heap_expand
  >>> b dua2243.0.0.6.0 -flags 3,0 -h
  >>> e pmem:fff0000 -n f -l
  >>> e vmem:20000000 -n f -d
  >>> e pmem:200000 -n f -d
  >>> dyn
  >>> cont


The results should look something like:

P00>>>show heap_expand
heap_expand             NONE            
P00>>>b -h dka0
(boot dka0.0.0.1.0 -flags 0)
block 0 of dka0.0.0.1.0 is a valid boot block
reading 16 blocks from dka0.0.0.1.0
bootstrap code read in
base = 200000, image_start = 0, image_bytes = 2000
initializing HWRPB at 2000
initializing page table at 1f2000
initializing machine state
setting affinity to the primary CPU
P00>>>e pmem:1f2000 -n f -l
pmem:           1F2000 00001101 
pmem:           1F2004 000000FA 
pmem:           1F2008 00001101 
pmem:           1F200C 000000F9 
pmem:           1F2010 00000000 
pmem:           1F2014 00000000 
pmem:           1F2018 00000000 
pmem:           1F201C 00000000 
pmem:           1F2020 00000000 
pmem:           1F2024 00000000 
pmem:           1F2028 00000000 
pmem:           1F202C 00000000 
pmem:           1F2030 00000000 
pmem:           1F2034 00000000 
pmem:           1F2038 00000000 
pmem:           1F203C 00000000 
P00>>>e vmem:20000000 -d -n f
vmem:         20000000 47FF041F NOP         
vmem:         20000004 C0200002 BR          R1,000002
vmem:         20000008 200099A0 LDA         R0,-6660(R0)
vmem:         2000000C 00000000 HALT        
vmem:         20000010 A7A10000 LDQ         R29,(R1)
vmem:         20000014 D3400007 BSR         R26,000007
vmem:         20000018 43C4153E SUBQ        SP,#20,SP
vmem:         2000001C 47FF0410 CLR         R16
vmem:         20000020 B7FE0018 STQ         R31,0018(SP)
vmem:         20000024 47FF0411 CLR         R17
vmem:         20000028 47FF0412 CLR         R18
vmem:         2000002C D3400110 BSR         R26,000110
vmem:         20000030 00000000 HALT        
vmem:         20000034 243D0000 LDAH        R1,(R29)
vmem:         20000038 20218070 LDA         R1,-7F90(R1)
vmem:         2000003C 245D0000 LDAH        R2,(R29)
P00>>>e pmem:200000 -d -n f
pmem:           200000 47FF041F NOP         
pmem:           200004 C0200002 BR          R1,000002
pmem:           200008 200099A0 LDA         R0,-6660(R0)
pmem:           20000C 00000000 HALT        
pmem:           200010 A7A10000 LDQ         R29,(R1)
pmem:           200014 D3400007 BSR         R26,000007
pmem:           200018 43C4153E SUBQ        SP,#20,SP
pmem:           20001C 47FF0410 CLR         R16
pmem:           200020 B7FE0018 STQ         R31,0018(SP)
pmem:           200024 47FF0411 CLR         R17
pmem:           200028 47FF0412 CLR         R18
pmem:           20002C D3400110 BSR         R26,000110
pmem:           200030 00000000 HALT        
pmem:           200034 243D0000 LDAH        R1,(R29)
pmem:           200038 20218070 LDA         R1,-7F90(R1)
pmem:           20003C 245D0000 LDAH        R2,(R29)
P00>>>dyn
zone     zone       used    used       free    free       utili-  high
address  size       blocks  bytes      blocks  bytes      zation  water
-------- ---------- ------- ---------- ------- ---------- ------- ----------
000309A0 849088     375     306336     29      542784      36 %   407584    
P00>>>
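
For anyone comparing their own dump against the one above, here is a minimal
sketch of how the first few longwords decode. It assumes the standard Alpha
instruction formats (opcode in bits 31:26, Ra in bits 25:21, 21-bit branch
displacement, 16-bit memory displacement). The function names are made up for
illustration and only the handful of opcodes seen in this dump are handled.
The point is that a healthy image starts with NOP, while a longword of zero is
CALL_PAL function 0, i.e. HALT, which matches "halt code = 5" at PC = 20000000.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode(word):
    """Decode a few Alpha instruction words (illustrative subset only)."""
    opcode = (word >> 26) & 0x3F
    ra = (word >> 21) & 0x1F
    rb = (word >> 16) & 0x1F
    if opcode == 0x00:
        # PALcode format: the low 26 bits select the PAL function, 0 is HALT.
        func = word & 0x3FFFFFF
        return "HALT" if func == 0 else "CALL_PAL 0x%X" % func
    if opcode in (0x30, 0x34):
        # Branch format: displacement is counted in longwords.
        name = "BR" if opcode == 0x30 else "BSR"
        return "%s R%d,%d" % (name, ra, sign_extend(word & 0x1FFFFF, 21))
    if opcode == 0x29:
        # Memory format: LDQ Ra,disp(Rb).
        return "LDQ R%d,%d(R%d)" % (ra, sign_extend(word & 0xFFFF, 16), rb)
    if word == 0x47FF041F:
        # BIS R31,R31,R31, which the console prints as NOP.
        return "NOP"
    return "opcode 0x%02X (not handled in this sketch)" % opcode


def branch_target(pc, word):
    """Target of a BR/BSR located at address pc."""
    return pc + 4 + 4 * sign_extend(word & 0x1FFFFF, 21)


for addr, word in [(0x20000000, 0x47FF041F), (0x20000004, 0xC0200002),
                   (0x2000000C, 0x00000000), (0x20000010, 0xA7A10000)]:
    print("%08X %08X %s" % (addr, word, decode(word)))
```

The BR at 20000004 skips the two data longwords and lands on the LDQ at
20000010, so the HALT at 2000000C in the good dump is never executed.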









2561.2. by AFW3::MAZUR Tue Mar 25 1997 13:04
>
>  Your system:
>    initializing page table at fff0000
>
>  My system:
>    initializing page table at 1f2000
>


Correcting myself, there is no problem here.  Recent console changes
have moved the page table into high memory, and that is reflected
correctly in the boot display you have.

Still, try increasing heap_expand.  I think the failure mode would
have been different if there was a heap problem, but I cannot say that
for sure.
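
A quick arithmetic check of the "high memory" remark, as a sketch only. It
assumes the two 128MB modules in the memory table of .0 form one contiguous
256MB range starting at address 0 (both are listed with base 00000000,
2-way interleaved). The variable and function names are mine.

```python
MB = 1024 * 1024

# Two 128MB modules, per the memory table in note .0 (assumed contiguous).
total_memory = 2 * 128 * MB

new_page_table = 0xFFF0000   # "initializing page table at fff0000"
old_page_table = 0x1F2000    # "initializing page table at 1f2000"
bootstrap_base = 0x200000    # "base = 200000"


def kb_below(addr, limit):
    """Distance in KB from addr up to limit."""
    return (limit - addr) // 1024


print(hex(total_memory))
print(kb_below(new_page_table, total_memory), "KB below the top of memory")
print(kb_below(old_page_table, bootstrap_base), "KB below the bootstrap base")
```

Under that assumption, fff0000 sits 64 KB below the top of 256MB, while the
older 1f2000 address sits 56 KB below the bootstrap base at 200000.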





2561.3. "writeboot" by TIMABS::FREPPEL (Mosquito ergo summm...) Wed Mar 26 1997 11:46
    Hi Dennis,

    thanks for the quick answer. The system is booted now.

    After discarding a couple of theories we ended up doubting the 
    integrity of the system disk we were attempting to boot from (a shadow 
    set member; we currently have a whole lot of trouble with shadowing in 
    this cluster, so the doubts seem reasonable).

    What we actually did was:
    - mount the system disk as a data disk on a different system
    - do a WRITEBOOT to it
    - dismount the disk
    - boot the sable
    And - the system booted.

    So, did the system lie to me when it said ------------+
     P00>>>b -fl 3,1                                      |
     (boot dua2243.0.0.6.0 -flags 3,1)                    |
     block 0 of dua2243.0.0.6.0 is a valid boot block <---+
     reading 1013 blocks from dua2243.0.0.6.0
     bootstrap code read in
     base = 200000, image_start = 0, image_bytes = 7ea00
     initializing HWRPB at 2000
     initializing page table at fff0000
     initializing machine state
     setting affinity to the primary CPU
     jumping to bootstrap code

     halted CPU 0

     halt code = 5
     HALT instruction executed
     PC = 20000000

    Thanks for your help.
    Raymond.

    PS: We also increased heap_expand (but to no avail):
    	P00>>>sho heap*
    	heap_expand            		NONE
    	P00>>>set heap_expand 		1024
    	P00>>>sho heap*
    	heap_expand            		1024K


     In the display of a "cat el" command we saw:
     
     	Starting Memory Diagnostics

     	***Error - Corrupt IIC error log on module 1
   	Resetting the error log number to zero
     
     Could this cause any trouble?
    
2561.4. by CLOUD::SHIRRON (Stephen F. Shirron, 223-3198) Wed Mar 26 1997 14:05
The console prints "... is a valid boot block" if the boot block (LBN 0) passes
several validity tests (is the checksum okay, are certain reserved fields zero
like they should be, etc.).  The image that the boot block points to could be
corrupted, and the console would not know.
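
To make the distinction concrete, here is a rough sketch of that kind of check.
The layout is an assumption taken from the commonly described Alpha boot block
format, not from this thread: a 512-byte block whose last four little-endian
quadwords are the image block count (offset 480), the starting LBN (488), a
flags field that must be zero (496), and a checksum (504) equal to the sum of
the first 63 quadwords modulo 2^64. The function names are hypothetical, and
the real console may apply further tests.

```python
import struct

SECTOR = 512
MASK64 = 0xFFFFFFFFFFFFFFFF


def checksum(block):
    """Sum of the first 63 little-endian quadwords, overflow ignored."""
    return sum(struct.unpack("<63Q", block[:504])) & MASK64


def check_boot_block(block):
    """Return a list of problems found in LBN 0 (empty list means it passes)."""
    if len(block) != SECTOR:
        raise ValueError("boot block must be exactly 512 bytes")
    count, start_lbn, flags, stored = struct.unpack_from("<4Q", block, 480)
    problems = []
    if stored != checksum(block):
        problems.append("checksum mismatch")
    if flags != 0:
        problems.append("flags field is not zero")
    if count == 0:
        problems.append("image block count is zero")
    return problems


def make_boot_block(count, start_lbn):
    """Build a minimal boot block for testing the checker."""
    block = bytearray(SECTOR)
    struct.pack_into("<3Q", block, 480, count, start_lbn, 0)
    struct.pack_into("<Q", block, 504, checksum(bytes(block)))
    return bytes(block)


# The starting LBN of 2000 is an arbitrary value for illustration.
block = make_boot_block(count=1013, start_lbn=2000)
print(check_boot_block(block))
print(hex(1013 * SECTOR))
```

Nothing in this check reads the 1013 blocks that the boot block points to, so
a damaged image passes it just the same. As a side note, 1013 blocks of 512
bytes is 7ea00 bytes, which matches "image_bytes = 7ea00" in the boot display.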

stephen
2561.5. by AFW3::MAZUR Wed Mar 26 1997 14:09
>     	***Error - Corrupt IIC error log on module 1
>   	Resetting the error log number to zero
>     
>     Could this cause any trouble?
>

I do not know the answer to this.  I have worked more with TurboLaser
and am unfamiliar with the error modes on this platform.  It does look
like it was self-correcting, though.

2561.6. by TIMABS::FREPPEL (Mosquito ergo summm...) Thu Mar 27 1997 06:26
    re .4 & .5:
    
    Thanks a lot for your fast clarifications.
    I appreciate your help.
    
    Raymond.