| Raymond,
Is this a new install? If not, what is different about this system?
Has the configuration changed? Did this system boot in the past?
Did you add memory? Or CIPCAs?
It appears that the memory where the boot block was loaded is
corrupted, broken, or misaddressed: the instruction at virtual address
20000000 was zero. Comparing my log below with yours, it also
looks as if the page table address is in error:
Your system:
initializing page table at fff0000
My system:
initializing page table at 1f2000
I am going to take a guess that you need to increase the heap_expand
environment variable.
Try this, and report the results:
>>> show heap_expand
>>> b dua2243.0.0.6.0 -flags 3,0 -h
>>> e pmem:fff0000 -n f -l
>>> e vmem:20000000 -n f -d
>>> e pmem:200000 -n f -d
>>> dyn
>>> set heap_expand 128K
>>> init
>>> show heap_expand
>>> b dua2243.0.0.6.0 -flags 3,0 -h
>>> e pmem:fff0000 -n f -l
>>> e vmem:20000000 -n f -d
>>> e pmem:200000 -n f -d
>>> dyn
>>> cont
The results should look something like:
P00>>>show heap_expand
heap_expand NONE
P00>>>b -h dka0
(boot dka0.0.0.1.0 -flags 0)
block 0 of dka0.0.0.1.0 is a valid boot block
reading 16 blocks from dka0.0.0.1.0
bootstrap code read in
base = 200000, image_start = 0, image_bytes = 2000
initializing HWRPB at 2000
initializing page table at 1f2000
initializing machine state
setting affinity to the primary CPU
P00>>>e pmem:1f2000 -n f -l
pmem: 1F2000 00001101
pmem: 1F2004 000000FA
pmem: 1F2008 00001101
pmem: 1F200C 000000F9
pmem: 1F2010 00000000
pmem: 1F2014 00000000
pmem: 1F2018 00000000
pmem: 1F201C 00000000
pmem: 1F2020 00000000
pmem: 1F2024 00000000
pmem: 1F2028 00000000
pmem: 1F202C 00000000
pmem: 1F2030 00000000
pmem: 1F2034 00000000
pmem: 1F2038 00000000
pmem: 1F203C 00000000
P00>>>e vmem:20000000 -d -n f
vmem: 20000000 47FF041F NOP
vmem: 20000004 C0200002 BR R1,000002
vmem: 20000008 200099A0 LDA R0,-6660(R0)
vmem: 2000000C 00000000 HALT
vmem: 20000010 A7A10000 LDQ R29,(R1)
vmem: 20000014 D3400007 BSR R26,000007
vmem: 20000018 43C4153E SUBQ SP,#20,SP
vmem: 2000001C 47FF0410 CLR R16
vmem: 20000020 B7FE0018 STQ R31,0018(SP)
vmem: 20000024 47FF0411 CLR R17
vmem: 20000028 47FF0412 CLR R18
vmem: 2000002C D3400110 BSR R26,000110
vmem: 20000030 00000000 HALT
vmem: 20000034 243D0000 LDAH R1,(R29)
vmem: 20000038 20218070 LDA R1,-7F90(R1)
vmem: 2000003C 245D0000 LDAH R2,(R29)
P00>>>e pmem:200000 -d -n f
pmem: 200000 47FF041F NOP
pmem: 200004 C0200002 BR R1,000002
pmem: 200008 200099A0 LDA R0,-6660(R0)
pmem: 20000C 00000000 HALT
pmem: 200010 A7A10000 LDQ R29,(R1)
pmem: 200014 D3400007 BSR R26,000007
pmem: 200018 43C4153E SUBQ SP,#20,SP
pmem: 20001C 47FF0410 CLR R16
pmem: 200020 B7FE0018 STQ R31,0018(SP)
pmem: 200024 47FF0411 CLR R17
pmem: 200028 47FF0412 CLR R18
pmem: 20002C D3400110 BSR R26,000110
pmem: 200030 00000000 HALT
pmem: 200034 243D0000 LDAH R1,(R29)
pmem: 200038 20218070 LDA R1,-7F90(R1)
pmem: 20003C 245D0000 LDAH R2,(R29)
P00>>>dyn
zone     zone       used    used       free    free       utili-  high
address  size       blocks  bytes      blocks  bytes      zation  water
-------- ---------- ------- ---------- ------- ---------- ------- ----------
000309A0 849088     375     306336     29      542784     36 %    407584
P00>>>
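
If you capture the console output to a file, the comparison can be scripted. The following is only a rough sketch in Python, not a console feature: it assumes the examine output has the `pmem:`/`vmem:` line shape shown above, and the function names (`parse_dump`, `first_longword_is_zero`, `same_contents`) are made up for illustration. It checks whether the first longword at the bootstrap address is zero, and whether the virtual and physical dumps hold the same longwords.

```python
import re

# Matches dump lines such as "pmem: 200000 47FF041F NOP".
# Console prompt lines ("P00>>>e pmem:...") do not match and are skipped.
DUMP_LINE = re.compile(r"^(pmem|vmem):\s+([0-9A-Fa-f]+)\s+([0-9A-Fa-f]{8})\b")


def parse_dump(text):
    """Return (space, address, longword) tuples parsed from examine output."""
    rows = []
    for line in text.splitlines():
        m = DUMP_LINE.match(line.strip())
        if m:
            rows.append((m.group(1), int(m.group(2), 16), int(m.group(3), 16)))
    return rows


def first_longword_is_zero(text):
    """True if the first dumped longword is 00000000."""
    rows = parse_dump(text)
    if not rows:
        raise ValueError("no dump lines found")
    return rows[0][2] == 0


def same_contents(dump_a, dump_b):
    """True if two dumps contain the same sequence of longwords."""
    return [r[2] for r in parse_dump(dump_a)] == [r[2] for r in parse_dump(dump_b)]
```

With the sample output above, `first_longword_is_zero` is false for both dumps and `same_contents` is true for the vmem and pmem dumps; a dump that starts with 00000000 at the bootstrap address is the case to look for.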
|
| >
> Your system:
> initializing page table at fff0000
>
> My system:
> initializing page table at 1f2000
>
Correcting myself, there is no problem here. Recent console changes
have moved the page table into high memory, and that is reflected
correctly in the boot display you have.
Still, try increasing heap_expand. I think the failure mode would
have been different if there were a heap problem, but I cannot say that
for sure.
|
| Hi Dennis,
thanks for the quick answer. The system is booted now.
After discarding a couple of theories, we ended up doubting the
integrity of the system disk we were attempting to boot from (a shadow set
member; we currently have a whole lot of trouble with shadowing in this
cluster, so the doubts seem reasonable).
What we actually did was:
- mount the system disk as a data disk on a different system
- do a WRITEBOOT to it
- dismount the disk
- boot the Sable
And - the system booted.
So, did the system lie to me when it said ------------+
P00>>>b -fl 3,1 |
(boot dua2243.0.0.6.0 -flags 3,1) |
block 0 of dua2243.0.0.6.0 is a valid boot block <---+
reading 1013 blocks from dua2243.0.0.6.0
bootstrap code read in
base = 200000, image_start = 0, image_bytes = 7ea00
initializing HWRPB at 2000
initializing page table at fff0000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code
halted CPU 0
halt code = 5
HALT instruction executed
PC = 20000000
Thanks for your help.
Raymond.
PS: We also did the heap_expand (but to no avail):
P00>>>sho heap*
heap_expand NONE
P00>>>set heap_expand 1024
P00>>>sho heap*
heap_expand 1024K
In the display of a "cat el" command we saw:
Starting Memory Diagnostics
***Error - Corrupt IIC error log on module 1
Resetting the error log number to zero
Could this cause any trouble?
|
| The console prints "... is a valid boot block" if the boot block (LBN 0) passes
several validity tests (is the checksum okay, are certain reserved fields zero
like they should be, etc.). The image that the boot block points to could be
corrupted, and the console would not know.
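
To illustrate why a "valid boot block" message says nothing about the image itself, here is a minimal sketch in Python of the checksum part of such a test. It is not the console's actual code. It assumes the commonly documented Alpha boot block layout, in which the last quadword of the 512-byte block holds the sum of the preceding 63 quadwords, modulo 2^64; the function names are hypothetical.

```python
import struct

BLOCK_SIZE = 512            # one disk block (LBN 0)
MASK64 = 0xFFFFFFFFFFFFFFFF


def boot_block_checksums(block):
    """Return (computed, stored) checksums for a 512-byte boot block.

    Assumed layout: 64 little-endian quadwords, the last one being the
    sum of the first 63 with overflow ignored.
    """
    if len(block) != BLOCK_SIZE:
        raise ValueError("boot block must be exactly 512 bytes")
    quads = struct.unpack("<64Q", block)
    return sum(quads[:63]) & MASK64, quads[63]


def checksum_ok(block):
    """True if the stored checksum matches the computed one."""
    computed, stored = boot_block_checksums(block)
    return computed == stored
```

Note that only the 512 bytes of LBN 0 enter the sum. The blocks that the boot block points to are never read by this check, so a damaged image on disk leaves the result unchanged.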
stephen
|