Conference wonder::turbolaser

Title:TurboLaser Notesfile - AlphaServer 8200 and 8400 systems
Notice:Welcome to WONDER::TURBOLASER in its new home shortly
Moderator:LANDO::DROBNER
Created:Tue Dec 20 1994
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:1218
Total number of notes:4645

1115.0. "Unable to malloc memory for decompression for net" by NETRIX::"dixon@alf.dec.com" (Travis N. Dixon) Mon Feb 24 1997 14:42

Customer has an AlphaServer 8400 with 8 CPUs and 8 GB of memory.

When the customer reboots the system, he receives the error:
"Unable to malloc memory for decompression for net"
When he receives the error, the system locks up and the only way to unlock it is
to reboot.

Does anyone have any insight into this problem?

Thanks,
	Travis


[Posted by WWW Notes gateway]
T.R  Title  User  Personal Name  Date  Lines
1115.1  boot_reset = ON to avoid problem  AFW4::MAZUR  Mon Feb 24 1997 23:36  38
This has been seen by several customers and brought to the attention of
console engineering.  It was registered as an IPMT, and there is a
fix in the upcoming release of the console.  If you are really inconvenienced,
set the boot_reset environment variable to ON.

Here is the problem.  The console code has a memory leak and a fragmentation
problem.  The console has a limited amount of heap space, around 1MB, to do its
work.  Memory leaks are chunks of memory that get allocated, but then get
lost from the console's bookkeeping.  Gradually the size of the console
heap gets smaller and smaller.   Fragmentation is when the usage of the
memory is spotty and peppered about the heap.  The longer a console lives
without being re-inited, the more fragmented the heap gets, and the 
largest contiguous free chunk gets smaller and smaller.  
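
The difference between a leak (less free memory overall) and fragmentation
(smaller contiguous chunks) can be illustrated with a toy model. This is only
a sketch, not the console's allocator: the 4KB leak size, the 128KB spacing,
and the names free_extents and heap_summary are all assumptions made up for
the illustration. Only the roughly 1MB heap size comes from the note.

```python
# Toy model of a fixed-size heap; block sizes and layout are illustrative only.
HEAP_SIZE = 1024 * 1024  # "around 1MB", per the note


def free_extents(allocations, heap_size=HEAP_SIZE):
    """Return the free (start, length) gaps between allocated (start, length) blocks."""
    gaps = []
    cursor = 0
    for start, length in sorted(allocations):
        if start > cursor:
            gaps.append((cursor, start - cursor))
        cursor = max(cursor, start + length)
    if cursor < heap_size:
        gaps.append((cursor, heap_size - cursor))
    return gaps


def heap_summary(allocations, heap_size=HEAP_SIZE):
    """Return (total free bytes, largest contiguous free chunk)."""
    gaps = free_extents(allocations, heap_size)
    total_free = sum(length for _, length in gaps)
    largest = max((length for _, length in gaps), default=0)
    return total_free, largest


# Fresh heap: nothing allocated, one big free chunk.
fresh = []

# Aged heap (hypothetical): one leaked 4KB block stranded every 128KB.
leaked = [(offset, 4 * 1024) for offset in range(0, HEAP_SIZE, 128 * 1024)]

print(heap_summary(fresh))   # (1048576, 1048576)
print(heap_summary(leaked))  # (1015808, 126976)
```

In this made-up layout only 32KB has actually been lost, yet the largest
contiguous chunk has dropped from 1MB to 124KB.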

The console pulls in overlays from the flash rom.  These overlays are
compressed.  The decompression algorithm requires 140KB of contiguous
memory to get an overlay into memory.  That is why a shrunken heap problem
manifests itself mostly as a decompression error.
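
A minimal sketch of why the request fails even when plenty of memory is free
in total. The free-list contents and the names first_fit and can_decompress
are hypothetical; the console's real allocator is not described in this note.
Only the 140KB contiguous requirement is taken from the text.

```python
OVERLAY_WORKSPACE = 140 * 1024  # contiguous bytes needed, per the note


def first_fit(free_chunks, need):
    """Return the index of the first free chunk of at least `need` bytes, else None."""
    for index, size in enumerate(free_chunks):
        if size >= need:
            return index
    return None


def can_decompress(free_chunks, need=OVERLAY_WORKSPACE):
    """True only if a single free chunk can hold the whole workspace."""
    return first_fit(free_chunks, need) is not None


# Hypothetical free lists (chunk sizes in bytes).
after_reset = [900 * 1024]
after_many_boots = [120 * 1024, 96 * 1024, 130 * 1024, 64 * 1024]

print(can_decompress(after_reset))       # True
print(sum(after_many_boots) // 1024)     # 410 (KB free in total)
print(can_decompress(after_many_boots))  # False: no single chunk reaches 140KB
```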

One of the leaks is known to be in the console's KZPSA driver.  After
many boots and show dev's, the console has been known to see this problem.
Customers with boot_reset ON are less likely to see a problem because
the console memory is totally reinitialized on every boot, the
lost memory is recaptured, and there are large contiguous chunks again.
Without the resets, the heap ages and the losses accrue.

Leaks are difficult to track down, but the work is ongoing to resolve
them.  The console that is to be released soon watches
a low-on-heap threshold.  If the problem is detected, the console will
reinitialize itself at boot time (equivalent to having boot_reset=ON),
resulting in a clean, nicely reformed heap, avoiding the potential for
a decompression error, and allowing the boot to proceed correctly.
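
The decision described above might look roughly like the following. This is a
sketch under stated assumptions: the note does not say what the threshold
value is or whether it is measured against total free heap or the largest
contiguous chunk, so the 140KB value, the choice of largest chunk, and the
name boot_needs_reset are all guesses for illustration.

```python
# Assumed threshold: the real value used by the console is not stated in the note.
LOW_HEAP_THRESHOLD = 140 * 1024


def boot_needs_reset(largest_free_chunk, boot_reset_on, threshold=LOW_HEAP_THRESHOLD):
    """Decide whether the console should reinitialize itself before booting."""
    if boot_reset_on:
        return True  # operator asked for a reset on every boot
    # Otherwise reset only when the heap looks too worn to decompress an overlay.
    return largest_free_chunk < threshold


print(boot_needs_reset(900 * 1024, boot_reset_on=False))  # False: heap is healthy
print(boot_needs_reset(124 * 1024, boot_reset_on=False))  # True: below threshold
print(boot_needs_reset(900 * 1024, boot_reset_on=True))   # True: reset requested
```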






1115.2  CSC32::BLAYLOCK  If at first you doubt,doubt again.  Fri Feb 28 1997 16:13  43
With the new V4.8-6 console code, should we be getting multiple
messages of this type?  At least it does not hang anymore...

P00>>>b
(boot dkg100.1.0.4.3 -flags a)
SRM boot identifier: scsi 3 4 0 1 100 ef00 81011
boot adapter: kzpsa6  rev 0 in bus slot 4 off of kftha0 in TLSB slot 8
block 0 of dkg100.1.0.4.3 is a valid boot block
reading 16 blocks from dkg100.1.0.4.3
bootstrap code read in
base = 200000, image_start = 0, image_bytes = 2000
initializing HWRPB at 2000
initializing page table at 1f2000
initializing machine state
setting affinity to the primary CPU
Configuring I/O adapters...
tulip0, slot 0, bus 0, hose2
Unable to malloc memory for Decompression for NET
kzpsa0, slot 2, bus 0, hose2
kzpsa1, slot 4, bus 0, hose2
kzpsa2, slot 8, bus 0, hose2
kzpsa3, slot 9, bus 0, hose2
kzpsa4, slot 10, bus 0, hose2
kzpsa5, slot 11, bus 0, hose2
tulip1, slot 12, bus 0, hose3
Unable to malloc memory for Decompression for NET
floppy0, slot 0, bus 1, hose3
kzpsa6, slot 4, bus 0, hose3
kzpsa7, slot 8, bus 0, hose3
kzpsa8, slot 9, bus 0, hose3
kzpsa9, slot 10, bus 0, hose3
pfi0, slot 11, bus 0, hose3
jumping to bootstrap code

Digital UNIX boot - Mon Aug 19 21:02:26 EDT 1996

Loading vmunix ...
Loading at fffffc0000230000
Current PAL Revision <0x10000400010113>
Switching to OSF PALcode Succeeded
New PAL Revision <0x10000400020115>

1115.3  AFW3::MAZUR  Mon Mar 03 1997 00:42  2
This is still not good, but at least you are in business.

1115.4  CSC64::BLAYLOCK  If at first you doubt,doubt again.  Mon Mar 03 1997 14:14  6
So is this expected or a new side effect?

I realize that we are still able to boot (a good thing ;-)
but with the error messages continuing, my customer will
ask about the viability of the fix.
1115.5  Work is ongoing  MAY30::AMATO  Bob Amato  Tue Mar 04 1997 13:01  12
    
    
    Hello,
    
    As stated in .1, work on this problem is ongoing.
    
    Does the problem in .2 occur after an init? Or after
    several reboot/shutdown cycles?  The output from
    a "show config" on this system would be helpful.
    
    Thanks,
    Bob
1115.6  CSC64::BLAYLOCK  If at first you doubt,doubt again.  Wed Mar 12 1997 19:47  69
Sorry about taking so long to get back to you.

The problem occurs after a number of shutdown/reboot cycles;
shutdown -r brings it out more than anything else.

Here is the show config on the 8200 system that we have.
The unknown devices include some FORE Systems ATM cards (2) and
some PT334 (SS7) cards.

P08>>>show config

        Name                  Type   Rev  Mnemonic  
  TLSB
  4++   KN7CC-AB              8014  0000  kn7cc-ab0   
  5++   KN7CC-AB              8014  0000  kn7cc-ab1   
  7+    MS7CC                 5000  0000  ms7cc0      
  8+    KFTHA                 2000  0D02  kftha0      

  C0 PCI connected to kftha0              pci0    
  0+    DECchip 21041-AA    141011  0011  tulip0      
  1+    KZPSA                81011  0000  kzpsa0      
  2+    KZPSA                81011  0000  kzpsa1      
  4+    KZPSA                81011  0000  kzpsa2      
  5+    ?????              3341214  0010  unknown0    
  6+    ?????              3001127  0000  unknown1    
  7+    KZPSA                81011  0000  kzpsa3      
  8+    KZPSA                81011  0000  kzpsa4      
  9+    KZPSA                81011  0000  kzpsa5      
  A+    KZPSA                81011  0000  kzpsa6      

  C1 PCI connected to kftha0              pci1    
  0+    SIO                4828086  0005  sio0        
  4+    KZPSA                81011  0000  kzpsa7      
  5+    ?????              3341214  0010  unknown2    
  6+    ?????              3001127  0000  unknown3    
  B+    DEC PCI FDDI         F1011  0000  pfi0        

    Controllers on SIO                    sio0    
  0+    DECchip 21040-AA     21011  0024  tulip1      
  1+    FLOPPY                   2  0000  floppy0     
  2+    KBD                      3  0000  kbd0        
  3+    MOUSE                    4  0000  mouse0      

    EISA connected to pci1 through sio0   eisa0   

P08>>>
kzpsa1, slot 2, bus 0
kzpsa2, slot 4, bus 0
kzpsa3, slot 7, bus 0
kzpsa4, slot 8, bus 0
kzpsa5, slot 9, bus 0
kzpsa6, slot 10, bus 0
tulip1, slot 12, bus 0
floppy0, slot 0, bus 1
kzpsa7, slot 4, bus 0
pfi0, slot 11, bus 0
jumping to bootstrap code

Digital UNIX boot - Mon Aug 19 21:02:26 EDT 1996

Loading vmunix ...
Loading at fffffc0000230000
Current PAL Revision <0x10000500010112>
Switching to OSF PALcode Succeeded
New PAL Revision <0x10000300020115>

Sizes:
text = 3185632
data = 456080
1115.7  AFW3::MAZUR  Thu Mar 13 1997 10:56  5
We have identified the problem and are working on the solution.
The problem is multiplied by the number of KZPSAs.  The best
workaround right now is to have boot_reset ON.
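
To show why the adapter count matters, here is a back-of-the-envelope sketch.
The per-adapter leak size is not given anywhere in this thread, so the 8KB
figure below is purely hypothetical, as are the 140KB threshold and the name
boots_until_low_heap. The ten adapters match the kzpsa0 through kzpsa9 listed
in the boot log in .2.

```python
def boots_until_low_heap(heap_free, leak_per_adapter, adapters, threshold):
    """Count boots (with no reset in between) until free heap drops below threshold."""
    if leak_per_adapter <= 0 or adapters <= 0:
        return None  # nothing leaks, so the threshold is never reached
    boots = 0
    while heap_free >= threshold:
        heap_free -= leak_per_adapter * adapters  # every adapter leaks on each boot
        boots += 1
    return boots


KB = 1024
# Hypothetical numbers: 1MB heap, 8KB leaked per KZPSA per boot, 140KB threshold.
print(boots_until_low_heap(1024 * KB, 8 * KB, adapters=1, threshold=140 * KB))   # 111
print(boots_until_low_heap(1024 * KB, 8 * KB, adapters=10, threshold=140 * KB))  # 12
```

Whatever the real leak size is, the number of boots a system survives shrinks
roughly in proportion to the number of KZPSAs installed.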


1115.8  Another problem  NNTPD::"leitao@mail.dec.com"  Carlos Leitao  Fri Apr 18 1997 10:52  43
Hi
Because we had the problem mentioned in this note, as soon as we
received the new version we installed it.
Now a new message appears after several reboots. It is:

CPU 0 booting

Innsufficient Heap for overlay decompression.
System will be reset prior to boot.

halted CPU 1
CPU 2 is not halted
CPU 3 is not halted
CPU 4 is not halted
CPU 5 is not halted
CPU 6 is not halted
CPU 7 is not halted

halt code = 1
operator initiated halt
PC = fffffc000039e53c

..........................
The same for all CPU's
..........................

halted CPU 5

operator initiated halt
PC = fffffc0000264560
insufficient dynamic memory for a request of 40960 bytes
   PID       bytes  name
-------- ---------- ----
00000000      35424 ????
00000001      40672 idle
00000002        800 dead_eater
00000003        800 poll
00000004        800 timer

Thanks for your help

Carlos
[Posted by WWW Notes gateway]
1115.9  AFW3::MAZUR  Fri Apr 18 1997 13:45  13
>Hi
>Because we had the problem mentioned on this note , as soon as we
>receive the new version we installed it.
>At this moment the new message hapens after several reboots and it is:
>
>CPU 0 booting
>
>Innsufficient Heap for overlay decompression.
>System will be reset prior to boot.


Was boot_reset ON?
1115.10  reply  NNTPD::"leitao@mail.dec.com"  carlos leitao  Wed Apr 23 1997 13:47  7
Hi

Yes the console variable boot_reset is ON

CL

[Posted by WWW Notes gateway]
1115.11  MAY30::MAZUR  Thu Apr 24 1997 12:41  19
The cause of this problem has been fixed.   All the leaks in the KZPSA
driver have been plugged.

Until the next release, you can avoid this problem by rebooting your
system with a two-step process.

Instead of:
	
	$ REBOOT

use

	$ SHUTDOWN
	>>> b


The command line boot will recognize that boot_reset is ON and reinitialize
the system and clean up from the previous leaks.

1115.12  Any date ?  NNTPD::"leitao@mail.dec.com"  Carlos Leitao  Thu Apr 24 1997 16:31  10
Hi
Can you tell me when the next release is going to be available?
The two-step method doesn't work on that particular configuration.
They need to do a reboot.

Thanks for your help 
Best Regards
Carlos Leitao

[Posted by WWW Notes gateway]
1115.13  HARMNY::MAZUR  Thu Apr 24 1997 20:17  2
The new V4.0 Firmware CD ship date is 7/2/97.