
Conference mvblab::sable

Title:SABLE SYSTEM PUBLIC DISCUSSION
Moderator:COSMIC::PETERSON
Created:Mon Jan 11 1993
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2614
Total number of notes:10244

2604.0. "What if CPU 0 fails?" by ALFAM7::STREPPEL () Thu May 22 1997 14:51

    I was trying to explain to a customer what happens if the OS (here,
    Digital UNIX) finds a broken CPU during runtime. Before it panics, it
    masks the failed CPU so that the failed CPU is excluded during reboot.
    I then tried to simulate a CPU failure by setting cpu_enabled to 2 in
    order to exclude CPU 0. We tried this first on a 4100, but the console
    refused to disable the CPU. On a 2000 we could even set cpu_enabled
    to 0, but VMS saw CPU 0 anyway. I have not yet had time to install DU
    on that system.
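
    For readers unfamiliar with the variable, here is a minimal sketch of
    how a cpu_enabled value can be read, assuming it is a bitmask in which
    bit n enables CPU n (which is why 2 was chosen above to leave out
    CPU 0). The function name and the four-CPU default are illustrative
    only and are not part of any console interface.

```python
def enabled_cpus(mask, max_cpus=4):
    """Return the CPU numbers whose bit is set in a cpu_enabled-style mask.

    Assumption: bit n of the mask corresponds to CPU n.
    """
    return [cpu for cpu in range(max_cpus) if mask & (1 << cpu)]


if __name__ == "__main__":
    # The values tried in the note above, plus an all-CPUs mask.
    for value in (0x2, 0x0, 0xF):
        print(hex(value), enabled_cpus(value))
```

    Under that assumption, 2 selects only CPU 1 and 0 selects no CPU at
    all.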
    
    Can someone explain what happens if the primary CPU fails? And can I
    change the primary CPU from the console, i.e. exclude CPU 0? Or do I
    have to swap CPU boards?
    
    The 8x00 have a cpu_primary environment variable - it would be
    interesting to understand why they have it and the smaller ones don't.
    
    	Regards
    		Hartmut
                         
T.R  Title  User  Personal Name  Date  Lines
2604.1  CLOUD::SHIRRON  Stephen F. Shirron, 223-3198  Thu May 22 1997 20:32  5
The SROM code, which runs before the SRM console code, is the only entity
capable of disabling CPU 0 and selecting a different CPU to be the primary.
Using cpu_enabled will NOT cause a new primary to be selected.

stephen
2604.2  Two cents re: 4100/4000 platform and CPU_ENABLED  HARMNY::CUMMINS  Fri May 23 1997 22:08  31
    The 4100 platform supports up to four CPUs. Any CPU except CPU0 can be
    disabled. The CPU_ENABLED EV is used to perform this function. At
    power-up, each CPU in the system is told to start the SRM console, but
    only if enabled via CPU_ENABLED.
    
    The operating system groups asked us not to allow disabling CPU0.
    Technically, we had the option of allowing it to be disabled only if
    other CPUs were present and okay in the machine. Still, if we allowed
    CPU0 to be disabled, the possibility existed that a faulty CPU
    elsewhere in the system would prevent the system from coming up. In
    the end, the OS groups wanted no part of disabling CPU0, so we
    complied. It should be noted that the 4100/4000 system cannot operate
    without a CPU0, since it provides the oscillator for the system bus.
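
    The rule described above can be modelled in a few lines. This is only
    a sketch of the policy, not console code: it assumes CPU_ENABLED is a
    bitmask with bit 0 standing for CPU0, and the function name is
    hypothetical.

```python
CPU0_BIT = 0x1  # assumption: bit 0 of the mask represents CPU0


def mask_allowed_on_4100(mask):
    """Model of the 4100/4000 policy: a mask is acceptable only if CPU0 stays enabled."""
    return bool(mask & CPU0_BIT)


if __name__ == "__main__":
    for value in (0xF, 0x3, 0x2, 0x0):
        verdict = "accepted" if mask_allowed_on_4100(value) else "refused"
        print(hex(value), verdict)
```

    In this model, a request such as 2 (CPU1 only) is refused because it
    would clear the CPU0 bit.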
    
    IMHO, CPU_ENABLED is provided for two reasons:
    
     1) To disable suspect (faulty HW) CPUs. 
        
        We typically provide excellent fault coverage during power-up and
        auto-disable CPUs if we detect a fault anyway.
    
        CPUs 1,2,3 can be disabled on 4100.
    
     2) To enable performance comparisons on SMP machines without requiring
        HW to be physically removed from the machine.
    
        Thus, one can measure performance on a quad-CPU 4100 and compare
        against performance on a triple, a dual, and a uni, simply by
        adjusting the CPU_ENABLED EV and rebooting the OS.
    
        This can be done on 4100.
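
    As an illustration of the second use, the sketch below computes the
    values that would select a quad, triple, dual, and uni configuration
    while always keeping CPU0 enabled. It assumes CPU_ENABLED is a bitmask
    with bit n for CPU n and that the lowest-numbered CPUs are the ones
    kept; the helper name is made up for this example.

```python
def mask_for_first_n_cpus(n, max_cpus=4):
    """Return a mask enabling CPU0 through CPU(n-1); CPU0 is always included."""
    if not 1 <= n <= max_cpus:
        raise ValueError("n must be between 1 and max_cpus")
    return (1 << n) - 1


if __name__ == "__main__":
    for n in (4, 3, 2, 1):
        print(n, "CPU(s):", hex(mask_for_first_n_cpus(n)))
```

    This gives 0xf, 0x7, 0x3, and 0x1 for four, three, two, and one CPU
    respectively.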
2604.3  CPU0 limitation is a step backward  STAR::jacobi.zko.dec.com::jacobi  Paul A. Jacobi - OpenVMS Systems Group  Tue May 27 1997 17:28  10
>>> It should be noted that the 4100/4000 system cannot operate without a CPU0, 
>>> since it provides the oscillator for the system bus.

Please consider removing this design limitation on future systems.  The requirement
for CPU0 to be present is a step backward relative to the CPU fail-over functionality
that exists on Sable and even on old VAX6000 systems.


							-Paul

2604.4  MAY30::CUMMINS  Wed May 28 1997 13:08  3
    CPU0 need only be present (with a working system bus oscillator).
    Most of the CPU (cache, EV5, SROM, etc.) can be terribly broken, and
    the console and the O/S should still come up on an SMP 4100/4000
    machine.