Conference nlfdc::linux-users

Title: Linux, the Free Operating System
Notice: New here? Sign in on topic 2
Moderator: EST::DEEGAN
Created: Fri Feb 11 1994
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 609
Total number of notes: 2862

579.0. "FYI: PCI EIDE Controller Flaws Discovered" by NEWVAX::PAVLICEK (Linux: the PC O/S that isn't PC) Tue Feb 25 1997 08:59

             <<< NOTED::NOTES$10:[NOTES$LIBRARY]IBMPC-95.NOTE;1 >>>
                        -< IBM PCs, clones, DOS, etc. >-
================================================================================
Note 2796.13        Large harddisks and BIOS support question           13 of 13
ODIXIE::SIMPSONT "PC = world's biggest con job!"    263 lines  24-FEB-1997 19:48
                         -< additional consideration >-
--------------------------------------------------------------------------------
    Just to stir things up a little...here is something else you may want to 
    check into further before plunking down your hard-earned cash for a big
    EIDE disk drive...
    
    
    PCI EIDE Controller Flaws Discovered
    
    
    BY ROEDY GREEN 
    
    
    
    Introduction
    
    There are serious flaws affecting about one-third of all PCI
    motherboards. The flaws affect any motherboard or EIDE controller
    paddleboard containing the PC-Tech RZ-1000 PCI EIDE controller chip or
    the CMD PCIO 640 PCI EIDE controller chip.
    
    The flaws affect motherboards from ASUSTeK, AT&T, DEC, Dell, Gateway,
    Intel, Micron, NEC, Zeos and others. Since Intel makes so many of the
    motherboards sold under other brand names, the flaws affect many
    machines, both 486 and Pentium PCI.
    
    The flaws show up most frequently when you run a true multitasking
    operating system such as OS/2 Warp or NT. They also show up under
    Windows For WorkGroups in 32-bit mode during tape or floppy backup and
    restore. In theory, the flaws could do damage under DOS, DESQview,
    Windows and Windows For WorkGroups in 16-bit mode, but so far there
    have been no damage reports. Windows-95 contains code to bypass the
    flaws.
    
    The RZ-1000 has two flaws. The CMD-640 has those same two flaws, plus
    three others. To make matters worse, most motherboard manufacturers
    using these two flawed chips connected them up incorrectly. There are
    software bypasses for these flaws. However, the Warp fix for the
    CMD-640 reduces performance by 50 percent.
    
    What are the symptoms? 
    
    When you are using an IDE or EIDE hard disk attached to the EIDE
    motherboard port, the flaws subtly corrupt your files by randomly
    changing bytes every once in a while. The flaws introduce bugs into EXE
    files, subtle errors into your spreadsheets, stray characters into your
    word processing documents, changes to the deductions in last year's tax
    return files, and random changes to engineering design files.
    
    This corruption happens when you are simultaneously using your EIDE or
    IDE hard disk and some other device, most commonly the floppy drive or
    mag tape backup. The same sort of problem may occur on reading a CD-ROM
    drive attached to an EIDE port.
    
    Unfortunately, correcting the problem just stops further file
    corruption. It will not help to clean up the existing damage to your
    files. Right now, the focus is on bypassing the flaws. Preventing
    further corruption is child's play compared with the nightmare of
    trying to track down all the existing random errors in files. Backups,
    even from day one, may be corrupt. If you have either of the flawed
    chips, you will probably never be able to completely eliminate the
    effects of past corruption.
    
    
    Testing For The Flaws 
    
    I wrote two test programs that run under DESQview, Windows, Windows For
    WorkGroups, Windows 95, NT and OS/2. EIDEtest verifies that your hard
    disk is working properly, and CDTest verifies your CD-ROM. If these
    tests fail, it proves you have a serious problem, but not necessarily
    that you have the RZ-1000 or CMD-640 chip.
    
    If the tests pass, you still may have a problem since, especially under
    DOS, DESQview and Windows, the flaws may only show up rarely. If you
    run the tests under Windows 95 they will always pass, even if you have
    the defective chip, because the operating system already bypasses the
    flaws.
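
A minimal sketch of the general idea behind such a test, not the actual EIDEtest code: write a known pattern to disk, then read it back repeatedly while another thread generates unrelated I/O, and count the reads that do not match. All function names here are hypothetical. On a modern operating system the re-reads are usually served from the file cache and never reach the controller, so this shows only the shape of the check.

```python
import hashlib
import os
import tempfile
import threading


def make_pattern(size):
    """Build a deterministic byte pattern of exactly `size` bytes."""
    out = bytearray()
    counter = 0
    while len(out) < size:
        out += hashlib.sha256(counter.to_bytes(8, "little")).digest()
        counter += 1
    return bytes(out[:size])


def background_io(stop, directory):
    """Generate unrelated disk traffic (a stand-in for floppy or tape activity)."""
    path = os.path.join(directory, "noise.bin")
    while not stop.is_set():
        with open(path, "wb") as f:
            f.write(os.urandom(64 * 1024))


def verify_reads(path, expected_digest, passes):
    """Read the file `passes` times and count reads whose digest is wrong."""
    mismatches = 0
    for _ in range(passes):
        with open(path, "rb") as f:
            if hashlib.sha256(f.read()).hexdigest() != expected_digest:
                mismatches += 1
    return mismatches


def run_test(size=1 << 20, passes=20):
    """Write the pattern, re-read it under concurrent I/O, return mismatch count."""
    with tempfile.TemporaryDirectory() as directory:
        data = make_pattern(size)
        expected = hashlib.sha256(data).hexdigest()
        path = os.path.join(directory, "pattern.bin")
        with open(path, "wb") as f:
            f.write(data)
        stop = threading.Event()
        worker = threading.Thread(target=background_io, args=(stop, directory))
        worker.start()
        try:
            return verify_reads(path, expected, passes)
        finally:
            # Stop the noise thread before the temporary directory is removed.
            stop.set()
            worker.join()


if __name__ == "__main__":
    print("mismatched reads:", run_test())
```

As the article says of the real tests, a nonzero count proves something is wrong, while a zero count proves little, since the corruption may show up only rarely.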
    
    What Can You Do If You Have A Flaw? 
    
    Pester the manufacturer. Unfortunately, the EIDE controller chips are
    soldered in. The only way to repair a flaw is to replace the whole
    motherboard, recycling the socketed chips: the CPU, DRAM and SRAM
    cache. It would be very expensive for computer and motherboard
    manufacturers to fix a flaw.
    
    Buy a new, unpopulated Triton PCI motherboard and recycle the CPU, DRAM
    and SRAM cache chips from the old motherboard.
    
    Run the controller in degraded mode. Some BIOSes have a feature to
    disable the EIDE prefetch buffer. Vendors may offer a BIOS upgrade to
    allow you to manually disable prefetch. The BIOS may also turn it off
    automatically if either of the defective chips is present. This will
    bypass both RZ-1000 flaws and two of the five CMD-640 flaws.
    
    Buy a PCI EIDE paddleboard controller, such as the Promise 2300+ or the
    BusLogic BT-910, to replace the one on the motherboard. You must
    disable the EIDE controller on the motherboard. This fix will waste one
    of your precious slots. Be careful. You could be leaping out of the
    RZ-1000 frying pan into the CMD-640 fire, since paddleboards often use
    the CMD-640.
    
    Buy a SCSI hard disk and CD-ROM, and avoid using the EIDE ports
    entirely. Under OS/2 and Linux, SCSI gives better performance, but
    costs more. DOS, Windows, Windows For WorkGroups and Windows 95 are
    unable to exploit the advanced features of SCSI, but at least avoid the
    EIDE flaws when you go to pure SCSI.
    
    Find a software work-around. There are fixes for Warp to bypass all the
    flaws in the RZ-1000 and CMD-640. Fixpack 5 and pre-release Fixpack 9
    do not bypass the flaws. Now that Intel and IBM have revealed the
    technical details, all the operating system writers can patch their
    EIDE drivers to bypass the flaws. There are also fixes for NT 3.1 and
    3.5.
    
    Get a BIOS upgrade. For DOS, DESQview, and Windows 3.1, to bypass the
    flaws you may need a new BIOS: an EPROM chip. If you have a flash BIOS,
    you can update it simply by downloading a file. Most BIOSes already
    have code to bypass the flaws for DOS, DESQview and Windows. More
    advanced operating systems bypass the BIOS, though, so even a smart
    BIOS will not protect you there. The BIOS CMOS settings may still
    allow you to disable prefetch, which protects you even under true
    multitasking operating systems.
    
    
    Cut the trace. Cut the trace on the motherboard from the floppy
    changeline to the EIDE controller. However, this only bypasses one of
    the CMD-640's five flaws and one of the RZ-1000's two flaws.
    
    Whatever method you use to bypass the flaws, retest with EIDEtest and
    CDTest afterwards to be sure your fix worked and you caught all the
    problems.
    
    Cleaning Up The Mess 
    
    Once you have bypassed the flaws, you can start working on the problem
    of cleaning up your files.
    
    The first thing to do is to re-install your operating system and all
    your application programs. This will replace any damaged EXE and DLL
    files.
    
    Catching errors in your data files is more difficult. Keep your eyes
    peeled for any improbable spreadsheet results. You may have to hire a
    programmer to write you some comb programs to sniff through your
    databases, looking for suspicious values.
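
As an illustration only, a "comb program" of this kind might look like the sketch below. It assumes records are already loaded as dictionaries of strings, and the column name `deduction` and its plausible range are made up for the example. It flags stray control characters and numeric fields that fail to parse or fall outside the expected range.

```python
def find_suspicious(rows, numeric_ranges):
    """Return (row_number, column, value, reason) for values that look corrupted.

    rows           -- iterable of dicts mapping column name to string value
    numeric_ranges -- dict mapping column name to (low, high) plausible bounds
    """
    findings = []
    for row_number, row in enumerate(rows, start=1):
        for column, value in row.items():
            if value is None:
                continue  # missing field, nothing to inspect
            # A randomly changed byte often shows up as a control character.
            if any(ord(ch) < 32 or ord(ch) == 127 for ch in value):
                findings.append((row_number, column, value, "control character"))
                continue
            if column in numeric_ranges:
                low, high = numeric_ranges[column]
                try:
                    number = float(value)
                except ValueError:
                    findings.append((row_number, column, value, "not a number"))
                    continue
                if not (low <= number <= high):
                    findings.append((row_number, column, value, "out of range"))
    return findings


if __name__ == "__main__":
    # Hypothetical sample records; the second and third are deliberately damaged.
    sample = [
        {"name": "Alice", "deduction": "1200.50"},
        {"name": "Bob", "deduction": "12\x0700.00"},
        {"name": "Carol", "deduction": "98765432"},
    ]
    for finding in find_suspicious(sample, {"deduction": (0.0, 100000.0)}):
        print(finding)
```

A scan like this catches only values that are implausible. A changed digit that still yields a reasonable number passes unnoticed, which is why the article calls this kind of cleanup difficult.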
    
    If you routinely use the verify feature of Lotus Magellan, it can
    detect changes to files that should not have changed. This may help you
    uncover some of the damage. The flaws are not polite enough to redate
    the files they corrupt. :-)
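
The same idea can be sketched without Magellan. This is not how Magellan works internally, only an illustration with hypothetical function names. Record a checksum and modification time for every file, then later report files whose contents changed while the timestamp stayed the same. A legitimate edit updates the date; corruption of this kind does not.

```python
import hashlib
import os


def snapshot(root):
    """Map each file path under root to (modification time in ns, SHA-256 digest)."""
    result = {}
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            result[path] = (os.stat(path).st_mtime_ns, digest)
    return result


def silent_changes(old, new):
    """Return paths whose contents differ although the timestamp is unchanged."""
    return sorted(
        path
        for path, (mtime, digest) in new.items()
        if path in old and old[path][0] == mtime and old[path][1] != digest
    )
```

Take one `snapshot` while the files are believed good, take another later, and pass both to `silent_changes`. The approach only helps if the first snapshot predates the corruption, and the read that computes the checksum goes through the same controller, so a bad read can raise a false alarm.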
    
    If you have backups from before the time you bought the faulty machine,
    you can restore them and re-key everything.
    
    Most people will not be so fortunate. All their backups will also be
    corrupt.
    
    Most people with flaws will just have to put up with random errors
    dotting their data files ever after.
    
    What Are the Flaws? 
    
    IBM confirmed the RZ-1000 has two different flaws:
    
    In prefetch mode, multi-sector reads often fail.
    
    The chip erroneously responds to floppy status commands and corrupts
    the hard disk or CD-ROM I/O in the process.
    
    IBM confirmed the CMD-640 has five different flaws. It has the same
    prefetch problem as the RZ-1000. It has the same floppy status problem
    as the RZ-1000. It does not support simultaneous I/O on the primary and
    secondary EIDE ports. There is confusion over legacy and PCI mode.
    Finally, it does not support 32-bit writes.
    
    Test Programs 
    
    When requesting files on the Internet, you must generally use lower
    case.
    
    Below are the addresses for Roedy Green's EIDEtest and CDTest programs
    for DOS, DESQview, Windows, Windows For WorkGroups, Windows 95, NT,
    OS/2 and Warp. By the time you read this, I will likely have posted
    newer versions.
    
    ftp://garbo.uwasa.fi/pc/diskutil/
    
    ftp://ftp.cdrom.com/.4/os2/incoming/eidete16.zip
    
    Intel's RZ-1000 chip detect program:
    
    http://www.intel.com/procs/support/rz1000/rztest.exe
    
    Intel's CMD-640 and RZ-1000 chip detect program, coming soon:
    
    http://www.intel.com/procs/support/ctrltest/
    
    IOTest from PowerQuest, the makers of Partition Magic, a Warp test for
    the flaws.
    
    http://www.powerquest.com/download/iotest.zip
    
    Fixes
    
    Warp bypass for the RZ-1000 chip flaws:
    
    ftp://service.boulder.ibm.com/ps/products/os2/fixes/v3.0warp/english-us/pj19409/pj19409.zip
    
    Warp bypass for the CMD-640 chip flaws:
    
    ftp://ftpos2.cdrom.com/pub/os2/drivers/cmd640x.zip
    
    Microsoft Windows NT 3.1 ATDISK.SYS fix for the CMD-640 chip:
    
    http://www.microsoft.com/KB/softlib/mslfiles/pciatdsk.exe
    
    Microsoft Windows NT 3.5 fix for the CMD-640 chip:
    
    CMD's BBS at (714) 454-1134. 
    File 640XNT35.ZIP 
    
    Essays
    
    Roedy Green's FAQ (Frequently Asked Questions), a 19-page unabridged
    version of this article:
    
    ftp://garbo.uwasa.fi/pc/diskutil/eidete16.zip
    ftp://ftp.cdrom.com/.4/os2/incoming/eidete16.zip
    
    PowerQuest essay:
    
    http://www.powerquest.com/
    
    Intel's FAQ
    
    http://www.intel.com/procs/support/rz1000
    
    PC-Tech's essay:
    
    http://www.mei.micron.com/rz1000/rz1000.txt
    
    Catch Pat Duffy's (duffy@theory.chem.ubc.ca) essays each Sunday in:
    
    comp.os.os2.misc, comp.os.os2.setup.misc, comp.os.os2.setup.storage and
    comp.sys.ibm.pc.hardware.misc 
    
    Check out Pat Duffy's Web site at: 
    
    http://warp.eecs.berkeley.edu/os2/workbench/work.htm
    ftp://ftp.netcom.com/pub/ab/abe/
    
    Roedy Green is a computer consultant who prefers to work on Forth, C++,
    Delphi, DOS, OS/2 and Internet Web projects. If you send $5 (US or
    Canadian) to cover duplication, postage, and handling, he will send you
    a diskette containing the relevant test programs, fixes, Internet
    postings and essays. Send email to: Roedy@bix.com or discuss this
    problem on the Internet newsgroup in: comp.os.os2.bugs.
    
    You can also write via snail mail:
    
    Roedy Green, Canadian Mind Products #601 - 1330 Burrard Street,
    Vancouver, BC CANADA V6Z 2B8 (604) 685-8412
    
    

579.1. by MOVIES::TWEEDIE Tue Feb 25 1997 14:30 (16 lines)

For what it's worth, Linux has had software auto-detect of these buggy
chipsets, plus software workaround for the bugs, for ages --- over a year
now, I think.  Of course, you still lose performance because the workarounds
have got to disable some of the advanced features of EIDE, but I'm not aware
of any data-corruption problems with current Linux kernels using these
chipsets.

The recommendation that you replace a buggy rz1000 or cmd640 chipset board
with a newer Triton board is a good one.  Using the latest Triton chipsets,
Linux will allow you to perform EIDE transfers using DMA, giving you at 
least one of the performance advantages usually only available on SCSI 
controllers.  (Of course, SCSI is _still_ a good bit faster, since even
using DMA, EIDE doesn't let you ever have more than one outstanding IO 
command at a time per channel.)

 Stephen.

579.2. by DECWET::LOWE (Bruce Lowe, DECwest Eng., DTN 548-8910) Sat Mar 08 1997 03:35 (17 lines)

Hmmm ... this would explain a problem I was having, which I was going to
ask about in here.

My ASUS mboard p90 had an old Soundblaster/CD setup, and sbpcd worked OK.
I installed a more recent soundcard with an ATAPI 12x CD drive, and tried
booting bare.i. It can't see the CD (I have two EIDE drives in IDE controller
0, and the CD on controller 1).

When I disconnect the 2nd hard drive and put the CD on controller 0, it can 
see it.

On booting, I see a message: 
	ide: buggy CMD640B interface on pci (0x80006800); serialized; 
        secondary port

So it's NOT the CD.