
Conference nyoss1::market_investing

Title:Market Investing
Moderator:2155::michaud
Created:Thu Jan 23 1992
Last Modified:Thu Jun 05 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:1060
Total number of notes:10477

70.0. "Kendall Square Research" by DECEAT::SHAH () Tue Feb 18 1992 21:00

    
    There was an article about Kendall Square Research in the Globe this
    weekend. They intend to go public and offer the stock in the $9-$11
    price range.
    
    Can someone provide more information than the Globe article? In general,
    how does one go about finding details about the *NEW* offerings?
    
    Thanks,
    /Alkesh
    +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    The article -
    Kendall Square Research - Supercomputer maker plans to tap the stock
    market {The Boston Globe, 15-Feb-92, p. 23} 
    
    Kendall Square Research Corp., a supercomputer maker in Waltham that
    has raised $63 million in private capital since 1986 said yesterday it
    plans to tap the stock market for up to $33 million. The company hopes
    to go public with 3 million shares at an estimated price of $9 to $11
    each. The offering, which requires Securities and Exchange Commission
    approval, would leave about 30% of its outstanding shares in public
    hands. The company's SEC registration shows that William I. Koch, a
    Boston investor who recently relocated to San Diego to be closer to the
    development of a new high-technology sailboat for the America's Cup
    race, is its major investor. He owns nearly 58% of the company's
    shares, a stake that would be diluted to 40.5% after the initial public
    offering. Other major investors include the Palmer Organization
    Partnerships of Woburn, which would own nearly 6% of the shares after
    the deal; Olivetti Holdings NV, the venture investment arm of the
    Italian computer conglomerate; Sprout Funds, a venture firm; and John
    Hancock Venture Capital Fund. After Koch, Kendall's largest individual
    investor is Henry Burkhardt 3d, who cofounded the company and is its
    president and chief executive. He would own 3.7% of the stock after the
    offering. Burkhardt is best known as co-founder of Data General Corp.
    and Encore Computer. Another early investor is C. Gordon Bell. None of
    the initial investors plans to sell their shares in the offering. A
    Kendall spokeswoman said initial shipments of its first systems, known
    as the KSR1, were made to the Oak Ridge National Laboratory, Cornell
    University and Manchester University in England late last year. The
    company makes massively parallel supercomputers. The company has been
    involved in some controversy based on a recent article in Upside
    Magazine by journalist George Gilder of Tyringham. In praising the
    design of the KSR1, Gilder was critical of KSR's Bay State rival,
    Thinking Machines Corp. of Cambridge. Gilder said Thinking Machines'
    newest computer, the CM-5, employs a less effective massively parallel
    design. Thinking Machines founder Danny Hillis sharply disputes
    Gilder's claims, saying the author's conclusion is based on incorrect
    information. Kidder Peabody & Co. is lead underwriter for Kendall
    Square Research's stock offering.
                                          
    
70.1. "Some technical info on KSR" by TPSYS::ABBOTT (Robert Abbott) Mon Feb 24 1992 14:35
From USENET:

Article: 27192
Path: ryn.mro4.dec.com!nntpd.lkg.dec.com!news.crl.dec.com!deccrl!decwrl!uunet!ksr!dean
From: dean@ksr.com
Newsgroups: comp.arch
Subject: Announcing the KSR1 Supercomputer
Keywords: KSR Supercomputer
Message-ID: <10017@ksr.com>
Date: 22 Feb 92 00:14:16 GMT
Sender: news@ksr.com
Reply-To: ksr-info@ksr.com
Lines: 324
 
Our users, prospects and other interested parties have asked us to post
information on the net as a first step towards establishing a forum for 
sharing information on the KSR1.
 
In 1986 we took on the challenge of building a high performance system that
combined the price/performance advantages of multiple CMOS processors, a
traditional programming model that would allow users to develop, port and run
applications easily, and scalability.  This last includes scalability of
component technology and processor count within a given generation, and
technology scalability across future generations of systems.  The KSR1 is now
installed and running at customer sites.
 
We have not opened this forum until now because of a commitment to
under-promise and over-deliver.  So we won't be hyping the airwaves with "fast,
faster, fastest" rhetoric.  We'll just stick with facts.  When we deliver a
teraflop it will be usable, affordable, and part of a successive generation
of products.
 
If you don't find that to be a radical idea, we think you will read our
description of the KSR1 with interest and enthusiasm.  
 
Henry Burkhardt III
Chairman, President and Chief Executive Officer
Kendall Square Research
 
email: henry@ksr.com
 
 
 
KSR1 Computer System
 
The KSR1 is a highly parallel computer system  designed to be scalable to
thousands of processors while preserving the simplicity and familiarity of a
shared memory programming model.  Each processor is a RISC-style superscalar
64-bit unit operating at 20 MIPS and 40 MFLOPS (peak).  A KSR1 system contains
from eight to 1088 processors with a peak performance range from 320 to 43,520
MFLOPS, all sharing a common virtual address space of one million megabytes
(2**40 bytes).   
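
As a quick sanity check on that range (simple arithmetic, not from the
announcement itself), peak performance is just the processor count times the
40 MFLOPS per-processor peak:

    /* Back-of-the-envelope check of the quoted peak figures: each
     * KSR1 processor peaks at 40 MFLOPS, so system peak scales
     * linearly with processor count.
     */
    #include <stdio.h>

    int main(void)
    {
        int configs[] = { 8, 1088 };    /* smallest and largest systems */
        int i;

        for (i = 0; i < 2; i++)
            printf("%4d processors -> %6d MFLOPS peak\n",
                   configs[i], configs[i] * 40);   /* 320 and 43520 */
        return 0;
    }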
 
KSR1 Software
The KSR1 is the first general purpose computer with supercomputer performance
and workstation price/performance.  KSR expects its user community to be
performing all the classical scientific calculations, all the typical business
functions (e.g., transaction processing, decision support), and all the typical
Unix functions (e.g., document preparation, mail) at the same time on the same
machine.  The design of the system is such that each community will get
superb, cost-effective performance.
 
KSR OS is an extension of OSF/1.  As such, it is a very complete implementation
of all of Unix.  KSR OS is fully compatible with BSD 4.3, which has no official
validation suite.  In addition, it will pass the validation suites for ATT SVr3
base and kernel extensions, X/Open XPG3, and POSIX.  
 
KSR OS does not use any front end machines and there is no distinguished
processor.  Thus, there are no OS bottlenecks and no reason to limit the
traditional Unix flexibility.  In particular, KSR OS supports an arbitrarily
large number of multi-threaded processes timesharing a large number of
processors.  This ability to timeshare is crucial in many interactive
applications, in which periods of intense computing are followed by human time
scale periods of thought.  Interactive applications spanning the entire range
-- from state-of-the-art numeric processing guided by a user through
scientific visualization, all the way to traditional transaction processing
as practiced by banks and airlines -- are efficiently and naturally supported
in the KSR OS environment.
 
The KSR1 environment is what a sophisticated Unix user would expect to find. 
At the user interface level, there is X11 and Motif.  At the language level,
there is  Fortran, with automatic parallelization,  C (both the ANSI and PCC
dialects), and  IBM-compatible COBOL.  At the database level there is the
ORACLE relational database management system (RDBMS), including application
development tools.  Kendall Square Research is extending ORACLE's features for
the parallel environment.  At the transaction processing level, there is AT&T's
Tuxedo /T and Tuxedo /D, fast non-relational file access methods, and fourth
generation languages.
 
For decision support applications, ORACLE for KSR1 will provide automatic
parallel processing for complex queries.  Kendall Square Research has developed
a general purpose technique called Query Decomposition which automatically
parallelizes SQL queries generated by ORACLE-based applications.  Future third
party RDBMS software ported to the KSR1 will also take advantage of the Query
Decomposition tool.
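
The posting doesn't describe how Query Decomposition works internally; as a
rough illustration of the general idea only (my sketch, not KSR's code), a
query over a large table can be split into independent sub-queries, one per
processor, whose partial results are then merged:

    /* Illustrative sketch of query decomposition -- NOT KSR's actual
     * Query Decomposition tool.  A scan of nrows rows is split into
     * nprocs independent sub-scans; each could run on its own
     * processor, and the partial counts are merged at the end.
     */
    #include <stdio.h>

    static long scan_partition(long lo, long hi)  /* one sub-query */
    {
        long matches = 0, row;
        for (row = lo; row < hi; row++)
            if (row % 7 == 0)                     /* dummy predicate */
                matches++;
        return matches;
    }

    int main(void)
    {
        long nrows = 1000000, total = 0;
        int  nprocs = 32, p;
        long chunk = nrows / nprocs;

        for (p = 0; p < nprocs; p++)      /* in reality, in parallel */
            total += scan_partition(p * chunk,
                     p == nprocs - 1 ? nrows : (p + 1) * chunk);
        printf("matching rows: %ld\n", total);
        return 0;
    }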
 
Parallel Fortran programming on the KSR1 can be fully-automatic, semi-automatic
or manual.  The parallel programming environment of the KSR1 is based on a
proprietary parallel run-time system (PRESTO) that makes run-time decisions
based on compiler-generated or programmer-specified directives.  The
functioning of the runtime system is one of the keys to KSR's dramatically
improved performance.  The system decides the level of resources it will
devote to a particular parallel task at runtime, based on the amount of
calculation required and the resources available at that moment, rather than
making a static resource-allocation decision at compile time.  The result of
this policy is that real-world problems with significant variations in their
processing requirements can be run together, taking advantage of all the
cycles on the machine, rather than run one at a time, wasting cycles in those
parts of the program that don't exhibit maximum parallelism.
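
The posting doesn't show PRESTO's interface, but the policy it describes --
sizing each parallel task from the work and the processors free at that
moment, rather than at compile time -- has roughly this shape (purely a
sketch; idle_processors() and the logic here are invented for illustration):

    /* Sketch of the dynamic policy described above -- NOT the PRESTO
     * API.  The runtime picks a worker count for each parallel region
     * from the work size and the processors currently idle.
     */
    extern int idle_processors(void);   /* hypothetical runtime query */

    int workers_for(long iterations, long min_chunk)
    {
        int by_work = (int)(iterations / min_chunk); /* enough work?  */
        int by_load = idle_processors();             /* what's free?  */
        int n = by_work < by_load ? by_work : by_load;
        return n > 1 ? n : 1;       /* too little of either: run serial */
    }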
 
While offering a highly parallel applications development environment, Kendall
Square Research will make available in 1992 scientific and mathematical
subroutine libraries, and important third-party software packages for
computational fluid dynamics, quantum chemistry, mathematical algorithms for
engineering applications, molecular dynamic modeling for computational
chemistry, and finite element analysis for engineering applications.
 
KSR1 Networking
 
KSR1 supports an extensive set of connectivity technology including:
 
-  TCP/IP, NFS, DCE, SNA-3270, 3770, LU6.2/PU2.1, ISO/OSI X.25, X.29, X.28, X.3
protocols;
 
-  Ethernet, Token Ring, HiPPI, and FDDI transports; and
 
-  Industry standard buses, the first of which is VME, to facilitate the
integration of third-party communication products.
 
 
ALLCACHE Memory
 
The KSR1's shared memory programming model is made possible by a new
architectural technique called ALLCACHE memory.  The KSR1 memory system is
designed to do for distributed memory what virtual memory did for hierarchical
memory -- it replaces the complexity and rigidity of the physical mechanism
with a uniform address space, now shared by a set of processors. System
hardware and software maps this space into physical devices. The KSR1 ALLCACHE
memory system achieves this programming simplicity without sacrificing the
benefit of distributed memory -- scalability -- its performance continues to
be good even as the number of processors grows very large.

The memory models of today's highly parallel computer architectures raise
problems for programmers which are reminiscent of storage management in the
1960s.

Twenty-five years ago, storage management via overlay structures was an
integral part of the job of writing a program. Necessarily, programmers
attacked the task with a static analysis of the memory requirements of a single
program. Advances in programming practice and system architectures, however,
gradually rendered static storage management infeasible. The goals of machine
independence, re-use of modular program elements, and algorithms of high
complexity characterized by data structures of widely varying size and shape
were inconsistent with static, programmer controlled storage management. In
addition, the introduction of system environments in which computers were
organized for simultaneous use by several programs made it impossible for the
author of a single program to predict accurately the time-varying storage
requirements of the entire system.

Ultimately, these factors led to the adoption of virtual memory as a
near-universal feature of storage management in modern computer architectures.
Virtual memory makes storage management dynamic and largely automatic. It
permits programmers to write applications with a storage abstraction which is
simple and powerful -- a single uniform address space. System hardware and
software maps this space into physical devices.

Highly parallel computer architectures reprise these early storage management
issues with a new twist. All of the highly parallel systems that have been
introduced have distributed memories. That is, the physical memory comprises a
set of memory units, each connected to a unique processor. The processor-memory
pairs are interconnected by a network. Distributed memories have been universal
among highly parallel machines because they provide the only known means of
providing completely scalable access to memory -- that is, access whose
bandwidth increases in direct proportion to the number of processors.
In most of today's parallel systems, the job of managing the movement of codes
and data among these distributed memory units belongs to the programmer. The
job is similar in style to the task of managing the migration of data back and
forth between primary and secondary storage prior to the introduction of
virtual memory, but it is much more complex. As before, programmers need to be
concerned about exactly what will fit where and what to remove to make room for
something new. Now, however, there are thousands of memory units to deal with
instead of just two or three.

ALLCACHE memory provides programmers with a uniform 2**40 byte (million
megabyte) address space for instructions and data. This space is called system
virtual address space or SVA space. The contents of SVA locations are
physically stored in a distributed fashion. ALLCACHE memory physically
comprises a set of memory arrays called local caches, each capable of storing
32MB. There is one local cache for each processor in the system. Hardware
mechanisms (the search engine described later) cause SVA addresses and their
contents to materialize in the local cache of a processor when the address is
referenced by that processor. The address and data remain at that local cache
until the space is required for something else.

As the name suggests, ALLCACHE memory behavior is like that of familiar caches:
data moves to the point of reference on demand. However, unlike the typical
cache architecture (which we might call SOMECACHE memory), the source for the
data which materializes in a local cache is not main memory but rather another
local cache. In fact, all of the memory in the machine consists of large,
communicating, local caches; the main memory of the machine is identical to the
collection of local caches.

The address and data that materialize in local cache A in response to a
reference by processor A may continue to reside simultaneously in other local
caches. Consistency is maintained by distinguishing the type of reference made
by processor A: a) If the data in the location will be modified by A, the local
cache will receive the one and only instance of an address and its data. b) If
the data will be read but not modified by A, the local cache will receive a
copy of the address and its data.

When processor A first references the address X, the ALLCACHE memory searches
that processor's local cache to see if the requested location is already stored
there. If not, a hardware search engine locates another local cache (say, local
cache B) where the address and data exist.
If the processor request being serviced is a read request (for example, to load
the value into a register) then the search engine will copy the address and
data from local cache B into local cache A. The amount of data copied will be
128 bytes, called a sub-page. At the end of this operation the sub-page will
reside at both A and B. If the processor request is a write request (for
example, to store the contents of a register into this location) then the
search engine will remove the copy of the sub-page from local cache B as well
as from any other local caches where it may exist before copying it into local
cache A. Thus the search engine is responsible for finding and copying
sub-pages stored in local caches and for maintaining consistency by eliminating
old copies when new contents are stored.
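
For readers who prefer the protocol in operational form, here is a toy model
of the read and write cases just described (an illustration of the stated
behavior, not KSR hardware or software; Invalid and Copy are the posting's
state names, while EXCLUSIVE here stands in for the "one and only instance"
case):

    /* Toy model of the ALLCACHE read/write behavior described above. */
    #include <string.h>

    #define NCACHES   8     /* local caches (one per processor) here   */
    #define NSUBPAGES 64    /* sub-pages tracked per local cache       */
    #define SUBPAGE   128   /* bytes moved per request, as in the KSR1 */

    enum substate { INVALID, COPY, EXCLUSIVE };

    struct local_cache {
        enum substate state[NSUBPAGES];
        unsigned char data[NSUBPAGES][SUBPAGE];
    };

    static struct local_cache cache[NCACHES];

    /* Read: copy the sub-page from whichever cache holds it; both
     * caches keep a copy afterwards.
     */
    void read_subpage(int a, int sp)
    {
        int b;
        if (cache[a].state[sp] != INVALID)
            return;                      /* already present locally  */
        for (b = 0; b < NCACHES; b++)    /* "search engine" stand-in */
            if (b != a && cache[b].state[sp] != INVALID) {
                memcpy(cache[a].data[sp], cache[b].data[sp], SUBPAGE);
                cache[a].state[sp] = COPY;
                cache[b].state[sp] = COPY;  /* B keeps its copy too */
                return;
            }
    }

    /* Write: remove every other copy, leaving one exclusive instance. */
    void write_subpage(int a, int sp)
    {
        int b;
        read_subpage(a, sp);             /* fetch if not present */
        for (b = 0; b < NCACHES; b++)
            if (b != a)
                cache[b].state[sp] = INVALID;
        cache[a].state[sp] = EXCLUSIVE;
    }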
 
In order to maintain consistency, each local cache records state information
about the sub-pages it has stored. These states are specific to the physical
instance of a sub-page within a particular local cache. Thus a single sub-page
in SVA space may be in Invalid state in one local cache and in Copy state in
another. Some sub-page states are used and maintained exclusively by hardware
as part of the operation of the search engine. Others can be manipulated
indirectly by the operation of software.
 
There are times when two or more processors need to synchronize their access to
SVA locations. The ALLCACHE memory supports this requirement through
instructions which lock and unlock sub-pages. These instructions can be used to
implement any multi-processor synchronization functions including data locks,
barriers, critical regions, and condition variables. (All of these forms of
synchronization and others are available via KSR compilers, libraries, and OS
calls.)
 
A "lock" in ALLCACHE memory is achieved by setting a sub-page to the Atomic
state. A program does that by issuing a GET instruction on the address of a
byte within the desired sub-page. This instruction will cause the search engine
to find the sub-page and -- if the page is not in Atomic state -- return it to
the requesting processor in Atomic state. In the process the search engine will
ensure that all other copies of the sub-page are set Invalid.  If the sub-page
is already Atomic it will not be returned to the requestor immediately. Instead
the request packet will return to the requestor with an indicator that the
sub-page was found in the Atomic state.  A program removes Atomic state from a
sub-page by issuing the RELEASE instruction. 
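
The GET/RELEASE pair maps naturally onto a simple lock.  A minimal sketch in
C, where ksr_get_subpage() and ksr_release_subpage() are hypothetical
wrappers standing in for the GET and RELEASE instructions (the posting gives
the instruction names but not a programming interface):

    /* Sketch of a lock built on the Atomic-state mechanism described
     * above.  The two wrappers are HYPOTHETICAL; only the GET and
     * RELEASE instruction names come from the posting.
     */
    extern int  ksr_get_subpage(void *addr);     /* 1 if Atomic state won */
    extern void ksr_release_subpage(void *addr); /* drop Atomic state     */

    void lock(void *subpage_byte)
    {
        while (!ksr_get_subpage(subpage_byte))
            ;               /* sub-page already Atomic: retry */
    }

    void unlock(void *subpage_byte)
    {
        ksr_release_subpage(subpage_byte);
    }
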
In addition to the basic functional roles of the search engine (finding
sub-pages within the set of local caches and maintaining consistency), the
search engine must be scalable -- it must be implemented in such a way that
good performance continues to be delivered as the number of processors grows.
This objective is achieved in the KSR1 by implementing the search engine as a
hierarchy.

The KSR1 search engine is a two-level hierarchy of uni-directional rings.
Each ring is a sequence of point-to-point connections among a set of units,
with the last unit in the set being connected back to the first. Each unit is a
combination of a router for request/response packets and a directory. The
router can move a packet farther along the ring or send it up or down in the
hierarchy. All of the units on all rings can operate simultaneously, so the
search engine is a highly parallel mechanism.

The lowest level rings are called Search-Engine:0s (or SE:0).   Each SE:0 can
be configured to contain from eight to 32 processor/local cache pairs.  Each
processor/local cache pair is connected to exactly one SE:0 via a unit which
contains a directory for that local cache. There is one entry in the directory
for each page allocated in the local cache. The entry gives the SVA address of
the page and the state of each of its sub-pages.  When a packet passes such a
unit, it can determine whether the sub-page the packet is seeking can be found
in the desired state in the local cache. If so, the unit routes the packet
there; if not, it moves the packet on to the next unit on the ring.

The unit on a SE:0 which connects upward to the next higher level is called an
ALLCACHE Routing and Directory cell or ARD. It contains a directory covering
the entire SE:0 -- there is an entry in its directory for every page allocated
on every local cache on the ring. When a packet reaches an ARD it will be moved
to the next unit on the SE:0 if the directory in the ARD indicates that the
data sought is on the SE:0. If not, the packet is routed up to the next higher
level in the hierarchy.

The ring at the top level of a KSR1 is called Search-Engine:1 (or SE:1). SE:1
becomes involved in a search operation when a processor requests a sub-page
which is stored (for the moment) in a local cache on a different SE:0.  A SE:1
can be configured to connect two to 34 SE:0s.  Hence the maximum system size in
a KSR1 is (32*34) 1088 processors with 34 Gigabytes of ALLCACHE memory.
 
SE:1 is composed of ARDs, each containing a directory for the SE:0 to which it
is connected. This directory is essentially a duplicate of the one stored in
the ARD on the corresponding SE:0. When a packet reaches an ARD on SE:1, it
will be moved to the next ARD on the ring if the directory in the ARD indicates
that the data sought is not on the corresponding SE:0. Otherwise, the packet is
routed down to the ARD on SE:0.

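Stripped to its essentials, the routing rule at each ARD is a directory test;
in outline (my paraphrase of the two paragraphs above, with a stub lookup
se0_has() standing in for the real hardware directory):

    /* Routing decision at an ARD, paraphrasing the description above.
     * se0_has() is a hypothetical stand-in for the directory lookup.
     */
    enum route { NEXT_UNIT_ON_RING, UP_TO_SE1, DOWN_TO_SE0 };

    extern int se0_has(int ard, long subpage);  /* directory test (stub) */

    /* On an SE:0: keep circling if the data is on this ring, else go up. */
    enum route route_at_se0_ard(int ard, long subpage)
    {
        return se0_has(ard, subpage) ? NEXT_UNIT_ON_RING : UP_TO_SE1;
    }

    /* On SE:1: go down if this ARD's SE:0 has the data, else keep going. */
    enum route route_at_se1_ard(int ard, long subpage)
    {
        return se0_has(ard, subpage) ? DOWN_TO_SE0 : NEXT_UNIT_ON_RING;
    }
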
In the KSR1 the packet passing speed of an SE:0 is 8 million packets per
second. SE:1s can be configured to handle 8, 16, or 32 million packets per
second.  Each packet contains 128 bytes of data; hence the SE:0 bandwidth is 1
Gigabyte/sec and the SE:1 bandwidth ranges from 1 to 4 Gigabytes/sec.
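
Those maximums are internally consistent, which is easy to check (the
arithmetic is mine; the parameters all come from the text above):

    /* Checking the quoted maximums against the stated parameters. */
    #include <stdio.h>

    int main(void)
    {
        int  procs  = 32 * 34;          /* 32 procs/SE:0, up to 34 SE:0s  */
        long mem_mb = (long)procs * 32; /* 32MB local cache per processor */
        long se0_mb = 8L * 128;         /* 8M packets/s * 128 B ~= MB/s   */

        printf("max processors : %d\n", procs);               /* 1088  */
        printf("ALLCACHE memory: %ld MB (~34 GB)\n", mem_mb); /* 34816 */
        printf("SE:0 bandwidth : %ld MB/s (~1 GB/s)\n", se0_mb); /* 1024 */
        return 0;
    }
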
The KSR1 Processor
The KSR1 processor is a four chip set implemented in 1.2 micron CMOS.  
One of these chips, called the Cell Execution Unit or CEU, is the basic control
unit of the processor.  On each clock cycle it fetches two instructions from
memory.  Certain instructions (loads, stores, branches, address arithmetic)
will be executed directly by the CEU; others will be passed to a co-processor
for execution.  The CEU is responsible for all instructions dealing with
memory.  These instructions operate on 40 bit addresses.  This design
characteristic of the processor architecture is fundamental to the system
design.  In order to build a shared memory multi-processor with large numbers
of processors, a large address is essential; 32 bits is not sufficient to
address the amount of memory required.  The KSR1 architecture actually
envisions a 64 bit address (pointers are stored as 64 bit quantities) but, due
to implementation constraints, the first generation address size is 40 bits --
and that is clearly sufficient for 1088 processor systems being built at this
time.  The CEU has 32 address registers, each 40 bits wide.
 
The CEU operates with three co-processors:
 
FPU (floating point unit) - This chip executes arithmetic operations on IEEE
floating point format values.  It has 64 registers each 64 bits wide.  It
supports linked triad instructions in which two floating point operations are
initiated from a single instruction, giving a peak floating point rate of 40
MFLOPS.   Sustained floating point performance depends on the application, of
course.  Examples include: 6.6 MFLOPS (Livermore Loops harmonic mean),  15
MFLOPS (100 X 100 Linpack), 28 MFLOPS (FFT), and 32 MFLOPS (Matrix Multiply).
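
A "linked triad" is the multiply-add pattern at the heart of loops like
DAXPY: at the quoted 20 MIPS, each triad instruction initiating two floating
point operations yields the 40 MFLOPS peak.  A minimal example of the pattern
(generic C, nothing KSR-specific):

    /* The classic linked-triad loop (DAXPY): a multiply plus an add
     * per element -- the pattern the FPU can initiate from a single
     * instruction.  Two flops per instruction at 20 MIPS = 40 MFLOPS
     * peak.
     */
    void daxpy(long n, double a, const double *x, double *y)
    {
        long i;
        for (i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }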
 
IPU (integer and logical operations unit) - This chip performs arithmetic and
logical operations on 64 bit integers stored in 32 registers (each 64 bits
wide).  
 
XIU (I/O channel) - This chip provides a 30 MB/sec pathway to peripheral
devices.  Since there is an XIU on every cell, large systems can be configured
with very high aggregate bandwidth to disk drives and networks.
 
 
Summary
 
The KSR1 story is primarily software: ease of portability and programmability
made possible by our ALLCACHE memory architecture.  In order to deliver
sequentially consistent shared memory we developed our own CMOS microprocessor;
the architecture is embedded in the silicon, taking advantage of very low
latency networks and providing a system design that rarely incurs a latency
penalty.  The ALLCACHE memory system delivers programming simplicity and
performance without sacrificing the scalability benefit of distributed memory. 
  
 
email: ksr-info@ksr.com

70.2. "Opens 3/23 - 3M shares" by DECEAT::SHAH () Wed Feb 26 1992 12:59
    
    The issue opens on 3/23 in the $9-$11 range. The initial offering is ~3M
    shares (at least, maybe more) under the symbol KSRC.
    
    The contacts at Kidder Peabody & Co. are -
    
    Bill Dore and Jeffery Hargaton (617) 261-1110.
    
    They are going to mail me the prospectus in a couple of days. I'll post
    more information if anyone else is interested.
    
    /Alkesh
70.3. "Any more information" by KEPNUT::MONTAGNA () Thu Mar 05 1992 13:00
    Any new information pro or con on Kendall Square Research?
    
    I heard that there is a great amount of interest in the stock this early
    and that the stock could be over subscribed before the initial
    offering.
70.4. "From tomorrow's VNS" by FRITOS::TALCOTT () Mon Mar 09 1992 18:59
 Kendall Square Research - Plans to go public within a month or so
        {The Wall Street Journal, 9-Mar-92, p. C6}
   The $9 to $11 a share price for the 10 million shares would value the
 company at more than $100 million, or 110 times last year's revenue. KSR has
 sold three supercomputers. KSR has spent $56.9 million on research and
 development since its 1986 founding and just sold its first supercomputer in
 September, reaping revenue of $903,668 for 1991. Losses for the year were
 $22.5 million, 69% wider than a loss of $13.3 million in 1990. The company has
 since shipped three more supercomputers, the largest of which lists for $2.2
 million, but only two have been accepted - and Kendall can recognize revenue
 for a shipment only after it's been accepted. The company's machines will cost
 from $50,000 to $30 million, but its biggest and most impressive - a $30
 million system - won't even be formally introduced until later this year.
 Kendall, which is entering a highly competitive field, isn't expected to
 report profits any time soon. The reception for KSR is likely to be a
 bellwether for other technology companies hoping to go public.
70.5. "KSRC to trade this AM" by CSSE::POTTER () Fri Mar 27 1992 12:52
     Kendall Square Research (NASDAQ:KSRC) is supposed to start trading
     this morning.  The IPO price is expected to be on the high end of the
     $9-$11 range listed in the prospectus.
    
     Anybody had any luck getting any shares in the IPO? Looks like I got
     shut out...the underwriter tells me demand is high.
    
     fwiw,
     John
70.6. "It was $11. That's where it opened and closed" by VINO::FLEMMING (Have XDELTA, will travel) Sat Mar 28 1992 07:31