
Conference rusure::math

Title: Mathematics at DEC
Moderator: RUSURE::EDP
Created: Mon Feb 03 1986
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 2083
Total number of notes: 14613

1180.0. "closed form integral for normal distribution function?" by REGENT::POWERS () Wed Jan 10 1990 13:15

Is there a closed form expression of the integral of the normal
probability distribution function f(x) = (1/sqrt(2*pi))*exp(-0.5*x**2)?
I'd like to compute the fraction of a population between given values of x.
My CRC book lists the derivatives of this function, and indicates (obviously)
that the definite integral for a given range is what I want,  but I can't
find a closed form expression of the integral itself.
My calculus skills are very rusty, and I'm having trouble recalling 
the right substitutions to integrate a form of exp(f(t))dt.

- tom powers]
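
Although the integral has no elementary closed form (as the replies below explain),
it is standard to express it through the error function: Phi(x) = 0.5*(1 + erf(x/sqrt(2))),
and the fraction of the population between a and b is Phi(b) - Phi(a).  A minimal sketch
in modern Python, using the standard library's math.erf; the function names and the
example endpoints are only illustrative:

    import math

    def normal_cdf(x):
        """Standard normal CDF, Phi(x), written in terms of the error function."""
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    def fraction_between(a, b):
        """Fraction of a standard normal population falling in [a, b]."""
        return normal_cdf(b) - normal_cdf(a)

    if __name__ == "__main__":
        # About 68.27% of the population lies within one standard deviation.
        print(fraction_between(-1.0, 1.0))
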
1180.1. "Sorry, no such thing ..." by COOKIE::PBERGH (Peter Bergh, DTN 523-3007) Wed Jan 10 1990 13:32 (2 lines)
    To the best of my knowledge, there is no closed form for the integral
    of the normal probability distribution function.
1180.2. "No can do, can come close" by VMSDEV::HALLYB (The Smart Money was on Goliath) Wed Jan 10 1990 14:23 (5 lines)
    It is indeed a theorem that there is no closed form for the integral
    in terms of elementary functions.  However, I recall there's a 5th
    degree polynomial (or so) that is quite an accurate approximation, if
    that will suffice.
      
      John
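
    This may be the kind of approximation John has in mind: the fifth-degree
    polynomial fit for the normal CDF given in Abramowitz & Stegun, formula
    26.2.17.  The constants below are quoted from memory of that handbook and
    should be verified against it before serious use; a sketch in Python,
    compared against math.erf:

        import math

        # Constants from Abramowitz & Stegun 26.2.17 (quoted from memory; verify).
        P = 0.2316419
        B = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)

        def normal_cdf_approx(x):
            """Approximate standard normal CDF; A&S reports |error| < 7.5e-8."""
            if x < 0.0:                      # use symmetry for negative arguments
                return 1.0 - normal_cdf_approx(-x)
            t = 1.0 / (1.0 + P * x)
            poly = sum(b * t ** (i + 1) for i, b in enumerate(B))
            z = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)   # normal density
            return 1.0 - z * poly

        if __name__ == "__main__":
            exact = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
            for x in (0.0, 0.5, 1.0, 2.0, 3.0):
                print(x, normal_cdf_approx(x), exact(x))
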
1180.3. by ALLVAX::ROTH (It's a bush recording...) Wed Jan 10 1990 16:04 (4 lines)
    See note 1136 and some of the replies - there are some routines
    that can be adapted nicely to your problem...

    - Jim
1180.4. by REGENT::POWERS () Tue Jan 16 1990 12:15 (16 lines)
>    However I recall there's a 5th degree polynomial (or so) that is
>    quite an accurate approximation, if that will suffice.


That would be handy....

...as would be some partly tongue-in-cheek background:

  1)  If we don't have a closed form for the integral, how do we
      know the total area under the curve is, in fact, 1.00000......?
  2)  Presuming that the answer to 1) is based on connections with
      binomial distribution and sum of the negative powers of 2,
      what is the derivation of the form of the curve as an exponential
      of a function of x**2?

- tom powers]
1180.5. "A partial answer ..." by COOKIE::PBERGH (Peter Bergh, DTN 523-3007) Tue Jan 16 1990 14:47 (54 lines)
>>  1)  If we don't have a closed form for the integral, how do we
>>      know the total area under the curve is, in fact, 1.00000......?
    
    The easiest way that I know of to evaluate I(-infinity, +infinity,
    e**(-x*x), dx) goes roughly as follows (for infinity, I use the symbol
    oo):
    
    Consider I(-oo, +oo, e**(-x*x), dx) * I(-oo, +oo, e**(-y*y), dy) = Z.
    
    Notice that this product is the same as the double integral over the
    whole (x,y) plane: II(-oo, +oo, -oo, +oo, e**(-x*x)*e**(-y*y), dx*dy)
    which in turn equals II(-oo, +oo, -oo, +oo, e**(-x*x-y*y), dx*dy).
    
    Transforming to polar coordinates, we get that
    
    Z = II(0, 2*PI, 0, +oo, r*e**(-r*r), dtheta*dr)
    
    Here, we can separate the two variables of integration, so
    
    Z = I(0, 2*PI, 1, dtheta) * I(0, +oo, r*e**(-r*r), dr).
    
    These two integrals can easily be evaluated and we get that Z = PI.
    
    Thus, we have proved that I(-oo, +oo, e**(-x*x), dx) = sqrt(PI).
    Substituting x = t/sqrt(2) then gives I(-oo, +oo, e**(-t*t/2), dt) =
    sqrt(2*PI), so the density (1/sqrt(2*PI))*e**(-x*x/2) in the base note
    integrates to exactly 1.
    
    (Note that I haven't bothered to quote chapter and verse of the
    appropriate theorems; the integrands are extremely well behaved, so
    ordinary Riemann-integration theorems ought to suffice to justify these
    calculations.)
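
    A quick numerical sanity check of this result; a sketch in Python, where
    the truncation range and step size are arbitrary choices that keep the
    tail and discretization errors negligible:

        import math

        def gauss_integral(step=1e-4, cutoff=10.0):
            """Trapezoid-rule estimate of the integral of e**(-x*x) over the line.
            Beyond |x| = 10 the integrand is far below double-precision range."""
            n = int(2 * cutoff / step)
            total = 0.0
            for i in range(n + 1):
                x = -cutoff + i * step
                w = 0.5 if i in (0, n) else 1.0   # trapezoid end-point weights
                total += w * math.exp(-x * x)
            return total * step

        if __name__ == "__main__":
            print(gauss_integral(), math.sqrt(math.pi))   # both ~1.7724538509...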
    
>>  2)  Presuming that the answer to 1) is based on connections with
>>      binomial distribution and sum of the negative powers of 2,
>>      what is the derivation of the form of the curve as an exponential
>>      of a function of x**2?

    As you notice, the binomial distribution does not enter into the proof
    at all, and neither do negative powers of two.  I don't know what your
    question here is aiming at, but I can tell you of a theorem in
    statistics (the central limit theorem) which I think may answer at
    least part of your question.  The theorem goes roughly as follows:
    
    Given a set of independent random variables with the same distribution
    (note that there is no requirement for them to have a binomial
    distribution; the central limit theorem doesn't "care" what the
    distribution of a single random variable is), the sum of N of these
    random variables, once it is recentered at its mean and scaled by its
    standard deviation, has a distribution that converges (in distribution)
    to the standard normal distribution as N grows.  This has often been
    used to get a quick approximation to a normally-distributed random
    variable (one simply adds enough uniformly-distributed random
    variables, recenters the sum, and, presto, the result is approximately
    normally distributed).
    
    (Convergence in distribution means roughly "the probability that the
    sum falls at or below any given value converges to the corresponding
    probability under the normal distribution as the number of terms in
    the sum increases".)
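
    A small illustration of that uniform-sum trick; a sketch in Python, where
    12 uniforms per sum is the traditional choice because a single uniform has
    variance 1/12, so the recentered sum already has variance 1:

        import math
        import random

        def approx_normal_sample():
            """Sum of 12 uniforms, minus 6: mean 0, variance 1, roughly normal."""
            return sum(random.random() for _ in range(12)) - 6.0

        if __name__ == "__main__":
            random.seed(1990)
            samples = [approx_normal_sample() for _ in range(100_000)]
            # Compare the empirical P(X <= 1) with the normal value Phi(1) ~ 0.8413.
            empirical = sum(s <= 1.0 for s in samples) / len(samples)
            phi = 0.5 * (1.0 + math.erf(1.0 / math.sqrt(2.0)))
            print(empirical, phi)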
1180.6. by AITG::DERAMO (Daniel V. {AITG,ZFC}:: D'Eramo) Wed Jan 17 1990 01:19 (8 lines)
        re .5,
        
        I believe that the theorem at the end of reply .5 needs
        the added condition that the random variables'
        distribution have a well defined and finite mean and
        variance.
        
        Dan
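
    A sketch in Python of why some such condition is needed: the standard
    Cauchy distribution has no finite mean or variance, and averages of
    Cauchy samples are themselves standard Cauchy, so they never settle
    down the way the central limit theorem would otherwise suggest.

        import math
        import random

        def cauchy_sample():
            """Standard Cauchy variate via the inverse-CDF method."""
            return math.tan(math.pi * (random.random() - 0.5))

        if __name__ == "__main__":
            random.seed(42)
            for n in (10, 1_000, 100_000):
                mean = sum(cauchy_sample() for _ in range(n)) / n
                # Unlike a finite-variance case, these sample means do not shrink.
                print(n, mean)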
1180.7. by ALLVAX::ROTH (It's a bush recording...) Wed Jan 17 1990 05:40 (21 lines)
    Re .-1

    Yes, that's clearly correct on intuitive grounds; there's no way
    a PDF that's a set of impulses will converge to a proper Gaussian.

    An easy way to see that sums of "nicely distributed" random variables
    converge to a normal distribution is that the distribution of their
    sum is the convolution of the individual distributions.  Convolution
    causes smoothing and spreading out; try the simplest case of convolving
    rectangular pulses - very quickly a bell-shaped curve results.  In
    fact, n-fold convolution of a rectangular pulse gives the uniform
    B-splines.

		   +--+			      +
		   |  |			     / \
		   |  |		->	    /   \	->  etc.
		---+  +---		---+     +---

	    one constant piece	       2 linear pieces	   3 parabolic pieces...

    - Jim
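
    A sketch in Python of the convolution picture Jim describes: repeatedly
    convolving a discrete rectangular pulse with itself quickly produces a
    bell-shaped sequence (the discrete analogue of the uniform B-splines).
    The pulse width and number of passes below are arbitrary choices:

        def convolve(a, b):
            """Plain discrete convolution of two sequences."""
            out = [0.0] * (len(a) + len(b) - 1)
            for i, x in enumerate(a):
                for j, y in enumerate(b):
                    out[i + j] += x * y
            return out

        if __name__ == "__main__":
            box = [1.0] * 8                  # rectangular pulse
            pulse = box
            for _ in range(3):               # box -> triangle -> smoother pieces...
                pulse = convolve(pulse, box)
            peak = max(pulse)
            for v in pulse:                  # crude text plot: already bell-shaped
                print("#" * int(round(40 * v / peak)))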
1180.8. by REGENT::POWERS () Wed Jan 17 1990 12:51 (11 lines)
My reference to the binomial theorem was in regard to the physical 
demonstration of the normal distribution by dumping the balls over 
the pyramid of pegs and seeing how the normal curve appears as a 
histogram underneath.
The reference to the negative powers of two comes from this demonstration
(1/2 probability of left or right for each ball at every peg)
and the fact that the sum of 2**(-i) for i=1 to infinity is 1.

Admittedly naive....

- tom]
1180.9. "A confirmation and a refutation" by COOKIE::PBERGH (Peter Bergh, DTN 523-3007) Wed Jan 17 1990 13:20 (19 lines)
    Re .6: the requirement for a finite variance and a finite expected
    	value is correct.
    
                      <<< Note 1180.8 by REGENT::POWERS >>>

>> My reference to the binomial theorem was in regard to the physical 
>> demonstration of the normal distribution by dumping the balls over 
>> the pyramid of pegs and seeing how the normal curve appears as a 
>> histogram underneath.
    
    According to a book that I read some twenty years ago ("Theory of
    probability" by Gnedenko), the fact that the binomial distribution
    converges to the normal distribution as the number of trials grows is
    due to De Moivre and Laplace, so that demonstration is probably a very
    early example of the occurrence (admittedly, only in the limit) of the
    normal distribution in nature.
    
    Thus, this is not naive; it is an excellent example of the use of the
    theorem in .5.
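
    A sketch in Python of that peg-board (Galton board) demonstration: each
    ball takes a fixed number of left/right steps with probability 1/2 each,
    and the histogram of final bins is binomial, already close to the bell
    curve.  The number of rows and balls below are arbitrary choices:

        import random

        def galton_board(rows=16, balls=20_000):
            """Count how many balls land in each bin after `rows` coin flips."""
            bins = [0] * (rows + 1)
            for _ in range(balls):
                position = sum(random.random() < 0.5 for _ in range(rows))
                bins[position] += 1
            return bins

        if __name__ == "__main__":
            random.seed(7)
            bins = galton_board()
            peak = max(bins)
            for count in bins:               # crude text histogram of the bell
                print("#" * int(round(40 * count / peak)))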
1180.10. "counter-example and two incomplete derivations" by PULSAR::WALLY (Wally Neilsen-Steinhardt) Thu Jan 18 1990 16:06 (57 lines)
    re:          <<< Note 1180.7 by ALLVAX::ROTH "It's a bush recording..." >>>

>    Yes, that's clearly correct on intuitive grounds; there's no way
>    a PDF that's a set of impulses will converge to a proper Gaussian.
    
    Consider a PDF which is zero everywhere but x=-1 and x=1, and whose
    values there are such that the integral over the whole line is 1.
    Obviously we need integrals something like Stieltjes (and I cannot even
    remember how to spell it!)  This has zero mean and finite variance,
    and the sum of N of these random variables looks like a (shifted and
    scaled) binomial distribution, which converges to the normal
    distribution in the very qualified sense mentioned earlier.
    
    To fail to converge to a normal distribution, the starting distribution
    has to lack a finite mean or variance, as previously stated.

    I have seen two other derivations for the form of the bell shaped
    curve.  I could not reproduce either when I tried, but maybe if I put
    down what I remember, somebody else will fill in the gaps.
    
    A: Start with any well-known PDF, like the binomial distribution.  Let
    the parameters in the distribution become very large.  Take logs of 
    both sides, and apply Stirling's Approximation
    
    	log n! = n log n - n,  approximately
    
    to all the factorials.  After a bit of algebra (which is what I have
    forgotten) you end up with something like
    
    	log P = something - (r - n/2)**2 / something
    
    Raise e to the power of both sides and you get the Gaussian.  The first
    term just becomes the normalization constant.  Obviously, this only 
    proves that a particular PDF converges to the Gaussian, but it is still 
    interesting.
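
    For the symmetric binomial the "somethings" should work out to a
    normalization constant and n/2, i.e. roughly log P = const - (r - n/2)**2/(n/2).
    That reconstruction is mine, not Wally's, but it is easy to check numerically;
    a sketch in Python (math.comb needs Python 3.8 or later):

        import math

        def binomial_pmf(n, r):
            """Exact P(r heads in n fair coin tosses)."""
            return math.comb(n, r) / 2 ** n

        def gaussian_approx(n, r):
            """De Moivre-Laplace limit: mean n/2, variance n/4."""
            return math.sqrt(2.0 / (math.pi * n)) * math.exp(-2.0 * (r - n / 2) ** 2 / n)

        if __name__ == "__main__":
            n = 100
            for r in (50, 55, 60, 70):
                print(r, binomial_pmf(n, r), gaussian_approx(n, r))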
    
    
    B: Start with the fact that if you take n samples from any PDF, the
    mean of the sampling distribution is the mean of the PDF, and the
    variance of the sampling distribution is the variance of the PDF
    divided by n.  Consider the logarithm of the sampling distribution, and
    expand it around its mean:
    
    	log P(x-xm) = A0 + A1*(x-xm) + A2*(x-xm)^2 + ...
    
    One part I forgot is how you show that xm is also a maximum and
    therefore A1=0.  Another part is how A2 remains constant while you
    increase n so that the variance decreases to zero, so for all the x
    where P is significantly far from zero, higher terms may be ignored.
    The result is the limit
    
    	log P(x-xm) = A0 + A2 * (x-xm)^2
    
    and you raise e to both sides as above.
    
    Neither of these is a proof of the central limit theorem, but they may
    give you a better feeling for where the Gaussian came from.
1180.11. by EVMS::HALLYB (Fish have no concept of fire) Wed Jul 17 1996 16:04 (13 lines)
    Here's a problem I've come across that seems intuitively obvious
    but no proof comes to mind, other than "visualize it and it's obvious".
    
    Suppose N(I) is the area under the standard normal curve over the
    interval I.
    
    Let I  be an interval of length dx containing 0 as an interior point.
    Let I' be an interval of length dx not containing 0, interior or end.
    
    Claim N(I) > N(I') is obviously true. Is there any rigorous way to
    prove this?
    
      John
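
    No proof attempted here, but the claim is easy to check numerically;
    a sketch in Python, where the interval length and placements are
    arbitrary examples (the interval straddling 0 gets the most mass):

        import math

        def normal_mass(a, b):
            """Area under the standard normal curve between a and b."""
            phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
            return phi(b) - phi(a)

        if __name__ == "__main__":
            dx = 0.5
            # One interval containing 0 in its interior, several that do not.
            print("contains 0:", normal_mass(-0.2, -0.2 + dx))
            for start in (0.1, 0.5, 1.0, 2.0):
                print("starts at", start, ":", normal_mass(start, start + dx))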
1180.12. by AUSS::GARSON (DECcharity Program Office) Wed Jul 17 1996 22:56 (46 lines)