Normal distribution

[[Gambar:Gaussian-pdf.png|thumb|300px|[[Probability density function]] of the Gaussian distribution (bell curve).]]
 
The '''normal distribution''' is an important [[probability distribution]] in many fields.
It is also commonly called the '''Gaussian distribution''', especially in [[fisika|physics]] and [[rékayasa|engineering]].
Strictly speaking, it is a family of distributions of the same general form, differing only in their ''location'' and ''scale'' parameters: the [[nilai ekspektasi|mean]] and the [[simpangan baku|standard deviation]]. The '''standard normal distribution''' is the normal distribution with a mean of zero and a standard deviation of one. Because the graph of its [[fungsi dénsitas probabilitas|probability density]] resembles a [[bell]], it is often called the '''bell curve'''.
 
== History ==
 
The normal distribution was first introduced by [[Abraham de Moivre|de Moivre]] in an article in [[1733]] (reprinted in the second edition of ''[[The Doctrine of Chances]]'', [[1738]]) as an approximation to the [[sebaran binomial|binomial distribution]] for large ''n''. De Moivre's result was extended by [[Pierre Simon de Laplace|Laplace]] in his book ''[[Analytical Theory of Probabilities]]'' ([[1812]]), and is now known as the [[theorem of de Moivre-Laplace]].
 
Laplace used the normal distribution in the [[analysis of errors]] of experiments. The important [[method of least squares]] was introduced by [[Adrien Marie Legendre|Legendre]] in [[1805]]. [[Carl Friedrich Gauss|Gauss]], who claimed to have used the method since [[1794]], justified it rigorously in [[1809]] by assuming a normal distribution of the errors.
 
Istilah "bell curve" ngacu ka [[Jouffret]] nu ngagunakeun watesan "bell surface" dina taun [[1872]] keur [[multivariate normal distribution|bivariate normal]] dina komponen bebas (independent). Istilah "sebaran normal" "ditemukan" sacara sewang-sewangan ku [[Charles S. Peirce]], [[Francis Galton]] jeung [[Wilhelm Lexis]] kira-kira taun [[1875]] [Stigler]. This terminology is unfortunate, since it reflects and encourages the fallacy that "everything is Gaussian". (See the discussion of "occurrence" below).
== Specifying the normal distribution ==
 
There are several ways to specify a [[random variable]]. The most visual is the probability density function (plotted at the top), which represents how likely each value of the random variable is. The cumulative distribution function is a conceptually cleaner way to specify the same information, but to the untrained eye its plot is much less informative (see below). Equivalent ways to specify the normal distribution are: the moments, the [[cumulant]]s, the [[characteristic function]], the [[moment-generating function]], and the cumulant-[[generating function]]. Some of these are very useful for theoretical work, but not intuitive. See [[probability distribution]] for a discussion.
 
All of the [[cumulant]]s of the normal distribution are zero, except the first two.
=== Probability density function ===
 
The [[fungsi dénsitas probabilitas|probability density function]] of the '''normal distribution''' with mean μ and standard deviation σ (equivalently, [[varian|variance]] σ<sup>2</sup>) is an example of a '''[[Gaussian function]]''',
:<math>f(x) = {1 \over \sigma\sqrt{2\pi} }\,e^{-{(x-\mu )^2 / 2\sigma^2}}</math>
(See also the [[exponential function|exponential function]] and [[pi]].) If a [[variabel acak|random variable]] ''X'' has this distribution, we write ''X'' ~ N(μ, σ<sup>2</sup>). If μ = 0 and σ = 1, the distribution is called the standard normal distribution, with density
 
:<math>f(x) = {1 \over \sqrt{2\pi} }\,e^{-{x^2 / 2}}</math>
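A minimal sketch of these two densities in Python (the function name and the example values are illustrative, not taken from any particular library):

<syntaxhighlight lang="python">
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x, following the formula above."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# The standard normal density is the special case mu = 0, sigma = 1.
print(normal_pdf(0.0))                      # ~0.3989, the peak of the bell curve
print(normal_pdf(1.0, mu=1.0, sigma=2.0))   # ~0.1995, the peak of N(1, 4)
</syntaxhighlight>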
 
For all normal distributions,
the density function is symmetric about its mean value. About 68% of the area under the curve is within one standard deviation of the mean, 95.5% within two standard deviations, and 99.7% within three standard deviations. The [[inflection point]]s of the curve occur at one standard deviation away from the mean.
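These percentages can be checked against the closed-form cdf given in the next subsection; a short sketch, assuming only the Python standard library:

<syntaxhighlight lang="python">
import math

def prob_within(k):
    """P(|X - mu| < k*sigma) for any normal variable, via the error function."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))   # 1 0.6827, 2 0.9545, 3 0.9973
</syntaxhighlight>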
 
=== Cumulative distribution function ===
 
The [[Fungsi sebaran kumulatif|cumulative distribution function]] (hereafter the ''cdf'') gives the probability that the variable ''X'' takes a value no greater than ''x''; in terms of the density function it is
 
:<math>\Pr(X \le x) = \int_{-\infty}^x \frac{1}{\sigma\sqrt{2\pi}} e^{-(u-\mu)^2/(2\sigma^2)}\,du</math>
 
The standard normal cdf, conventionally denoted <math>\Phi</math>, is simply the general cdf evaluated with <math>\mu=0</math> and <math>\sigma=1</math>,
 
:<math>\Phi(z) = \int_{-\infty}^z {1 \over \sqrt{2\pi} }\,e^{-{x^2 / 2}}\,dx</math>
In terms of the [[error function]], this can be written as
:<math>\Phi(z) = \frac{1}{2} \left(1 + \operatorname{erf}\,\frac{z}{\sqrt{2}}\right)</math>
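The erf identity above translates directly into code. A sketch, assuming the Python standard library, with a crude numerical integration of the density as a cross-check:

<syntaxhighlight lang="python">
import math

def phi(z):
    """Standard normal cdf via the error function identity above."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi_numeric(z, lower=-10.0, steps=100000):
    """Midpoint-rule integration of the standard normal density up to z."""
    width = (z - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * width
        total += math.exp(-x * x / 2.0)
    return total * width / math.sqrt(2.0 * math.pi)

print(phi(1.96), phi_numeric(1.96))   # both ~0.9750
</syntaxhighlight>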
 
The following graph shows the cumulative distribution function for values of ''z'' from -4 to +4:
 
[[Gambar:Cumulative_normal_distribution.png]]
=== Characteristic function ===

The [[characteristic function|characteristic function]] is defined as the [[nilai ekspektasi|expected value]] of
<math>e^{itX}</math>.
For the normal distribution, it can be shown that the characteristic function is
 
:<math>\phi_X(t)=E\left[e^{itX}\right]=\int_{-\infty}^{\infty} \frac{1} {\sigma\sqrt{2\pi}}\,e^{-{(x-\mu )^2 / 2\sigma^2}}\,e^{itx}\,dx = e^{i\mu t-\sigma^2 t^2/2}</math>
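This identity can be checked numerically. A sketch, assuming the Python standard library, comparing a Monte Carlo estimate of E[e<sup>itX</sup>] with the closed form (parameter values are illustrative):

<syntaxhighlight lang="python">
import cmath
import random

def char_fn_closed(t, mu, sigma):
    """Closed-form characteristic function exp(i*mu*t - sigma^2*t^2/2)."""
    return cmath.exp(1j * mu * t - 0.5 * (sigma * t) ** 2)

def char_fn_monte_carlo(t, mu, sigma, n=200000, seed=0):
    """Average of exp(i*t*X) over n samples X ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    total = 0j
    for _ in range(n):
        total += cmath.exp(1j * t * rng.gauss(mu, sigma))
    return total / n

print(char_fn_closed(0.7, mu=1.0, sigma=2.0))
print(char_fn_monte_carlo(0.7, mu=1.0, sigma=2.0))   # close to the line above
</syntaxhighlight>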
== Properties ==
 
# If ''X'' ~ N(μ, σ<sup>2</sup>) and ''a'' and ''b'' are [[real number]]s, then ''aX'' + ''b'' ~ N(''a''μ + ''b'', (''a''σ)<sup>2</sup>).
# If ''X''<sub>1</sub> ~ N(μ<sub>1</sub>, σ<sub>1</sub><sup>2</sup>) and ''X''<sub>2</sub> ~ N(μ<sub>2</sub>, σ<sub>2</sub><sup>2</sup>), and ''X''<sub>1</sub> and ''X''<sub>2</sub> are ''independent'', then ''X''<sub>1</sub> + ''X''<sub>2</sub> ~ N(μ<sub>1</sub> + μ<sub>2</sub>, σ<sub>1</sub><sup>2</sup> + σ<sub>2</sub><sup>2</sup>).
# If ''X''<sub>1</sub>, ..., ''X''<sub>''n''</sub> are [[Statistical independence|independent]] standard normal variables, then ''X''<sub>1</sub><sup>2</sup> + ... + ''X''<sub>''n''</sub><sup>2</sup> has a [[sebaran chi-kuadrat|chi-square distribution]] with ''n'' degrees of freedom. (A simulation sketch of the first two properties follows this list.)
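A minimal simulation sketch of Properties 1 and 2, assuming only the Python standard library (the sample size and parameters are illustrative):

<syntaxhighlight lang="python">
import random
import statistics

rng = random.Random(42)
n = 100000

# Property 1: if X ~ N(2, 3^2), then 4*X + 1 should be ~ N(9, 12^2).
y = [4.0 * rng.gauss(2.0, 3.0) + 1.0 for _ in range(n)]
print(statistics.mean(y), statistics.stdev(y))   # roughly 9 and 12

# Property 2: the sum of independent N(1, 2^2) and N(3, 4^2) is ~ N(4, 20).
s = [rng.gauss(1.0, 2.0) + rng.gauss(3.0, 4.0) for _ in range(n)]
print(statistics.mean(s), statistics.stdev(s))   # roughly 4 and sqrt(20), about 4.47
</syntaxhighlight>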
As a consequence of Property 1, it is possible to relate all normal random variables to the standard normal.
 
If ''X'' is a normal random variable with mean μ and variance σ<sup>2</sup>, then
 
:<math> Z = \frac{X - \mu}{\sigma} </math>
is a standard normal random variable: ''Z'' ~ N(0, 1). In particular,
:<math>\Pr(X<x) = \Phi\left(\frac{x-\mu}{\sigma}\right) = \frac{1}{2} \left(1+\mbox{erf}\,\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right)</math>
 
Conversely, if ''Z'' is a standard normal random variable, then
 
:<math>X=\sigma Z+\mu \,</math>
 
is a normal random variable with mean μ and variance σ<sup>2</sup>.
 
The standard normal distribution has been tabulated, and the other normal distributions are simple transformations of the standard one.
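A sketch of this standardisation, assuming the Python standard library (phi is the standard normal cdf from the previous section; the example values are illustrative):

<syntaxhighlight lang="python">
import math

def phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """P(X < x) for X ~ N(mu, sigma^2), via the standardised variable Z."""
    return phi((x - mu) / sigma)

# Example: X ~ N(5, 2^2); probability of a value below 8.
print(normal_cdf(8.0, mu=5.0, sigma=2.0))   # ~0.9332
</syntaxhighlight>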
== Generating normal random variables ==

For computer simulations, it is often necessary to generate values that follow a normal distribution; a common method is the [[Box-Muller transform]].
This requires generating values from a uniform distribution, for which many methods are known. See also [[random number generator]]s.
 
The Box-Muller transform is a consequence of Property 3 and the fact that the chi-square distribution with two degrees of freedom is an exponential random variable (which is easy to generate).
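A sketch of the Box-Muller transform itself, assuming only the uniform generator in the Python standard library:

<syntaxhighlight lang="python">
import math
import random

def box_muller(rng=random):
    """Turn two independent Uniform(0, 1) draws into two independent
    standard normal draws, via the Box-Muller transform."""
    u1 = 1.0 - rng.random()               # in (0, 1], avoids log(0)
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))    # -2*ln(u1) is exponential, i.e. chi-square with 2 d.o.f.
    theta = 2.0 * math.pi * u2            # uniform angle
    return r * math.cos(theta), r * math.sin(theta)

z1, z2 = box_muller()
print(z1, z2)   # two independent N(0, 1) samples
</syntaxhighlight>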
 
=== The central limit theorem ===
Under fairly general conditions, the sum of a large number of [[Statistical independence|independent]] random variables is approximately normally distributed.
This is the so-called [[central limit theorem]].
 
The practical importance of the central limit theorem is that the normal distribution can be used as an approximation to some other distributions.
 
* A [[sebaran binomial|binomial distribution]] with parameters ''n'' and ''p'' is approximately normal for large ''n'' and ''p'' not too close to 1 or 0. The approximating normal distribution has mean μ = ''np'' and standard deviation σ = (''n p'' (1 - ''p''))<sup>1/2</sup> (a comparison sketch follows below).
* A [[Poisson distribution]] with parameter λ is approximately normal for large λ. The approximating normal distribution has mean μ = λ and standard deviation σ = √λ.
 
Whether these approximations are sufficiently accurate depends on the purpose for which they are needed, and the rate of convergence to the normal distribution.
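A sketch comparing the exact binomial probability with this normal approximation, assuming the Python standard library; the continuity correction of +0.5 is a standard refinement not discussed above, and the parameter values are illustrative:

<syntaxhighlight lang="python">
import math

def phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation with mean n*p and sd sqrt(n*p*(1-p))."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    return phi((k + 0.5 - mu) / sigma)

n, p, k = 100, 0.3, 35
print(binom_cdf(k, n, p), normal_approx_cdf(k, n, p))   # both about 0.88
</syntaxhighlight>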
== Occurrence ==
 
''Approximately'' normal distributions occur in many situations, as a result of the [[central limit theorem]].
When there is reason to suspect the presence of a large number of small effects ''acting additively'', it is reasonable to assume that observations will be normal.
There are statistical methods to empirically test that assumption.
 
Effects can also act as '''multiplicative''' (rather than additive) modifications. In that case, the assumption of normality is not justified, and it is the [[logarithm]] of the variable of interest that is normally distributed. The distribution of the directly observed variable is then called [[log-normal distribution|log-normal]].
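A small simulation sketch of this distinction, assuming the Python standard library: the product of many small positive effects is right-skewed, while its logarithm looks roughly normal (the effect sizes are illustrative):

<syntaxhighlight lang="python">
import math
import random
import statistics

rng = random.Random(1)

def product_of_effects(n_effects=200):
    """Product of many small positive multiplicative effects."""
    value = 1.0
    for _ in range(n_effects):
        value *= 1.0 + rng.uniform(-0.05, 0.05)
    return value

samples = [product_of_effects() for _ in range(20000)]
logs = [math.log(s) for s in samples]

# The raw variable is right-skewed (mean > median); its logarithm is roughly symmetric.
print(statistics.mean(samples), statistics.median(samples))
print(statistics.mean(logs), statistics.median(logs))
</syntaxhighlight>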
 
Finally, if there is a single external influence which has a large effect on the variable under consideration, the assumption of normality is not justified either. This is true even if, when the external variable is held constant, the resulting distributions are indeed normal. The full distribution will be a superposition of normal variables, which is not in general normal. This is related to the theory of errors (see below).
 
To summarize, here's a list of situations where approximate normality
is sometimes assumed. For a fuller discussion, see below.
* In counting problems (so the central limit theorem includes a discrete-to-continuum approximation) where [[reproductive family|reproductive random variables]] are involved, such as
** Binomial random variables, associated to yes/no questions;
** Poisson random variables, associated with [[rare events]];
* In physiological measurements of biological specimens:
** The ''logarithm'' of measures of size of living tissue (length, height, skin area, weight);
** The ''length'' of ''inert'' appendages (hair, claws, nails, teeth) of biological specimens, ''in the direction of growth''; presumably the thickness of tree bark also falls under this category;
** Other physiological measures may be normally distributed, but there is no reason to expect that ''a priori'';
* Measurement errors are ''assumed'' to be normally distributed, and any deviation from normality must be explained;
* Financial variables
** The ''logarithm'' of interest rates, exchange rates, and inflation; these variables behave like compound interest, not like simple interest, and so are multiplicative;
** Stock-market indices are supposed to be multiplicative too, but some researchers claim that they are [[log-Lévy]] variables instead of [[log-normal distribution|lognormal]];
** Other financial variables may be normally distributed, but there is no reason to expect that ''a priori'';
* Light intensity
** The intensity of laser light is normally distributed;
** Thermal light has a [[Bose-Einstein statistics|Bose-Einstein]] distribution on very short time scales, and a normal distribution on longer timescales due to the central limit theorem.
 
Of relevance to biology and economics is the fact that complex systems tend to display [[power law]]s rather than normality.
=== Photon counts ===
 
Light intensity from a single source varies with time, and is usually assumed to be normally distributed. However, quantum mechanics interprets measurements of light intensity as [[photon]] counting. Ordinary light sources, which produce light by thermal emission, should follow a [[Poisson distribution]] or [[Bose-Einstein distribution]] on very short time scales. On longer time scales (longer than the [[coherence time]]), the addition of independent variables yields an approximately normal distribution. The intensity of laser light, which is a quantum phenomenon, has an exactly normal distribution.
 
=== Measurement errors ===
 
Repeated measurements of the same quantity are expected to yield results which are clustered around a particular value. If all major sources of errors have been taken into account, it is ''assumed'' that the remaining error must be the result of a large number of very small ''additive'' effects, and hence normal. Deviations from normality are interpreted as indications of systematic errors which have not been taken into account. Note that this is the ''central '''assumption''''' of the mathematical [[theory of errors]].
 
=== Physical characteristics of biological specimens ===
 
The overwhelming biological evidence is that bulk growth processes of living tissue proceed by multiplicative, not additive, increments, and that therefore measures of body size should at most follow a lognormal rather than normal distribution. Despite common claims of normality, the sizes of plants and animals are approximately lognormal. The evidence and an explanation based on models of growth were first published in the classic book
 
:Huxley, Julian: ''Problems of Relative Growth'' (1932)
 
Differences in size due to sexual dimorphism, or other polymorphisms like the worker/soldier/queen division in social insects, further make the joint distribution of sizes deviate from lognormality.
 
The assumption that linear size of biological specimens is normal leads to a non-normal distribution of weight (since weight or volume is roughly the 3rd power of length, and Gaussian distributions are only preserved by linear transformations), and conversely assuming that weight is normal leads to non-normal lengths. This is a problem, because there is no ''a priori'' reason why one of length or body mass, and not the other, should be normally distributed. Lognormal distributions, on the other hand, are preserved by powers, so the "problem" goes away if lognormality is assumed.
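A simulation sketch of this point, assuming the Python standard library: cubing a lognormal "length" gives another lognormal "weight", whereas cubing a normal length gives a skewed one (all parameter values are illustrative):

<syntaxhighlight lang="python">
import math
import random
import statistics

rng = random.Random(7)
n = 50000

# Lognormal lengths: log(length) ~ N(0, 0.2^2), so log(length^3) = 3*log(length) ~ N(0, 0.6^2).
log_weights = [3.0 * rng.gauss(0.0, 0.2) for _ in range(n)]
print(statistics.stdev(log_weights))   # ~0.6: still normal on the log scale

# Normal lengths: cubing them produces a skewed, non-normal "weight".
weights = [rng.gauss(10.0, 2.0) ** 3 for _ in range(n)]
print(statistics.mean(weights), statistics.median(weights))   # mean > median
</syntaxhighlight>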
 
* Blood pressure of adult humans is supposed to be normally distributed, but only after separating males and females into different populations (each of which is normally distributed).
* The length of inert appendages such as hair, nails, teeth, claws and shells is expected to be normally distributed if measured in the direction of growth. This is because the growth of inert appendages depends on the size of the root, and not on the length of the appendage, and so proceeds by ''additive'' increments. Hence, we have an example of a sum of very many small lognormal increments approaching a normal distribution. Another plausible example is the width of tree trunks, where a new thin ring is produced every year whose width is affected by a large number of factors.
 
=== Financial variables ===
 
Because of the exponential nature of [[interest]] and [[inflation]], financial indicators such as [[interest rate]]s, [[share|stock]] values, or [[commodity]] [[price]]s make good examples of ''multiplicative'' behaviour. As such, they should not be expected to be normal, but lognormal.
 
[[Benoît Mandelbrot]], the popularizer of [[fractals]], has claimed that even the assumption of lognormality is flawed.
=== Lifetime ===
 
Other examples of variables that are ''not'' normally distributed include the lifetimes of humans or mechanical devices. Examples of distributions used in this connection are the [[sebaran eksponensial|exponential distribution]] (memoryless) and the [[Weibull distribution]]. In general, there is no reason that [[waiting times]] should be normal, since they are not directly related to any kind of additive influence.
 
=== Test scores ===
The IQ score of an individual, for example, can be seen as the result of many small additive influences: many genes and many environmental factors all play a role.
 
* [[IQ|IQ scores]] and other ability scores are approximately normally distributed. For most IQ tests, the mean is 100 and the standard deviation is 15.
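Taking the quoted figures at face value, a sketch of the kind of calculation this allows, assuming the Python standard library:

<syntaxhighlight lang="python">
import math

def phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Fraction of scores above 130 if IQ ~ N(100, 15^2).
print(1.0 - phi((130 - 100) / 15))   # ~0.0228, i.e. about 2.3%
</syntaxhighlight>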
 
''Criticisms: test scores are discrete variables associated with the number of correct/incorrect answers, and as such they are related to the binomial. Moreover (see [http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=b26c3b%241s3c%40odds.stat.purdue.edu this USENET post]), raw IQ test scores are customarily 'massaged' to force the distribution of IQ scores to be normal. Finally, there is no widely accepted model of intelligence, and the link to IQ scores, let alone a relationship between influences on intelligence and '''additive''' variations of IQ, is subject to debate.''
== References ==
* [http://ce597n.www.ecn.purdue.edu/CE597N/1997F/students/michael.a.kropinski.1/project/tutorial Michael A. Kropinski's normal distribution tutorial]
* S. M. Stigler: ''Statistics on the Table'', Harvard University Press 1999, chapter 22. History of the term "normal distribution".
* [http://web.archive.org/19990117033417/members.aol.com/jeff570/mathword.html Earliest Known uses of some of the Words of Mathematics]. See: [http://web.archive.org/19991003084940/members.aol.com/jeff570/n.html] for "normal", [http://web.archive.org/19990508225359/members.aol.com/jeff570/g.html] for "Gaussian", and [http://web.archive.org/19990508224238/members.aol.com/jeff570/e.html] for "error".
* [http://web.archive.org/20000610213020/members.aol.com/jeff570/stat.html Earliest Uses of Symbols in Probability and Statistics]. See Symbols associated with the Normal Distribution.