Naive Bayesian classification
A naive Bayesian classifier is a simple probabilistic classification method. A more descriptive term for the underlying probability model would be "independent feature model". The term "naive Bayes" reflects the fact that the probability model can be derived using Bayes' theorem (named after Thomas Bayes) and relies on strong independence assumptions that rarely hold in the real world; the model is therefore (deliberately) naive. Depending on the precise nature of the probability model, naive Bayes classifiers can be trained very efficiently in a supervised learning setting. In practical applications, parameter estimation for naive Bayes models uses the method of maximum likelihood; in other words, one can work with a naive Bayes model without subscribing to Bayesian probability or using any Bayesian methods.
The naive Bayes probability model
Abstractly, the probability model for a classifier is a conditional model

p(C \vert F_1, \dots, F_n)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F_1 through F_n. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.
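To make the infeasibility concrete, here is a minimal sketch contrasting the size of a full conditional probability table with the parameter count of the factored naive Bayes model; the function names and the choice of n = 30 binary features are illustrative, not from the article.

```python
# Size of a full conditional probability table versus the naive Bayes
# factorization, for a class variable with k values and n binary features.
def full_table_entries(k: int, n: int) -> int:
    # One probability per (class, feature-vector) combination.
    return k * 2 ** n

def naive_bayes_parameters(k: int, n: int) -> int:
    # (k - 1) prior parameters plus one Bernoulli parameter (r = 1)
    # per feature per class: (k - 1) + n * r * k.
    return (k - 1) + n * k

print(full_table_entries(2, 30))      # 2147483648 table entries: infeasible
print(naive_bayes_parameters(2, 30))  # 61 parameters: trivial to estimate
```

Even for a modest 30 binary features, the full table needs over two billion entries, while the factored model needs only 2n + 1 = 61 parameters.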
Using Bayes' theorem, we write

p(C \vert F_1, \dots, F_n) = \frac{p(C)\, p(F_1, \dots, F_n \vert C)}{p(F_1, \dots, F_n)}
In practice we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features F_i are given, so the denominator is effectively constant. The numerator is equivalent to the joint probability model

p(C, F_1, \dots, F_n)
which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F_1, \dots, F_n) = p(C)\, p(F_1 \vert C)\, p(F_2 \vert C, F_1)\, p(F_3 \vert C, F_1, F_2) \cdots
and so forth. Now the "naive" conditional independence assumption comes into play: assume that each feature F_i is conditionally independent of every other feature F_j for j \neq i, given the class. This means that

p(F_i \vert C, F_j) = p(F_i \vert C)
and so the joint model can be expressed as

p(C, F_1, \dots, F_n) = p(C)\, p(F_1 \vert C)\, p(F_2 \vert C) \cdots = p(C) \prod_{i=1}^{n} p(F_i \vert C)
This means that under the above independence assumptions, the conditional distribution over the class variable C can be expressed as:

p(C \vert F_1, \dots, F_n) = \frac{1}{Z}\, p(C) \prod_{i=1}^{n} p(F_i \vert C)
where Z is a scaling factor dependent only on F_1, \dots, F_n, i.e. a constant if the values of the feature variables are known.
Models of this form are much more manageable, since they factor into a so-called class prior p(C) and independent probability distributions p(F_i \vert C). If there are k classes and if each model p(F_i) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k − 1) + n r k parameters. In practice, k = 2 (binary classification) and r = 1 (Bernoulli variables as features) are common, and the total number of parameters of the naive Bayes model is then 2n + 1, where n is the number of binary features used for prediction.
Parameter estimation
In a supervised learning setting, one wants to estimate the parameters of the distribution model. Because of the independent feature assumption, it suffices to estimate the class prior and the conditional feature models independently, using the method of maximum likelihood, Bayesian inference, or other parameter estimation procedures.
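Maximum-likelihood estimation here reduces to counting relative frequencies. The following is a minimal sketch for binary features; the function name, the tiny hand-made dataset, and the "spam"/"ham" labels are illustrative assumptions, and no smoothing is applied.

```python
from collections import Counter, defaultdict

def fit_naive_bayes(samples):
    """Maximum-likelihood estimates for a naive Bayes model with binary
    features.  `samples` is a list of (feature_tuple, label) pairs."""
    class_counts = Counter(label for _, label in samples)
    total = len(samples)
    # Class prior p(C): relative frequency of each class.
    priors = {c: class_counts[c] / total for c in class_counts}
    # Conditional feature probabilities p(F_i = 1 | C): relative
    # frequency of the feature firing within each class.
    feature_counts = defaultdict(Counter)
    for features, label in samples:
        for i, value in enumerate(features):
            feature_counts[label][i] += value
    n = len(samples[0][0])
    conditionals = {
        c: {i: feature_counts[c][i] / class_counts[c] for i in range(n)}
        for c in class_counts
    }
    return priors, conditionals

# Tiny illustration: two binary features, two classes.
data = [((1, 0), "spam"), ((1, 1), "spam"), ((0, 0), "ham"), ((0, 1), "ham")]
priors, conditionals = fit_naive_bayes(data)
print(priors["spam"])           # 0.5
print(conditionals["spam"][0])  # 1.0: feature 0 fired in every spam sample
```

In real applications one would add Laplace smoothing so that unseen feature values do not produce zero probabilities, but that refinement is beyond this sketch.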
Constructing a classifier from the probability model
The discussion so far has derived the independent feature model, that is, the naive Bayes probability model. The naive Bayes classifier combines this model with a decision rule. One common rule is to pick the most probable hypothesis; this is known as the maximum a posteriori or MAP decision rule. The corresponding classifier is the function defined as follows:

\mathrm{classify}(f_1, \dots, f_n) = \operatorname{argmax}_c\; p(C = c) \prod_{i=1}^{n} p(F_i = f_i \vert C = c)
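The MAP rule can be sketched directly as a loop over classes; the hand-set priors and conditional probabilities below are hypothetical numbers chosen only to make the example self-contained.

```python
def classify(features, priors, conditionals):
    """MAP decision rule: return argmax_c p(C=c) * prod_i p(F_i=f_i | C=c).
    `conditionals[c][i]` holds p(F_i = 1 | C = c) for binary features."""
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for i, value in enumerate(features):
            p = conditionals[c][i]
            score *= p if value else 1.0 - p
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hand-set parameters for illustration: two classes, two binary features.
priors = {"spam": 0.5, "ham": 0.5}
conditionals = {"spam": {0: 0.9, 1: 0.4}, "ham": {0: 0.1, 1: 0.6}}

print(classify((1, 0), priors, conditionals))  # spam: 0.5*0.9*0.6 beats 0.5*0.1*0.4
```

Note that the scaling factor Z is never computed: since it is the same for every class, it cannot change which class attains the maximum.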
Discussion
The naive Bayes classifier has several properties that make it surprisingly useful in practice, despite the fact that its far-reaching independence assumptions are often violated. Like all probabilistic classifiers under the MAP decision rule, it arrives at the correct classification as long as the correct class is more probable than any other class; class probabilities do not have to be estimated very well. In other words, the overall classifier is robust to serious deficiencies of its underlying naive probability model. Other reasons for the observed success of the naive Bayes classifier are discussed in the literature cited below.
In real life, the naive Bayes approach is more powerful than might be expected from the extreme simplicity of its model; in particular, it is fairly robust in the presence of non-independent attributes. Recent theoretical analysis has shown why the naive Bayes classifier is so robust.
Example: document classification
Here is a worked example of naive Bayesian classification applied to the document classification problem. Consider the problem of classifying documents by their content, for example into spam and non-spam e-mails. Imagine that documents are drawn from a number of classes of documents which can be modelled as sets of words, where the (independent) probability that the i-th word of a given document occurs in a document from class C can be written as

p(w_i \vert C)
(For this treatment, we simplify things further by assuming that the probability of a word in a document is independent of the length of the document, or that all documents are of the same length.)
Then the probability of a given document D, given a class C, is

p(D \vert C) = \prod_i p(w_i \vert C)
The question that we desire to answer is: "what is the probability that a given document D belongs to a given class C?"
Now, by definition (see Probability axiom),

p(D \vert C) = \frac{p(D \cap C)}{p(C)}
and

p(C \vert D) = \frac{p(D \cap C)}{p(D)}
Bayes' theorem manipulates these into a statement of probability in terms of likelihood:

p(C \vert D) = \frac{p(C)}{p(D)}\, p(D \vert C)
Assume for the moment that there are only two classes, S and ¬S:

p(D \vert S) = \prod_i p(w_i \vert S)
and

p(D \vert \neg S) = \prod_i p(w_i \vert \neg S)
Using the Bayesian result above, we can write:

p(S \vert D) = \frac{p(S)}{p(D)} \prod_i p(w_i \vert S)

p(\neg S \vert D) = \frac{p(\neg S)}{p(D)} \prod_i p(w_i \vert \neg S)
Dividing one by the other gives:

\frac{p(S \vert D)}{p(\neg S \vert D)} = \frac{p(S) \prod_i p(w_i \vert S)}{p(\neg S) \prod_i p(w_i \vert \neg S)}
which can be re-factored as:

\frac{p(S \vert D)}{p(\neg S \vert D)} = \frac{p(S)}{p(\neg S)} \prod_i \frac{p(w_i \vert S)}{p(w_i \vert \neg S)}
Thus, the probability ratio p(S \vert D) / p(\neg S \vert D) can be expressed in terms of a series of likelihood ratios. The actual probability p(S \vert D) can be easily computed from \log\bigl(p(S \vert D) / p(\neg S \vert D)\bigr) based on the observation that p(S \vert D) + p(\neg S \vert D) = 1.
Taking the logarithm of all these ratios, we have:

\ln \frac{p(S \vert D)}{p(\neg S \vert D)} = \ln \frac{p(S)}{p(\neg S)} + \sum_i \ln \frac{p(w_i \vert S)}{p(w_i \vert \neg S)}
This technique of "log-likelihood ratios" is a common technique in statistics. In the case of two mutually exclusive alternatives (such as this example), the conversion of a log-likelihood ratio to a probability takes the form of a sigmoid curve: see logit for details.
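The log-likelihood-ratio computation and its sigmoid conversion back to a probability can be sketched as follows; the per-word likelihoods in `word_probs` are hypothetical numbers, not estimates from any real corpus.

```python
import math

def spam_log_ratio(words, word_probs, prior_spam=0.5):
    """ln(p(S|D) / p(not S|D)) = ln(p(S)/p(not S)) + sum_i ln(p(w_i|S)/p(w_i|not S)).
    `word_probs` maps each word to the pair (p(w|S), p(w|not S))."""
    ratio = math.log(prior_spam / (1.0 - prior_spam))
    for w in words:
        p_s, p_not_s = word_probs[w]
        ratio += math.log(p_s / p_not_s)
    return ratio

def spam_probability(log_ratio):
    # Since p(S|D) + p(not S|D) = 1, solving for p(S|D) gives the
    # sigmoid curve: p(S|D) = 1 / (1 + e^(-log_ratio)).
    return 1.0 / (1.0 + math.exp(-log_ratio))

# Hypothetical per-word likelihoods (p(w|S), p(w|not S)).
word_probs = {"viagra": (0.8, 0.01), "meeting": (0.05, 0.3)}

r = spam_log_ratio(["viagra"], word_probs)
print(r > 0)                          # True: log-ratio above zero means spam
print(round(spam_probability(r), 3))  # 0.988
```

A document is classified as spam exactly when the log ratio exceeds zero, i.e. when p(S \vert D) > p(\neg S \vert D); the sum-of-logs form also avoids the numerical underflow that multiplying many small word probabilities would cause.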
See also
- Bayesian inference (esp. as Bayesian techniques relate to spam)
- boosting
- fuzzy logic
- logistic regression
- neural networks
- Perceptron
- Support vector machine
Other sources
- Pedro Domingos and Michael Pazzani. "On the optimality of the simple Bayesian classifier under zero-one loss". Machine Learning, 29:103-130, 1997. (also online at CiteSeer, archived 2005-11-25 at the Wayback Machine: [1], archived 2003-08-14 at the Wayback Machine)
- Irina Rish. "An empirical study of the naive Bayes classifier". IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence. (available online: PDF, archived 2004-06-13 at the Wayback Machine; PostScript)
External links
- Naive Bayesian learning, archived 2003-12-22 at the Wayback Machine