Exact inference in Bayesian networks and applications in forensic statistics

Ivar Simonsson

Doctoral thesis, 2018

Bayesian networks (BNs) are commonly used to describe and analyze relationships between interacting variables. Approximate methods for performing calculations on BNs are widely used and well developed. Methods for performing exact calculations on BNs also exist but are not always considered, partly because they impose strong restrictions on the structure of the BN. Part of this thesis focuses on developing methods for exact calculations in order to make them applicable to larger classes of BNs. More specifically, we study the variable elimination (VE) algorithm, which traditionally can only be applied to finite BNs, Gaussian BNs, and combinations of these two types. We argue that, when implementing the VE algorithm, it is important to properly define a set of factors that represents the conditional probability distributions of the BN in a suitable way. Furthermore, one should strive to define this factor set so that it is closed under the local operations performed by the algorithm: reduction, multiplication, and marginalization. For situations where this is not possible, we suggest a new version of the VE algorithm, which is recursive and makes use of numerical integration. We exemplify the use of this new version by implementing it on Γ-Gaussian BNs, i.e., Gaussian BNs in which the precision of Gaussian variables can be modeled with gamma-distributed variables.
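To make the three local operations concrete, the following is a minimal sketch of VE for the simplest case, a finite BN, where factors are plain tables and the factor set is trivially closed under reduction, multiplication, and marginalization. It is not the recursive algorithm developed in the thesis. The `Factor` class, the function `variable_elimination`, and the chain network A -> B -> C with its probabilities are all hypothetical and chosen only for illustration.

```python
from itertools import product


class Factor:
    """Table over finite variables: maps each joint assignment to a number."""

    def __init__(self, variables, domains, table):
        self.variables = tuple(variables)
        self.domains = {v: tuple(domains[v]) for v in self.variables}
        self.table = dict(table)  # assignment tuple -> value

    def _drop(self, i):
        keep = self.variables[:i] + self.variables[i + 1:]
        return keep, {v: self.domains[v] for v in keep}

    def reduce(self, var, value):
        """Condition on var = value and remove var from the scope."""
        if var not in self.variables:
            return self
        i = self.variables.index(var)
        keep, domains = self._drop(i)
        table = {k[:i] + k[i + 1:]: x for k, x in self.table.items() if k[i] == value}
        return Factor(keep, domains, table)

    def multiply(self, other):
        """Pointwise product over the union of the two scopes."""
        variables = self.variables + tuple(
            v for v in other.variables if v not in self.variables)
        domains = {**self.domains, **other.domains}
        table = {}
        for assignment in product(*(domains[v] for v in variables)):
            row = dict(zip(variables, assignment))
            a = self.table[tuple(row[v] for v in self.variables)]
            b = other.table[tuple(row[v] for v in other.variables)]
            table[assignment] = a * b
        return Factor(variables, domains, table)

    def marginalize(self, var):
        """Sum var out of the factor."""
        i = self.variables.index(var)
        keep, domains = self._drop(i)
        table = {}
        for k, x in self.table.items():
            key = k[:i] + k[i + 1:]
            table[key] = table.get(key, 0.0) + x
        return Factor(keep, domains, table)


def variable_elimination(factors, evidence, order):
    """Posterior over the variables that are neither observed nor eliminated."""
    for var, value in evidence.items():
        factors = [f.reduce(var, value) for f in factors]
    for var in order:
        involved = [f for f in factors if var in f.variables]
        if not involved:
            continue
        rest = [f for f in factors if var not in f.variables]
        prod = involved[0]
        for f in involved[1:]:
            prod = prod.multiply(f)
        factors = rest + [prod.marginalize(var)]
    result = factors[0]
    for f in factors[1:]:
        result = result.multiply(f)
    total = sum(result.table.values())
    normalized = {k: x / total for k, x in result.table.items()}
    return Factor(result.variables, result.domains, normalized)


# Hypothetical chain A -> B -> C with binary variables.
dom = {"A": (0, 1), "B": (0, 1), "C": (0, 1)}
p_a = Factor(["A"], dom, {(0,): 0.7, (1,): 0.3})
p_b_given_a = Factor(["A", "B"], dom,
                     {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})
p_c_given_b = Factor(["B", "C"], dom,
                     {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.25, (1, 1): 0.75})

# Query P(A | C = 1): reduce on the evidence, then eliminate B.
posterior = variable_elimination(
    [p_a, p_b_given_a, p_c_given_b], evidence={"C": 1}, order=["B"]).table
print(posterior)
```

Every intermediate result here is again a table, which is what closure of the factor set means in the finite case. When a distribution family does not stay closed under marginalization, no such finite representation of the intermediate factor is guaranteed to exist.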

Bayesian networks are widely used within forensic statistics, especially within familial relationship inference. In this field, one uses DNA data and knowledge about genetic inheritance to calculate probabilities of familial relationships. Doing so requires not only DNA from the people under investigation, but also database information about population allele frequencies. The possibility of mutations makes these calculations harder, and it is important to employ a reasonable mutation model to make the calculations precise. We argue that many existing mutation models alter the population frequencies, which is both a mathematical nuisance and a potential problem when results are interpreted. As a solution to this, we suggest several methods for stabilizing mutation models, i.e., tuning them so that they no longer alter the population frequencies.
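What it means for a mutation model to alter the population frequencies can be illustrated with a small sketch. Assuming a mutation model is represented as a row-stochastic matrix M, where M[i][j] is the probability that allele i is transmitted as allele j, and the population frequencies as a vector π, the model leaves the frequencies unchanged exactly when πM = π. The code below checks this condition for two simple textbook-style models. The allele frequencies, the rate, and all function names are hypothetical, and the second model is only one well-known stationary construction, not one of the stabilization methods proposed in the thesis.

```python
def apply_mutation(freqs, M):
    """Allele frequencies after one transmission: (pi M)_j = sum_i pi_i * M[i][j]."""
    n = len(freqs)
    return [sum(freqs[i] * M[i][j] for i in range(n)) for j in range(n)]


def equal_model(n, rate):
    """Mutate with probability `rate` to one of the other n - 1 alleles, uniformly."""
    return [[1.0 - rate if i == j else rate / (n - 1) for j in range(n)]
            for i in range(n)]


def proportional_model(freqs, rate):
    """With probability `rate`, redraw the allele from the population frequencies.

    Note: a redraw may return the same allele, so the chance of actually
    changing from allele i is rate * (1 - freqs[i]).
    """
    n = len(freqs)
    return [[(1.0 - rate if i == j else 0.0) + rate * freqs[j] for j in range(n)]
            for i in range(n)]


def is_stationary(freqs, M, tol=1e-12):
    """True if the model leaves the population frequencies unchanged."""
    after = apply_mutation(freqs, M)
    return all(abs(a - b) <= tol for a, b in zip(after, freqs))


# Hypothetical three-allele marker.
freqs = [0.6, 0.3, 0.1]
rate = 0.01

print(is_stationary(freqs, equal_model(len(freqs), rate)))   # frequencies drift toward uniform
print(is_stationary(freqs, proportional_model(freqs, rate)))  # frequencies are preserved
```

Under the equal model the common allele loses frequency to the rare ones in every generation, so the database frequencies are no longer consistent with the model used in the calculation. This is the kind of inconsistency that stabilization is meant to remove.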

forensic statistics

familial relationship inference

Bayesian networks

variable elimination

exact inference

mutation models

Pascal

Opponent: Steffen Lauritzen, Department of Mathematical Sciences, University of Copenhagen, Denmark

Author

Ivar Simonsson

Chalmers, Mathematical Sciences, Applied Mathematics and Statistics


Stationary mutation models

Forensic Science International: Genetics, Vol. 23 (2016), p. 217-225

Journal article

Exact Inference on Conditional Linear Γ-Gaussian Bayesian Networks.

Journal of Machine Learning Research, Vol. 52 (2016), p. 474-486

Journal article

Simonsson, I., Mostad, P., A new algorithm for inference in some mixed Bayesian networks with exponential family distributions

Graphical models in general are becoming an increasingly popular way to describe and analyze complex interactions in data. Bayesian networks in particular are an indispensable tool when the dependence between interacting variables is to be analyzed. Methods for performing approximate calculations on Bayesian networks are constantly being developed and refined, but this thesis places greater focus on exact calculations on Bayesian networks. The so-called variable elimination algorithm is a popular tool for performing such exact calculations. A major problem with this algorithm is that it can only be applied to small classes of Bayesian networks. Parts of this thesis are devoted to further developing the algorithm in order to broaden its range of application.

Forensic statistics is a major area of application for Bayesian networks, not least when DNA data are used for kinship analysis. Using DNA data and knowledge of how genes are inherited from parent to offspring, calculations can be carried out to estimate the probability that two people are related in a specific way. Such calculations are not free of problems, and there are many complicating circumstances that must be taken into account. One phenomenon that complicates kinship calculations is mutation. To make these calculations reasonably precise, the mutation process must be modeled in a reasonable way. In this thesis we argue that some commonly used mutation models bring computational problems with them, and we propose a method for remedying these problems.

Subject categories (SSIF 2011)

Probability Theory and Statistics

ISBN

978-91-7597-818-5

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 4499

Publisher

Chalmers