# Introduction to Probabilistic Bayesian Networks Inference

## 1. Objective – Bayesian Networks Inference

In our previous tutorial, we discussed the **Bayesian Network Introduction**. Now we are going to describe Bayesian Network inference: deducing unobserved variables, parameter learning, and structure learning. At last, we will also cover various structure learning algorithms.

So, let’s start the Bayesian Network Inference Tutorial.

## 2. Probabilistic Bayesian Networks Inference

A BN is used to estimate the probability that a hypothesis is true based on evidence.

Bayesian network inference covers three tasks:

- Deducing Unobserved Variables
- Parameter Learning
- Structure Learning

Let’s discuss these one by one.

### i. Deducing Unobserved Variables

**BN** stands for **Bayesian Network**. With the help of this network we can build a comprehensive model that delineates the relationships between the variables, and use it to answer probabilistic queries about them. For example, the network can be used to find updated knowledge of the state of a subset of variables when other variables (the evidence) are observed. Computing the posterior distribution of variables given evidence is called probabilistic inference. The posterior gives a universal statistic for detection applications, when one wants to select values for the variable subset that minimize some expected loss function, for instance the probability of decision error. A BN can thus be seen as a mechanism for applying **Bayes’ theorem** to complex problems. Popular inference methods are:

#### a. Variable Elimination

Variable elimination removes the non-observed, non-query variables one by one by distributing the sum over the product.
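To make "distributing the sum over the product" concrete, here is a minimal sketch of variable elimination on a tiny chain X → Y → Z with made-up binary CPTs (the numbers are illustrative, not from any real data set). We compute the marginal P(Z) by summing out X first, then Y:

```python
# Minimal variable-elimination sketch on the chain X -> Y -> Z.
# All CPT numbers below are illustrative assumptions.
p_x = {0: 0.6, 1: 0.4}                  # P(X)
p_y_given_x = {0: {0: 0.7, 1: 0.3},     # P(Y|X): p_y_given_x[x][y]
               1: {0: 0.2, 1: 0.8}}
p_z_given_y = {0: {0: 0.9, 1: 0.1},     # P(Z|Y): p_z_given_y[y][z]
               1: {0: 0.4, 1: 0.6}}

# Eliminate X: tau_y[y] = sum_x P(x) * P(y|x)
tau_y = {y: sum(p_x[x] * p_y_given_x[x][y] for x in p_x) for y in (0, 1)}

# Eliminate Y: P(z) = sum_y tau_y[y] * P(z|y)
p_z = {z: sum(tau_y[y] * p_z_given_y[y][z] for y in (0, 1)) for z in (0, 1)}
print(p_z)
```

Note that the sum over X is pushed inside, past the factor P(z|y) that does not mention X; that is the distribution of the sum over the product which makes the method cheaper than naive enumeration.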

#### b. Clique Tree Propagation

Clique tree propagation caches the computation so that many variables can be queried at one time and new evidence can be propagated quickly.

#### c. Recursive Conditioning

Recursive conditioning allows a tradeoff between space and time. It matches the efficiency of variable elimination when sufficient space is available.
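All of these methods ultimately apply Bayes’ theorem. As a reminder of what that computation looks like in the simplest two-variable case, here is a sketch with illustrative numbers (a rare condition and an imperfect test — the probabilities are assumptions, not real medical data):

```python
# Hypothetical numbers for a two-node network Disease -> Test.
p_disease = 0.01          # prior P(D)
p_pos_given_d = 0.95      # sensitivity P(+|D)
p_pos_given_not_d = 0.05  # false-positive rate P(+|~D)

# Law of total probability: P(+)
p_pos = p_pos_given_d * p_disease + p_pos_given_not_d * (1 - p_disease)

# Bayes' theorem: posterior P(D|+)
p_d_given_pos = p_pos_given_d * p_disease / p_pos
print(round(p_d_given_pos, 3))
```

Even with a fairly accurate test, the posterior stays low because the prior is small — exactly the kind of update a BN automates across many variables.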

### ii. Parameter Learning

To fully specify the BN, and thus represent the joint probability distribution, it is necessary to specify for each node X the probability distribution of X conditional on its parents. The distribution of X may take many forms; discrete or Gaussian distributions are common because they simplify calculations. Sometimes only constraints on a distribution are known. One can then use the principle of maximum entropy to determine a single distribution: the one with the greatest entropy given the constraints.
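As a tiny illustration of the maximum-entropy principle, consider a single binary node with no constraints at all. Scanning candidate Bernoulli(p) distributions shows that the uniform one (p = 0.5) has the greatest entropy, so it is the one the principle selects:

```python
import math

def entropy(p):
    # Shannon entropy of a Bernoulli(p) distribution, in nats.
    return -sum(q * math.log(q) for q in (p, 1 - p) if q > 0)

# With no constraints on a binary variable, maximum entropy
# selects the uniform distribution. Scan a grid of candidates:
candidates = [i / 100 for i in range(1, 100)]
best = max(candidates, key=entropy)
print(best)  # 0.5
```

With constraints (say, a fixed mean), the same principle picks the highest-entropy distribution among those that satisfy them.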

Often the conditional distributions include parameters which are unknown and must be estimated from data, sometimes using the maximum likelihood approach. When there are unobserved variables, direct maximization of the likelihood is often complex. The **Expectation-Maximization (EM) algorithm** alternates between computing expected values of the unobserved variables and maximizing the likelihood under the assumption that the previously computed expectations are correct. Under mild conditions, this process converges on maximum likelihood values for the parameters.
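When all variables are observed, the maximum likelihood estimate of a discrete CPT reduces to frequency counting. A minimal sketch for a single edge X → Y, using an assumed toy sample (EM would wrap this counting step inside its expectation loop when some values are missing):

```python
from collections import Counter

# Fully observed (x, y) samples; illustrative toy data.
samples = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0), (0, 0), (1, 1)]

joint = Counter(samples)                 # N(x, y)
marg_x = Counter(x for x, _ in samples)  # N(x)

# Maximum-likelihood CPT: P(y|x) = N(x, y) / N(x)
cpt = {(x, y): joint[(x, y)] / marg_x[x]
       for x in marg_x for y in (0, 1)}
print(cpt)
```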

A more fully Bayesian approach is to treat the parameters as additional unobserved variables, use the BN to compute a posterior distribution over all nodes conditional upon the observed data, and then integrate out the parameters. This approach can be costly and lead to large-dimension models, so in practice classical parameter-setting approaches are more common.

### iii. Structure Learning

In the simplest case, a BN is specified by an expert and is then used to perform inference. In other applications, the task of defining the network is too complex for humans; in this case, the network structure and the parameters of the local distributions must be learned from data.

Automatically learning the graph structure of a BN is a challenge pursued within **machine learning**. The basic idea goes back to an algorithm developed by **Rebane and Pearl (1987)** and rests on the distinction between the three possible types of adjacent triplets allowed in a **Directed Acyclic Graph (DAG)**:

X → Y → Z

X ← Y → Z

X → Y ← Z

In the first two triplets, X and Z are independent given Y. Types 1 and 2 represent the same dependencies, so they are indistinguishable. Type 3, however, can be uniquely identified: X and Z are marginally independent, while all other pairs are dependent. So, while the skeletons of these three triplets are identical, the directionality of the arrows is partially identifiable. The same distinction applies when X and Z have common parents, except that one must first condition on those parents. Algorithms have been developed to determine the skeleton of the underlying graph and then orient all arrows whose directionality is dictated by the conditional independencies observed.
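The distinguishing behavior of the Type 3 triplet (the collider X → Y ← Z) can be checked by exact enumeration. In this sketch X and Z are independent fair bits and Y = X or Z; the arithmetic is exact, so no simulation is needed:

```python
from itertools import product

# Collider X -> Y <- Z: X, Z independent fair bits, Y = X or Z.
# Enumerate the exact joint distribution P(x, y, z).
joint = {}
for x, z in product((0, 1), repeat=2):
    y = x | z
    joint[(x, y, z)] = 0.25  # P(x) * P(z) = 0.5 * 0.5

def prob(pred):
    return sum(p for xyz, p in joint.items() if pred(*xyz))

# Marginally, X and Z are independent:
p_xz = prob(lambda x, y, z: x == 1 and z == 1)
p_x = prob(lambda x, y, z: x == 1)
p_z = prob(lambda x, y, z: z == 1)
print(p_xz == p_x * p_z)  # True

# Conditioning on the collider Y = 1 induces dependence:
p_y1 = prob(lambda x, y, z: y == 1)
p_xz_y1 = prob(lambda x, y, z: x == 1 and z == 1 and y == 1) / p_y1
p_x_y1 = prob(lambda x, y, z: x == 1 and y == 1) / p_y1
p_z_y1 = prob(lambda x, y, z: z == 1 and y == 1) / p_y1
print(abs(p_xz_y1 - p_x_y1 * p_z_y1) > 1e-9)  # True
```

The chain and fork triplets (Types 1 and 2) show the opposite pattern — dependence that vanishes once Y is given — which is why only the collider orientation can be read off from the data.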

An alternative method of structural learning uses optimization-based search. It requires a scoring function and a search strategy. A common scoring function is the posterior probability of the structure given the training data. The time requirement of an exhaustive search returning the structure that maximizes the score is super-exponential in the number of variables. A local search strategy instead makes incremental changes aimed at improving the overall score. A global search algorithm like Markov chain Monte Carlo can avoid getting trapped in local minima.
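A single step of such a search can be sketched with a penalized-likelihood score (here BIC, a common stand-in for the posterior probability) on assumed toy data: the search compares the empty graph against X → Y and keeps whichever scores higher. The data and penalty are illustrative assumptions:

```python
import math
from collections import Counter

# Illustrative binary data where Y tends to follow X.
data = [(0, 0)] * 40 + [(0, 1)] * 10 + [(1, 0)] * 10 + [(1, 1)] * 40
n = len(data)

def loglik_independent(data):
    # Structure with no edge: P(x, y) = P(x) * P(y)
    cx = Counter(x for x, _ in data)
    cy = Counter(y for _, y in data)
    return sum(math.log(cx[x] / n) + math.log(cy[y] / n) for x, y in data)

def loglik_edge(data):
    # Structure X -> Y: P(x, y) = P(x) * P(y|x), with MLE CPTs
    cx = Counter(x for x, _ in data)
    cxy = Counter(data)
    return sum(math.log(cx[x] / n) + math.log(cxy[(x, y)] / cx[x])
               for x, y in data)

def bic(ll, k):
    # BIC score: log-likelihood minus a complexity penalty on k parameters.
    return ll - 0.5 * k * math.log(n)

score_empty = bic(loglik_independent(data), 2)  # P(X), P(Y)
score_edge = bic(loglik_edge(data), 3)          # P(X), P(Y|X=0), P(Y|X=1)
print(score_edge > score_empty)  # True: the search keeps the edge
```

Hill climbing repeats this comparison over all single-edge additions, deletions, and reversals until no change improves the score.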

Another method consists of focusing on the sub-class of decomposable models, for which the MLE has a closed form.

Using rule-based machine learning techniques, we can augment a BN with new nodes and edges. Inductive logic programming can be used to mine rules and create the new nodes. Statistical relational learning (SRL) approaches use a scoring function based on the BN structure to guide the structural search and augment the network. A common SRL scoring function is the area under the ROC curve.

## 3. Structure Learning Algorithms

Structure learning algorithms let you learn both the structure and the parameters of a BN. They support both discrete and continuous data sets.

Below are various types of structure learning algorithms:

**i. Constraint-based structure learning algorithms**

Examples are Grow-Shrink (GS), Incremental Association Markov Blanket (IAMB), fast incremental association (Fast-IAMB), and interleaved incremental association (Inter-IAMB)

**ii. Score-based structure learning algorithms**

Examples are Hill Climbing (HC) and Tabu Search (TS)

**iii. Hybrid structure learning algorithms**

Examples are Max-Min Hill Climbing (MMHC) and General 2-Phase Restricted Maximization (RSMAX2)

**iv. Local discovery algorithms**

Examples are Chow-Liu, ARACNE, Max-Min Parents and Children (MMPC), and Semi-Interleaved HITON-PC

**v. Bayesian network classifiers**

Examples are Naïve Bayes and Tree-Augmented Naïve Bayes (TAN)

So, this was all about Bayesian Network Inference. Hope you like our explanation.

## 4. Conclusion

Hence, we have seen the complete concept of Bayesian Network Inference. Still, if you have any query related to Bayesian Network inference, leave a comment in the section below.
