Probabilistic Bayesian Networks Inference – A Complete Guide for Beginners!
Previously, we discussed Bayesian Network methods; now let's learn about Bayesian network inference and the various structure learning algorithms. We will also explore a Naive Bayes case study on fraud detection.
So, let’s start the tutorial.
Probabilistic Bayesian Networks Inference
A Bayesian Network (BN) is used to estimate the probability that a hypothesis is true, given evidence.
Bayesian network inference involves three main tasks:
- Deducing Unobserved Variables
- Parameter Learning
- Structure Learning
Let’s discuss them one by one:
1. Deducing Unobserved Variables
A Bayesian network is a comprehensive model of the relationships between variables, and we can use it to answer probabilistic queries about them. Given evidence (the observed state of a subset of variables), the network lets us update our beliefs about the remaining variables. Computing the posterior distribution of variables given evidence is called probabilistic inference. This posterior is a universal sufficient statistic for detection applications: when we want to choose values for a subset of variables so as to minimize some expected loss function (for instance, the probability of decision error), the posterior is exactly what we need. In this sense, a BN is a mechanism for applying Bayes' theorem to complex problems.
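The posterior update described above can be sketched in a few lines of Python. The two-node "burglary/alarm" network below, and all of its probabilities, are invented purely for illustration:

```python
# Posterior inference with Bayes' theorem on a two-node network B -> A.
# All probability values below are illustrative, not from real data.
p_b = 0.01                      # prior P(Burglary)
p_a_given_b = 0.95              # P(Alarm | Burglary)
p_a_given_not_b = 0.02          # P(Alarm | no Burglary)

# Evidence: the alarm went off. Apply Bayes' theorem.
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)   # marginal P(Alarm)
p_b_given_a = p_a_given_b * p_b / p_a                   # posterior P(B | A)

print(round(p_b_given_a, 3))
```

Even though the alarm is quite reliable, the posterior probability of a burglary stays modest, because the prior is so low; this is the kind of belief update that inference in a full BN performs over many variables at once.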
Popular inference methods are:
1.1 Variable Elimination
Variable elimination removes the non-observed, non-query variables one by one, by distributing the sum over the product of factors.
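A minimal sketch of this sum-product elimination on a three-node chain A → B → C (binary variables, made-up CPTs) looks like this; to answer the query P(C), we eliminate A first, then B:

```python
# Variable elimination on a chain A -> B -> C (binary variables).
# CPT values are illustrative. Query: P(C) with no evidence.
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # key: (b, a)
p_c_given_b = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.4, (1, 1): 0.6}  # key: (c, b)

# Eliminate A: tau(b) = sum_a P(a) * P(b|a)
tau_b = {b: sum(p_a[a] * p_b_given_a[(b, a)] for a in (0, 1)) for b in (0, 1)}
# Eliminate B: P(c) = sum_b tau(b) * P(c|b)
p_c = {c: sum(tau_b[b] * p_c_given_b[(c, b)] for b in (0, 1)) for c in (0, 1)}

print(p_c)
```

Each elimination step sums a variable out of a product of local factors, which is exactly "distributing the sum over the product": we never have to build the full joint table.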
1.2 Clique Tree Propagation
Clique tree propagation caches computations, so that many variables can be queried at one time and new evidence can be propagated quickly.
1.3 Recursive Conditioning
Recursive conditioning allows a tradeoff between space and time; when sufficient space is available, its time complexity matches that of variable elimination.
2. Parameter Learning
To fully specify the BN, and thus represent the joint probability distribution, it is necessary to specify, for each node X, the probability distribution of X conditional on its parents. The distribution of X can take many forms; discrete and Gaussian distributions are common because they simplify calculations. Sometimes only constraints on the distribution are known. To determine a single distribution in that case, we can use the principle of maximum entropy: among all distributions satisfying the constraints, choose the one with the greatest entropy.
Often the conditional distributions include parameters that are unknown and must be estimated from data, frequently using the maximum likelihood approach. When there are unobserved variables, however, direct maximization of the likelihood is often complex. The expectation-maximization (EM) algorithm handles this by alternating between computing expected values of the unobserved variables (assuming the current parameter estimates are correct) and maximizing the likelihood given those expectations. Under mild conditions, this process converges to maximum likelihood values for the parameters.
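The alternation between the E-step and M-step can be sketched on a classic toy problem with one hidden variable: two coins with unknown biases, where each trial records the number of heads in ten flips but not which coin was used. The data and starting values below are made up for illustration:

```python
# A minimal EM sketch: two biased coins, unknown biases, and a hidden
# variable (which coin produced each trial). Toy data, illustrative only.
from math import comb

flips = 10
heads = [9, 8, 2, 9, 1]          # observed heads per trial
theta_a, theta_b = 0.6, 0.5      # initial guesses for the two biases

def binom(h, theta):
    """Likelihood of h heads in `flips` tosses of a coin with bias theta."""
    return comb(flips, h) * theta**h * (1 - theta)**(flips - h)

for _ in range(50):
    # E-step: posterior responsibility that each trial came from coin A.
    resp = [binom(h, theta_a) / (binom(h, theta_a) + binom(h, theta_b))
            for h in heads]
    # M-step: re-estimate each bias from responsibility-weighted counts.
    theta_a = sum(r * h for r, h in zip(resp, heads)) / (flips * sum(resp))
    theta_b = sum((1 - r) * h for r, h in zip(resp, heads)) / (
        flips * sum(1 - r for r in resp))

print(round(theta_a, 2), round(theta_b, 2))
```

The estimates separate cleanly: one coin converges to a high bias (explaining the 9, 8, 9 trials) and the other to a low bias (explaining the 2 and 1), even though no trial was ever labeled.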
A fully Bayesian approach treats the parameters as additional unobserved variables: we use the BN to compute a posterior distribution conditional on the observed data, and then integrate the parameters out. This approach can be costly and lead to large-dimensional models, so in practice classical parameter-setting approaches are more common.
3. Structure Learning
In the simplest case, a BN is specified by an expert and then used to perform inference. In other applications, the task of defining the network is too complex for humans; there, both the network structure and the parameters of the local distributions must be learned from data.
Automatically learning the graph structure of a BN is a challenge pursued within machine learning. The basic idea goes back to an algorithm developed by Rebane and Pearl (1987), and rests on the distinction between the three possible triplets allowed in a Directed Acyclic Graph (DAG):
- Type 1: X → Y → Z (causal chain)
- Type 2: X ← Y → Z (common cause)
- Type 3: X → Y ← Z (common effect, or "collider")
In Types 1 and 2, X and Z are independent given Y, so these two types represent the same dependencies and are indistinguishable from observational data. Type 3, however, can be uniquely identified: there, X and Z are marginally independent while all other pairs are dependent. So, while the skeletons (the graphs with arrow directions stripped) of these three triplets are identical, the direction of the arrows is partially identifiable. The same distinction applies when X and Z have common parents, except that one must first condition on those parents. Algorithms have been developed that first determine the skeleton of the underlying graph and then orient all arrows whose directionality is dictated by the observed conditional independencies.
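The Type 1 independence pattern can be verified numerically. The sketch below builds the joint distribution of a chain X → Y → Z from made-up CPTs and checks that X and Z are dependent marginally but independent given Y (a collider would show exactly the opposite pattern):

```python
from itertools import product

# Joint distribution of a Type 1 chain X -> Y -> Z (binary variables).
# All CPT numbers are illustrative.
px = {0: 0.5, 1: 0.5}
py_x = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.3, (1, 1): 0.7}  # key: (y, x)
pz_y = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.4, (1, 1): 0.6}  # key: (z, y)
joint = {(x, y, z): px[x] * py_x[(y, x)] * pz_y[(z, y)]
         for x, y, z in product((0, 1), repeat=3)}

def marg(keep):
    """Marginalize the joint onto the positions in `keep` (0=X, 1=Y, 2=Z)."""
    out = {}
    for assn, p in joint.items():
        key = tuple(assn[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

pxz, pX, pZ, pY = marg((0, 2)), marg((0,)), marg((2,)), marg((1,))
pxy, pyz = marg((0, 1)), marg((1, 2))

# Marginal independence X _|_ Z?  (Expected: no, for a chain.)
marg_ok = all(abs(pxz[(x, z)] - pX[(x,)] * pZ[(z,)]) < 1e-9
              for x in (0, 1) for z in (0, 1))

# Conditional independence X _|_ Z given Y?  (Expected: yes, for a chain.)
cond_ok = all(
    abs(joint[(x, y, z)] / pY[(y,)]
        - (pxy[(x, y)] / pY[(y,)]) * (pyz[(y, z)] / pY[(y,)])) < 1e-9
    for x, y, z in product((0, 1), repeat=3))

print(marg_ok, cond_ok)  # → False True
```

Constraint-based structure learning algorithms run tests of exactly this kind on data to decide which skeleton edges exist and which arrows can be oriented.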
An alternative method of structure learning uses optimization-based search. It requires a scoring function and a search strategy; a common scoring function is the posterior probability of the structure given the training data. An exhaustive search that returns the score-maximizing structure takes time super-exponential in the number of variables, so in practice a local search strategy makes incremental changes to the structure in order to improve its score. A global search algorithm such as Markov chain Monte Carlo can avoid getting trapped in local minima.
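A simplified sketch of score-based search is shown below: greedy hill climbing over single-edge additions, scored with BIC (log-likelihood minus a complexity penalty). The toy data, the restriction to adding only parents of Y, and the scoring are all simplifications for illustration, not a production learner:

```python
from math import log
from itertools import product
import random

random.seed(0)

# Toy binary data: Y depends on X; Z is independent noise.
N = 500
data = []
for _ in range(N):
    x = random.random() < 0.5
    y = random.random() < (0.9 if x else 0.1)
    z = random.random() < 0.5
    data.append({'X': int(x), 'Y': int(y), 'Z': int(z)})

def bic(node, parents):
    """BIC contribution of one node given its parent set (binary variables)."""
    counts = {}
    for row in data:
        key = (tuple(row[p] for p in parents), row[node])
        counts[key] = counts.get(key, 0) + 1
    ll = 0.0
    for (pa, v), c in counts.items():
        tot = counts.get((pa, 0), 0) + counts.get((pa, 1), 0)
        ll += c * log(c / tot)
    k = 2 ** len(parents)          # free parameters for a binary node
    return ll - 0.5 * k * log(N)

def score(dag):                    # dag maps each node to its parent list
    return sum(bic(n, ps) for n, ps in dag.items())

# Greedy hill climbing over single-edge additions. (Only parents of Y are
# ever considered, so no cycle check is needed in this tiny sketch.)
dag = {'X': [], 'Y': [], 'Z': []}
improved = True
while improved:
    improved = False
    for parent in ('X', 'Z'):
        if parent not in dag['Y']:
            trial = {**dag, 'Y': dag['Y'] + [parent]}
            if score(trial) > score(dag):
                dag = trial
                improved = True

print(sorted(dag['Y']))
```

The BIC penalty is what keeps the search from adding every edge: the true dependence X → Y improves the score far beyond the penalty, while the spurious Z → Y edge typically does not.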
Another method consists of focusing on the sub-class of decomposable models, for which the MLE has a closed form.
A BN can also be augmented with new nodes and edges using rule-based machine learning techniques. Inductive Logic Programming can be used to mine rules and create new nodes. Statistical Relational Learning (SRL) approaches use a scoring function, guided by the BN structure, to drive the structural search and augment the network. A common SRL scoring function is the area under the ROC curve.
Structure Learning Algorithms
Structure learning algorithms learn both the structure and the parameters of a BN, and they support both discrete and continuous data sets.
Below are various types of structure learning algorithms:
i. Constraint-based Structure Learning Algorithms
Examples are Grow-Shrink (GS), Incremental Association Markov Blanket (IAMB), Fast Incremental Association (Fast-IAMB), and Interleaved Incremental Association (Inter-IAMB)
ii. Score-based Structure Learning Algorithms
Examples are Hill Climbing (HC) and Tabu Search (Tabu)
iii. Hybrid Structure Learning Algorithms
Examples are Max-Min Hill Climbing (MMHC) and General 2-Phase Restricted Maximization (RSMAX2)
iv. Local Discovery Algorithms
Examples are Chow-Liu, ARACNE, Max-Min Parents and Children (MMPC), and Semi-Interleaved HITON-PC
v. Bayesian Network Classifiers
Examples are Naive Bayes and Tree-Augmented Naive Bayes (TAN)
Fraud Detection – A Naive Bayes Case Study
Advancements in Machine Learning have resulted in a massive boost in automation, and fraud detection is one such area. With the help of algorithms like Naive Bayes, it has become much easier for companies to detect fraud at an early stage and to spot various irregularities in transactions.
In fraud detection, companies monitor and analyze user activity to detect any unusual or malicious pattern. As internet usage has grown, the volume of online transactions has risen, and with it the number of frauds.
With the help of Data Science, industries can apply machine learning and predictive modeling to build tools that recognize unusual patterns in the fraud-detection ecosystem. Naive Bayes is one of the important algorithms used for fraud detection in industry.
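A minimal Naive Bayes fraud classifier can be sketched in pure Python. The feature names, the tiny labeled data set, and the resulting probabilities below are all invented for illustration; a real system would train on large volumes of historical transactions:

```python
# A minimal Naive Bayes fraud classifier on made-up transaction features.
from math import log

# Each row: (amount_high, foreign_country, night_time) -> label (1 = fraud)
rows = [
    ((1, 1, 1), 1), ((1, 1, 0), 1), ((1, 0, 1), 1),
    ((0, 0, 0), 0), ((0, 1, 0), 0), ((0, 0, 1), 0),
    ((1, 0, 0), 0), ((0, 0, 0), 0),
]

def train(rows):
    model = {}
    for y in (0, 1):
        xs = [x for x, lbl in rows if lbl == y]
        prior = len(xs) / len(rows)
        # Per-feature Bernoulli parameters with Laplace (add-one) smoothing.
        feat = [(sum(x[j] for x in xs) + 1) / (len(xs) + 2) for j in range(3)]
        model[y] = (prior, feat)
    return model

def predict(model, x):
    best, best_lp = None, float('-inf')
    for y, (prior, feat) in model.items():
        # Naive Bayes assumption: features independent given the class.
        lp = log(prior) + sum(
            log(f if xi else 1 - f) for xi, f in zip(x, feat))
        if lp > best_lp:
            best, best_lp = y, lp
    return best

model = train(rows)
print(predict(model, (1, 1, 1)))   # high-amount foreign night-time  -> 1
print(predict(model, (0, 0, 0)))   # low-amount domestic daytime     -> 0
```

Despite the strong "naive" independence assumption, this kind of classifier often works well in practice, which is one reason it remains popular for flagging suspicious transactions.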
We have seen the complete concept of Bayesian Network Inference and structure learning algorithms. We also saw a Naive Bayes case study on fraud detection.
Now, it’s the turn of Latest Bayesian Network Applications
Still, if you have any query related to Bayesian Networks Inference, leave a comment in the comment section given below.