ML Project using Bayesian networks


Purpose:  

This project will expose you to inference using Bayesian networks. Bayesian networks capture causal relationships and are widely used for fault diagnosis in many applications. A Bayesian network can be represented by a directed acyclic graph that models the causal relationships between variables. A useful tool for representing and traversing a graph is NetworkX, a Python library that provides a comprehensive set of graph types and graph algorithms. The application we will target is car fault diagnosis, which was introduced in class. The fundamental task in such diagnosis applications is to discover the underlying causes of a fault and to rank these causes by importance.

In this application we will explore: a) the reasons the car does not start and the probability of this event; and b) the conditions under which the car battery becomes flat and the likelihood of this occurring.

Project Requirements:

R1 

Use NetworkX in Google Colab to represent the network as shown below:

Attach probability tables to each node as specified in the Project 3 Discussion Document. Visualize the network using NetworkX, showing the nodes and edges. You do not have to display the probability tables you created, but they will of course be embedded in your code. The coloring used in the figure does not need to be reproduced. Instead, shade the nodes in a neutral blue. Make sure that your edges show directionality.
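As a rough illustration of R1, the sketch below builds a small directed graph in NetworkX and stores a probability table on each node as a node attribute. It is not the project network: the edges are assumed, the node names nc, bd and bf are hypothetical labels for "not charging", "battery dead" and "battery flat" (cs, ab and fb are reused from R2 to R4), and every number is a placeholder. The layout of the tables (a dictionary keyed by parent values) is also just one possible choice and may differ from the starter code.

```python
import networkx as nx

# ASSUMPTION: placeholder edges, not the network from the project figure.
EDGES = [("ab", "nc"), ("fb", "nc"), ("nc", "bf"), ("bd", "bf"), ("bf", "cs")]

# ASSUMPTION: made-up tables. Each cpt maps a tuple of parent values
# (in the listed parent order) to P(node = True | parents).
TABLES = {
    "ab": {"parents": (), "cpt": {(): 0.1}},
    "fb": {"parents": (), "cpt": {(): 0.2}},
    "bd": {"parents": (), "cpt": {(): 0.3}},
    "nc": {"parents": ("ab", "fb"),
           "cpt": {(True, True): 0.99, (True, False): 0.9,
                   (False, True): 0.8, (False, False): 0.05}},
    "bf": {"parents": ("nc", "bd"),
           "cpt": {(True, True): 0.99, (True, False): 0.7,
                   (False, True): 0.9, (False, False): 0.02}},
    "cs": {"parents": ("bf",), "cpt": {(True,): 0.05, (False,): 0.95}},
}


def build_network():
    """Create the directed graph and attach a table to every node."""
    graph = nx.DiGraph()
    graph.add_edges_from(EDGES)
    for node, table in TABLES.items():
        graph.nodes[node]["parents"] = table["parents"]
        graph.nodes[node]["cpt"] = table["cpt"]
    return graph


def draw_network(graph):
    """Draw nodes in blue with arrowheads showing edge direction."""
    import matplotlib.pyplot as plt  # only needed for drawing

    pos = nx.spring_layout(graph, seed=0)  # fixed seed for a repeatable layout
    nx.draw_networkx(graph, pos, node_color="tab:blue", font_color="white",
                     node_size=1500, arrows=True, arrowsize=20)
    plt.axis("off")
    plt.show()


G = build_network()
```

Calling draw_network(G) in a Colab cell should display the graph. Because G is a DiGraph, the edges are drawn as arrows.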

R2

Compute the probability P(-cs, +ab, +fb)

R3

Compute the probability P(-cs, +ab)

R4

Compute the probability P(-cs, +fb)
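Probabilities such as those in R2 to R4 can be obtained by summing the full joint distribution over every variable that is not fixed in the query. The sketch below shows that idea (inference by enumeration) in plain Python. The network structure, the names nc, bd and bf, and all numbers are placeholders chosen for illustration, so the results will not match the project's answers, and the Project 3 Discussion Document may prescribe a different formula.

```python
from itertools import product

# ASSUMPTION: placeholder structure; parents are listed in table order.
PARENTS = {
    "ab": (), "fb": (), "bd": (),
    "nc": ("ab", "fb"),
    "bf": ("nc", "bd"),
    "cs": ("bf",),
}

# ASSUMPTION: made-up numbers. CPT[node][parent_values] = P(node = True | parents).
CPT = {
    "ab": {(): 0.1},
    "fb": {(): 0.2},
    "bd": {(): 0.3},
    "nc": {(True, True): 0.99, (True, False): 0.9,
           (False, True): 0.8, (False, False): 0.05},
    "bf": {(True, True): 0.99, (True, False): 0.7,
           (False, True): 0.9, (False, False): 0.02},
    "cs": {(True,): 0.05, (False,): 0.95},
}


def joint(assignment):
    """Probability of one complete True/False assignment to all nodes."""
    p = 1.0
    for node, parents in PARENTS.items():
        p_true = CPT[node][tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


def probability(evidence):
    """Sum the joint over all variables that are not fixed in `evidence`."""
    hidden = [n for n in PARENTS if n not in evidence]
    total = 0.0
    for values in product([True, False], repeat=len(hidden)):
        total += joint({**evidence, **dict(zip(hidden, values))})
    return total


# Same shape as the R2 and R3 queries, but on the placeholder network.
p_r2 = probability({"cs": False, "ab": True, "fb": True})
p_r3 = probability({"cs": False, "ab": True})
print(p_r2, p_r3)
```

Enumeration is exponential in the number of unfixed variables, which is acceptable for a network of this size. A useful sanity check is that probability({}) comes out as 1.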

R5

For the battery going flat, which of the two factors is more important: battery dead or not charging?
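One possible way to frame the comparison in R5 is to ask how likely each cause is once the battery is known to be flat, using Bayes' rule. This is only a sketch of that idea and not necessarily the method in the Project 3 Discussion Document. The names bd, nc and bf are hypothetical, "not charging" is treated as a root node purely to keep the example short, and all numbers are placeholders.

```python
# ASSUMPTION: placeholder priors and table, not the project's values.
P_BD = 0.3  # P(+bd): battery dead
P_NC = 0.2  # P(+nc): not charging (treated as a root node in this sketch)

# P(+bf | bd, nc): battery flat given each combination of causes.
P_BF = {(True, True): 0.99, (True, False): 0.9,
        (False, True): 0.7, (False, False): 0.02}


def prior(p_true, value):
    """Return P(X = value) for a Boolean variable with P(X = True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint_with_bf(bd, nc):
    """P(bd, nc, +bf) for one combination of the two causes."""
    return prior(P_BD, bd) * prior(P_NC, nc) * P_BF[(bd, nc)]


bools = (True, False)

# P(+bf): marginalise over both causes.
p_bf = sum(joint_with_bf(bd, nc) for bd in bools for nc in bools)

# Posterior of each cause given a flat battery (Bayes' rule).
p_bd_given_bf = sum(joint_with_bf(True, nc) for nc in bools) / p_bf
p_nc_given_bf = sum(joint_with_bf(bd, True) for bd in bools) / p_bf

more_likely = "bd" if p_bd_given_bf > p_nc_given_bf else "nc"
print(p_bd_given_bf, p_nc_given_bf, more_likely)
```

Under this framing, the cause with the larger posterior is the more probable explanation for a flat battery. With the real tables the ranking may differ from the one these placeholder numbers produce.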

Note: 

  1. Use the starter code provided for this project. This is essential as many of you will not be familiar with NetworkX.
  2. I strongly recommend that you read the Project 3 Discussion Document, as it covers not just the creation of the probability tables but also which formulae need to be used to compute the answers to requirements R2 to R5.