A General Guide to AI & Data Mining

This post is a short summary of the first half of the course ‘IE4483: AI and Data Mining’ at NTU, Singapore. The course is taught by Prof. Dr. Lihui Chen.

The two most fundamental concerns of AI researchers:

  1. Knowledge representation

    • Capturing knowledge in a way suitable for computer manipulation

    • Graphs (Powerful representation for many problems)

      • Example: Euler graph, Konigsberg bridge problem
    • Artificial neural networks (ANNs) - a network structure plus its weights

  2. Search

    • Systematically exploring the space of possible solutions to a problem

Example: The Bridges of Konigsberg Problem

[Figure: Visual depiction of the Konigsberg bridge problem]

Is there a walk around the city that crosses each bridge exactly once?

There is no search algorithm that is universally superior for all problems; this is known as the “No-Free-Lunch Theorem.”

The degree of a node of a graph: an even-degree node has an even number of arcs joining it to neighbouring nodes; an odd-degree node has an odd number of arcs joining it to neighbouring nodes.

Euler’s conclusion: A walk around the city that crosses each bridge exactly once is possible if and only if the graph contains exactly 0 or 2 nodes of odd degree.
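This degree condition is easy to check in code. Below is a minimal Python sketch; the bridge list encodes the historical seven-bridge layout, with the four land masses labelled A–D for illustration:

```python
def euler_walk_exists(edges):
    """Euler's condition: a walk crossing every edge exactly once exists
    iff the (connected) multigraph has exactly 0 or 2 odd-degree nodes."""
    degree = {}
    for u, v in edges:  # repeated pairs are allowed: this is a multigraph
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    odd_nodes = sum(1 for d in degree.values() if d % 2 == 1)
    return odd_nodes in (0, 2)

# The seven bridges of Konigsberg: land masses A, B, C, D
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]
print(euler_walk_exists(bridges))  # False: all four nodes have odd degree
```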

Another example: Travelling Salesman Problem (TSP)

Search Strategies

[Figure: Overview of search strategies]

A strategy is defined by picking the order of node expansion in a graph.

Strategies are evaluated along the following dimensions:

  • Completeness: Does the strategy always find a solution if one exists?
  • Optimality: Does the strategy always find the best solution if one exists?
  • Time complexity: How much time does the strategy take to find a solution?
  • Space complexity: How much memory does the strategy require to find a solution?

Uninformed search = blind search
Informed search = heuristic-guided search

  • Data-driven search: Facts & rules => New facts -> … => Goal

    • This approach is sometimes called forward chaining.
    • Its cost is governed by the branching factor of the state space.
    • e.g. playing Go or chess
  • Goal-driven search: Goal => Sub-goals -> … => Facts

    • This approach is sometimes called backward chaining.
    • e.g. medical diagnosis

How to systematically search a graph?

Why a graph? Because it is very close to how we make decisions in real life.

  • We use backtracking to systematically search a graph.
    • Depth-first search (DFS) for constraint satisfaction problems (CSPs)


  • CS = Current state
  • SL = State list
  • NSL = Next state list (Unprocessed states)
  • DE = Dead ends
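A minimal Python sketch of this backtracking procedure using the four lists above; the `successors` callback is a placeholder you supply for your own problem:

```python
def backtrack(start, goal, successors):
    """Backtracking graph search. Returns the solution path (SL) or None.
    The top of each list is its last element."""
    SL = [start]    # state list: the path currently being tried
    NSL = [start]   # next state list: states awaiting processing
    DE = []         # dead ends: states that cannot lead to the goal
    CS = start      # current state

    while NSL:
        if CS == goal:
            return SL
        # children not already recorded on any list
        kids = [c for c in successors(CS)
                if c not in DE and c not in SL and c not in NSL]
        if not kids:
            # dead end: pop the path until an untried sibling appears
            while SL and CS == SL[-1]:
                DE.append(CS)
                SL.pop()
                NSL.pop()
                if not NSL:
                    return None  # search space exhausted
                CS = NSL[-1]
            SL.append(CS)
        else:
            NSL.extend(kids)   # push children; the last one is tried first
            CS = NSL[-1]
            SL.append(CS)
    return None

# Toy usage on a small directed graph
graph = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
print(backtrack("A", "D", lambda s: graph.get(s, [])))  # ['A', 'B', 'D']
```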

Breadth-first search (BFS)

  • Level order traversal of a graph

If the branching factor, $b$ (the average number of children per node), is large, the combinatorics may prevent the algorithm from finding a solution with the available memory.

Depth-first search (DFS)

  • Depth-first traversal of a graph
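BFS and DFS share one skeleton and differ only in how the frontier is consumed (FIFO vs LIFO). A minimal sketch, with `successors` again supplied by the caller:

```python
from collections import deque

def graph_search(start, goal, successors, mode="bfs"):
    """Graph search with a FIFO frontier (BFS) or a LIFO frontier (DFS).
    The frontier stores whole paths so the solution can be returned."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        # BFS expands the oldest path first; DFS the newest
        path = frontier.popleft() if mode == "bfs" else frontier.pop()
        state = path[-1]
        if state == goal:
            return path
        for child in successors(state):
            if child not in visited:   # systematically generate new states
                visited.add(child)
                frontier.append(path + [child])
    return None
```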

DFS w/ Iterative deepening search (IDS)

  • First performs a DFS on the space with a depth bound of one.

  • If it fails to find a goal, it performs another DFS with a depth bound of two, increasing the depth bound by one each time.

  • At each iteration, the algorithm performs a complete DFS to the current depth bound. No information about the state space is retained between iterations.
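A sketch of iterative deepening under the same assumptions; note that each call to `depth_limited` starts from scratch, keeping nothing between iterations:

```python
def depth_limited(state, goal, successors, limit, path=()):
    """DFS that never descends more than `limit` edges below the start."""
    path = path + (state,)
    if state == goal:
        return list(path)
    if limit == 0:
        return None
    for child in successors(state):
        if child not in path:   # avoid cycles along the current path
            found = depth_limited(child, goal, successors, limit - 1, path)
            if found:
                return found
    return None

def iterative_deepening(start, goal, successors, max_depth=50):
    """Repeat depth-limited DFS with bounds 1, 2, 3, ..."""
    for bound in range(1, max_depth + 1):
        result = depth_limited(start, goal, successors, bound)
        if result:
            return result
    return None
```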

Uninformed vs Informed Search Strategies

Uninformed strategies use only information available in the problem definition.

  • Systematically generate new states

Criteria           BFS      DFS      IDS
Completeness       Yes      No       Yes
Optimality         Yes      No       Yes
Time complexity    O(b^d)   O(b^m)   O(b^d)
Space complexity   O(b^d)   O(bm)    O(bd)

($b$: branching factor, $d$: depth of the shallowest solution, $m$: maximum depth of the state space)

Informed strategies use problem-specific knowledge to guide the search.

  • Use an evaluation function to estimate the cost of a state
  • A* search, Hill climbing, Best-first (Greedy)

heuristic (adj.) means “based on learning from personal discoveries and experiences.”

  • Formulated as rules for choosing those branches in a state space that are most likely to lead to a solution.

This increases the feasibility of the search process.


  1. A heuristic is only an informed guess.
  2. It can lead a search algorithm to a sub-optimal solution, or fail to find any solution at all.

A heuristic search strategy has two components:

  1. A heuristic function, $h(n)$, that estimates the cost of the cheapest path from the state n to a goal state.
  2. A search strategy that uses $h(n)$ to guide the search.

Devising Heuristics

The evaluation function, $f(n)$, is the sum of two components: $$f(n) = g(n)+h(n)$$

  • $g(n)$: path-cost function = cost from initial state to current state n.
  • $h(n)$: heuristic function = estimated cost of the cheapest path from n to a goal state.
    • If $h(n)$ is never larger than the real cost, the heuristic is admissible and the search is guaranteed to find an optimal solution.
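A minimal A* sketch built directly on $f(n) = g(n) + h(n)$. Here `successors(state)` yields (child, step-cost) pairs and `h` is the heuristic function; both are placeholders you fill in for a concrete problem:

```python
import heapq

def a_star(start, goal, successors, h):
    """Expand the frontier node with the smallest f(n) = g(n) + h(n).
    With an admissible h, the first goal popped is an optimal solution."""
    frontier = [(h(start), 0, start, [start])]   # (f, g, state, path)
    best_g = {start: 0}                          # cheapest g found so far
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return path, g
        for child, step_cost in successors(state):
            g_child = g + step_cost
            if g_child < best_g.get(child, float("inf")):
                best_g[child] = g_child
                heapq.heappush(frontier,
                               (g_child + h(child), g_child, child, path + [child]))
    return None, float("inf")
```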

Introduction to Data Mining and Association Analysis

  • Data - Facts, numbers, or text that can be processed by a computer.
  • Information - Patterns, associations, relationships, or trends that can be extracted from data.
  • Knowledge - Information can be converted into knowledge about historical patterns and future trends.

Data mining - The process of extracting patterns or knowledge from large amounts of data. Application areas include:

  • Market or business analysis and decision support
  • Text mining
  • Medical and biological data analysis
  • Fraud detection
  • Social network analysis

Data Mining Process

Input Data -> Data Preprocessing -> Data Mining -> Post processing -> Knowledge Discovery -> Knowledge Representation -> Knowledge Evaluation

Data Mining Tasks

  1. Predictive
    • Classification
    • Regression
    • Time-series analysis
    • Prediction
  2. Descriptive
    • Association rules
    • Clustering
    • Summarization
    • Sequence discovery

Data Mining Architecture

[Figure: Data mining architecture]

Association Analysis

  • Association analysis - Search for relationships between items in a dataset.
    • Finding frequent patterns, correlations, or causal structures among variables in large databases.
    • Frequent pattern: A pattern that occurs frequently in a dataset.

Rule form: $X \rightarrow Y$

Basic Concepts

  • I = {i1, i2, …, in} is a set of items.

  • T = {t1, t2, …, tm} is a set of transactions.

  • Each transaction contains a subset of items.

  • Support count $\sigma(X)$: The number of transactions that contain a given itemset.

  • Support $\sigma(X)/|T|$: The fraction of transactions that contain a given itemset.

    • How often a rule is applicable in a given dataset.

Association rule is an implication of the form $X \rightarrow Y$ where $X$ and $Y$ are itemsets; $X \cap Y = \emptyset$.

  • Strength of association: Measured in terms of support and confidence.
    • Support: $\sigma(X \cup Y)/|T|$ - Applicability of the rule in the dataset.
    • Confidence: $\sigma(X \cup Y)/\sigma(X)$ - How frequently items in $Y$ appear in transactions containing $X$. (Conditional probability)
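These definitions are easy to verify on a toy dataset. The five transactions below are illustrative only (not from the course):

```python
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support_count(itemset):
    """sigma(X): number of transactions that contain every item of X."""
    return sum(1 for t in transactions if itemset <= t)

X, Y = {"milk", "diapers"}, {"beer"}
n = len(transactions)
support = support_count(X | Y) / n                    # sigma(X u Y) / |T|
confidence = support_count(X | Y) / support_count(X)  # sigma(X u Y) / sigma(X)
print(support, confidence)  # 0.4 0.666...
```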

Frequent Itemset Generation

Apriori Algorithm

Two compact summaries of the frequent itemsets are worth noting: a frequent itemset is maximal if none of its immediate supersets is frequent, and closed if none of its immediate supersets has exactly the same support count.

  1. Compute the minimum support count, $minsup = |T| \times \text{minimum support fraction}$.
  2. Let k = 1.
  3. Repeat until no more frequent itemsets are found:
    1. Generate the frequency count of all candidate itemsets of size k.
    2. Prune the itemsets that do not meet the minimum support count.
      • Also prune candidate itemsets that have a subset which is not frequent (the Apriori property).
    3. k = k + 1
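A compact (and deliberately unoptimised) Python sketch of these steps; it can be run on the toy `transactions` list from the Basic Concepts section:

```python
from itertools import combinations

def apriori(transactions, minsup_fraction):
    """Level-wise frequent itemset generation with Apriori pruning."""
    minsup = len(transactions) * minsup_fraction   # minimum support count
    items = {i for t in transactions for i in t}
    # frequent 1-itemsets
    level = [frozenset([i]) for i in items
             if sum(1 for t in transactions if i in t) >= minsup]
    frequent, k = list(level), 2
    while level:
        # candidate k-itemsets: unions of frequent (k-1)-itemsets
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori property: every (k-1)-subset must itself be frequent
        prev = set(level)
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # keep candidates meeting the minimum support count
        level = [c for c in candidates
                 if sum(1 for t in transactions if c <= t) >= minsup]
        frequent.extend(level)
        k += 1
    return frequent

# e.g. apriori(transactions, 0.6) on the toy dataset above
```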

FP-Growth Algorithm


  • FP-Growth compresses the dataset by representing frequent items in an FP-tree (frequent pattern tree).
  • Key structures:
    • FP-tree
    • Conditional pattern base
    • Conditional FP-tree for item_i
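A construction-only sketch of the FP-tree: count item frequencies, drop infrequent items, reorder each transaction by descending global frequency, and insert it into a shared prefix tree. The recursive mining step (conditional pattern bases and conditional FP-trees) is omitted here:

```python
class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent   # parent link, used when mining
        self.count = 0
        self.children = {}                      # item -> FPNode

def build_fp_tree(transactions, minsup):
    # global frequency of each item
    counts = {}
    for t in transactions:
        for i in t:
            counts[i] = counts.get(i, 0) + 1
    frequent = {i: c for i, c in counts.items() if c >= minsup}
    root = FPNode(None, None)
    for t in transactions:
        # keep only frequent items, most frequent first (ties by name)
        ordered = sorted((i for i in t if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:   # shared prefixes are what compress the data
            node = node.children.setdefault(item, FPNode(item, node))
            node.count += 1
    return root
```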

Association Rules Evaluation

Lift is a simple correlation measure between two itemsets $X$ and $Y$.

$$ \begin{aligned} \text{Lift}(X,Y) &= \frac{\text{Confidence}(X\rightarrow Y)}{\text{Support}(Y)} \\ &= \frac{\sigma(X \cup Y)/\sigma(X)}{\sigma(Y)/|T|} \\ &= \frac{P(X \cup Y)}{P(X)P(Y)} \end{aligned} $$

  • Lift = 1: No association between X and Y.

  • Lift > 1: X and Y are positively correlated.

  • Lift < 1: X and Y are negatively correlated.

  • Some rules are strong associations, yet may still not be interesting.
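Continuing the toy dataset from the Basic Concepts sketch (reusing `transactions`, `support_count`, `confidence`, `Y`, and `n` from that block):

```python
lift = confidence / (support_count(Y) / n)  # Confidence(X->Y) / Support(Y)
print(lift)  # sigma(Y) = 3, so Support(Y) = 0.6 and lift = 0.667/0.6 ≈ 1.11
```

A lift just above 1 suggests that {milk, diapers} and {beer} are weakly positively correlated in this toy data.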