A General Guide to AI & Data Mining
This post is a short summary of first half of the course, ‘IE4483: AI and Data Mining’ at NTU, Singapore. The course is taught by Prof. Dr. Lihui Chen.
Introduction: AI & Search
The two most fundamental concerns of AI researchers:

Knowledge representation

Capturing knowledge in a way suitable for computer manipulation

Graphs (Powerful representation for many problems)
 Example: Euler graph, Konigsberg bridge problem

ANN  Network + weights


Search
 Systematically exploring the space of possible solutions to a problem
Example: The Bridges of Konigsberg Problem
Is there a walk around the city that crosses each bridge exactly once?
The degree of a node of a graph: An even degree node has an even number of arcs joining it neghbouring nodes. An odd degree node has an odd number of arcs joining it to neighbouring nodes.
Euler’s conclusion: A walk around the city that crosses each bridge exactly once is possible if and only if the graph contains exactly 0 or 2 nodes of odd degree.
Another example: Travelling Salesman Problem (TSP)
Search Strategies
A strategy is defined by picking the order of node expansion in a graph.
Strategies are evaluated along the following dimensions:
 Completeness: Does the strategy always find a solution if one exists?
 Optimality: Does the strategy always find the best solution if one exists?
 Time complexity: How much time does the strategy take to find a solution?
 Space complexity: How much memory does the strategy require to find a solution?
Uninformed search = blind search
Informed search = heuristicapplied search
Strategies for State Space Search

Datadriven search: Facts & rules => New facts > … => Goal
 This approach is sometimes called forward chaining.
 Has a branch factor
 eg: Playing go, chess

Goaldriven search: Goal => Sub goals > … => Facts
 This approach is sometimes called backward chaining.
 eg: Medical diagnostic
How to systematically search a graph?
Why a graph? Very close to how we make decisions in real life.
 We use backtracking to systematically search a graph.
 Depthfirst search (DFS) for constraint satisfaction problem (CSP)
Notations:
 CS = Current state
 SL = State list
 NSL = Next state list (Unprocesses states)
 DE = Dead ends
Breadthfirst search (BFS)
 Level order traversal of a graph
If the branching factor, B (average number of children), is large, the combinatorics may prevent the algorithm from finding a solution using the available memory.
Depthfirst search (DFS)
 Depthfirst traversal of a graph
DFS w/ Iterative deepening search (IDS)

First performs a dfs on the space with a depth bound of one

If it fails to find a goal, it performs another dfs with a depth bound of two. We increase the depth bound by one each time.

At each iteration, the algorithm performs a complete dfs to the current depth bound. NO info about the state space is retained between iterations.
Uninformed vs Informed Search Strategies
Uninformed strategies use only information available in the problem definition.
 Systematically generate new states
 BFS, DFS, IDS
Criteria  BFS  DFS  IDS 

Completeness  Yes  No  Yes 
Optimality  Yes  No  Yes 
Time complexity  O(b^d)  O(b^m)  O(b^d) 
Space complexity  O(b^d)  O(bm)  O(bd) 
Informed strategies use problemspecific knowledge to guide the search.
 Use an evaluation function to estimate the cost of a state
 A* search, Hill climbing, Bestfirst (Greedy)
Introduction: Heuristic search
heuristic (adj.) means based on learning from personal discoveries and experiences.
 Formulated as rules for choosing those branches in a state space that are most likely to lead to a solution.
This increases the feasibility of the search process.
Limitations
 A heuristic is only an informed guess.
 Can lead a search algorithm to suboptimal solution or fail to find any solution at all.
Two key components of a heuristic search
 A heuristic function, h(n), that estimates the cost of the cheapest path from the state n to a goal state.
 A search strategy that uses h(n) to guide the search.
Divising heuristics
THe evaluation function, f(n), the sum of two components: $$f(n) = g(n)+h(n)$$
 $g(n)$: pathcost function = cost from initial state to current state n.
 $h(n)$: heuristic function = estimated cost of the cheapest path from n to a goal state.
 If $h(n)$ is not larger than the real cost, the search is admissible.
Introduction to Data Mining and Association Analysis
 Data  Facts, numbers, or text that can be processed by a computer.
 Information  Patterns, associations, relationships, or trends that can be extracted from data.
 Knolwedge  Information can be converted into knowledge about historical patterns and future trends.
Data mining  Processes of extracting patterns or knowledge from large amounts of data. Application areas include:
 Market or business analysis and decision support
 Text mining
 Medical and biological data analysis
 Fraud detection
 Social network analysis
Data Mining Process
Input Data > Data Preprocessing > Data Mining > Post processing > Knowledge Discovery > Knowledge Representation > Knowledge Evaluation
Data Mining Tasks
 Predictive
 Classification
 Regression
 Timeseries analysis
 Prediction
 Descriptive
 Association rules
 Clustering
 Summarization
 Sequence discovery
Data Mining Architecture
Association Analysis
 Association analysis  Search for relationships between items in a dataset.
 Finding frequent patterns, correlations, or causal structures among variables in large databases.
 Frequent pattern: A pattern that occurs frequently in a dataset.
Rule form: $X \rightarrow Y$
Basic Concepts

I = {i1, i2, …, in} is a set of items.

T = {t1, t2, …, tm} is a set of transactions.

Each transaction contains a subset of items.

Support count $\sigma(X)$: The number of transactions that contain a given itemset.

Support $\sigma(X)/T$: The fraction of transactions that contain a given itemset.
 How often a rule is applicable in a given dataset.
Association rule is an implication of the form $X \rightarrow Y$ where $X$ and $Y$ are itemsets; $X \cap Y = \emptyset$.
 Strength of association: Measured in terms of support and confidence.
 Support: $\sigma(X \cup Y)/T$  Applicability of the rule in the dataset.
 Confidence: $\sigma(X \cup Y)/\sigma(X)$  How frequently items in $Y$ appear in transactions containing $X$. (Conditional probability)
Frequent Itemset Generation
Apriori Algorithm
Write about maximal and closed frequent itemsets.
 Find the minimum support count, $minsup=T \times \text{% of min support}$.
 Let k = 1
 Repeat until no more frequent itemsets are found:
 Generate frequency count of all itemsets of size k.
 Prune the itemsets that do not meet the minimum support count.
 Prune itemsets which has subsets that are not frequent.
 k = k + 1
FPGrowth Algorithm
TO BE CONTINUED
 FPGrowth compresses the dataset by representing frequent items into a FPtree (frequent pattern tree).
 FPTree
 Conditional pattern base
 COnditional FPtree for item_i
Association Rules Evaluation
Lift is a simple correlation measure between two itemsets $X$ and $Y$.
$$ \begin{aligned} \text{Lift}(X,Y) &= \frac{\text{Confidence}(X\rightarrow Y)}{\text{Support}(Y)} \ &= \frac{\frac{\sigma(X \cup Y)}{\sigma(X)}}{\frac{\sigma(Y)}{T}} \ &= \frac{P(X\cup Y)}{P(X)P(Y)} \end{aligned} $$

Lift = 1: No association between X and Y.

Lift > 1: Positive correlated between X and Y.

Lift < 1: Negative correlated between X and Y.

Some rules are strong associations; some may still not be interesting