
Inference Rules Demystified - Navigating the Logic Behind AI


We’re now going to explore another method of performing inference, this time through something known as inference rules. We’ll delve into how to use these rules to transform any sentence into what’s called conjunctive normal form (CNF) and then apply inference by resolution. In the final part of the post, we’ll venture into another realm of logic by examining first-order logic and its applications. Let’s dive in!

An illustration of a man going insane with rules.

I want to gain a better understanding of how AI actually works. As a first step, I’ve enrolled in the HarvardX: CS50’s Introduction to Artificial Intelligence with Python course on EdX. To enhance my learning, I will be applying the technique of Nobel Prize winner Richard Feynman and trying to teach the concepts I’m learning while I’m learning them. That is what these blog posts will be about. You can find the whole CS50 AI course on FreeCodeCamp’s YouTube channel or on EdX.


Inference rules

Inference rules serve as the logical frameworks enabling artificial intelligence (AI) systems to derive conclusions from existing information. However, to fully grasp their function, we first need to look at the structure of how they are represented.

Premise
─────────
Conclusion

Inference rules are depicted with a horizontal line: anything above this line represents our premises, the information we know to be true. Anything below the line signifies the conclusion we can deduce by applying the logic of the inference rule.

Consider the following example:

If it is raining then Harry is inside.
It is raining.
─────────
Harry is inside.

We understand from the premises that Harry will be inside if it rains, and given that it is indeed raining, we can logically conclude that Harry is inside.

Modus ponens

The inference rule demonstrated above is known as Modus Ponens, and it can be formalized as follows:

⍺ → β
⍺
─────────
β

If we know that ⍺ implies β, and we know ⍺ to be true, then β must also be true.

This logical process takes us on a path distinct from model checking, where we explored various possibilities. In this case, we focus directly on the knowledge at hand.

While the rule and the example might seem straightforward, the power lies in combining this with other inference rules. It’s the synthesis of these rules that will enable us to generate knowledge within our AI systems.
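To make this concrete, here is a minimal sketch in Python (my own illustration, not the course’s implementation) that represents implications as tuples and applies Modus Ponens to a small knowledge base:

```python
# A minimal sketch: symbols are strings, and an implication is a tuple
# ("implies", antecedent, consequent). This representation is an assumption
# made for illustration, not the course's.

def modus_ponens(knowledge):
    """Return the set of new facts derivable in one step of Modus Ponens."""
    derived = set()
    for sentence in knowledge:
        if isinstance(sentence, tuple) and sentence[0] == "implies":
            _, antecedent, consequent = sentence
            # If the antecedent is known to be true, conclude the consequent.
            if antecedent in knowledge:
                derived.add(consequent)
    return derived

knowledge = {
    ("implies", "rain", "harry_inside"),  # If it is raining, Harry is inside.
    "rain",                               # It is raining.
}

print(modus_ponens(knowledge))  # {'harry_inside'}
```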

And Elimination

Next, we’ll explore various inference rules and their applications, starting with a familiar example (Harry Potter is often used in the CS50 course):

Harry is friends with Ron and Hermione.
─────────
Harry is friends with Hermione.

This rule is called And Elimination and can be formalized as follows:

⍺ ∧ β
─────────
⍺

When we know that both ⍺ and β are true, it logically follows that ⍺ is also true.

While this may seem self-evident to us, it’s crucial to explicitly define such logic for a computer. We must inform the system that it can apply this rule to deduce information from given premises, allowing it to transform and process knowledge.

Double negation elimination

This rule allows us to simplify statements by removing double negations:

¬(¬⍺)
─────────
⍺

Simply put, if “not not ⍺” is true, then ⍺ must be true.

It might sound like a linguistic puzzle, but here’s the logic: ¬⍺ indicates that ⍺ is not true, so ¬(¬⍺) effectively cancels out the negations, confirming that ⍺ is indeed true. Recall from the last post that ¬ is the logical symbol for “not.”

Implication elimination

Implication Elimination allows us to convert implications (if-then statements) into or statements:

⍺ → β
─────────
¬⍺ ∨ β

If ⍺ implies β, then either ⍺ is false or β is true. We can reason about why: if ⍺ is false, then ¬⍺ holds and the disjunction is true no matter what β is. If ⍺ is true, then the implication forces β to be true, so the disjunction holds again. Either way, ¬⍺ ∨ β is true.

Biconditional elimination

This rule enables the simplification of biconditionals:

⍺ ↔ β
─────────
(⍺ → β) ∧ (β → ⍺)

A biconditional statement means ”⍺ if and only if β,” requiring both ⍺ and β to share the same truth value. This mutual dependency means they imply each other. Thus, we can represent the biconditional as both ”⍺ implies β” and “β implies ⍺.”

De Morgan’s Laws

Named after Augustus De Morgan, De Morgan’s Laws enable the transformation of conjunctions (statements joined by ‘and’) into disjunctions (statements joined by ‘or’), and vice versa, through the application of negation.

¬(⍺ ∧ β)
─────────
¬⍺ ∨ ¬β

This rule says that if it is not the case that both ⍺ and β are true, then at least one of them must be false: either ⍺ is not true, β is not true, or both are not true. Conceptually, this involves moving the negation inside and flipping the “and” (∧) into an “or” (∨).

The principle also applies in reverse:

¬(⍺ ∨ β)
─────────
¬⍺ ∧ ¬β

Here, if neither ⍺ nor β is true, then both must be false. Again, we move the negation into the sentence and flip the connective.

The distributive property

The distributive property in logic functions similarly to multiplication over addition in mathematics, allowing for the distribution of conjunctions and disjunctions across each other.

(⍺ ∧ (β ∨ 𝛄))
─────────
(⍺ ∧ β) ∨ (⍺ ∧ 𝛄)

This shows that if ⍺ is conjoined with either β or 𝛄, it can be distributed to form ”⍺ and β” or ”⍺ and 𝛄.” The property also applies in reverse, for a disjunction distributed over a conjunction:

(⍺ ∨ (β ∧ 𝛄))
─────────
(⍺ ∨ β) ∧ (⍺ ∨ 𝛄)

This means that if ⍺ is disjoined with both β and 𝛄, it can be distributed to form ”⍺ or β” and ”⍺ or 𝛄.”
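All of these equivalences hold in every possible world, which means we can verify them by brute force: enumerate every truth assignment and check that both sides agree. Here is a small sketch in Python (my own, not from the course):

```python
from itertools import product

def equivalent(f, g, n):
    """Check that two n-variable Boolean formulas agree on all 2**n assignments."""
    return all(f(*v) == g(*v) for v in product([False, True], repeat=n))

# Biconditional elimination: (a <-> b)  vs  (a -> b) and (b -> a)
print(equivalent(lambda a, b: a == b,
                 lambda a, b: ((not a) or b) and ((not b) or a), 2))

# De Morgan's Laws
print(equivalent(lambda a, b: not (a and b),
                 lambda a, b: (not a) or (not b), 2))
print(equivalent(lambda a, b: not (a or b),
                 lambda a, b: (not a) and (not b), 2))

# The distributive property
print(equivalent(lambda a, b, c: a and (b or c),
                 lambda a, b, c: (a and b) or (a and c), 3))
print(equivalent(lambda a, b, c: a or (b and c),
                 lambda a, b, c: (a or b) and (a or c), 3))
```

Each check prints True. This is essentially miniature model checking: we enumerate all models and confirm the two sentences are true in exactly the same ones.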

How do we use the rules?

We’ve explored a variety of inference rules, but their practical application might still seem elusive. How do we use these rules to draw conclusions? As before, we want to prove something about entailment: we want to take our initial knowledge base and prove that some query is true.

One way to think about it is to think back to search problems. In search problems we have:

  1. An initial state
  2. Actions
  3. A transition model
  4. A goal test
  5. A path cost function

Using inference rules, we can take some sentences in propositional logic and create new sets of sentences. We can treat those sentences as states inside a search problem. So if we want to prove that some query or logical theorem is true, we can treat theorem proving as a form of search problem.

Translated to a search problem, the different components could look like this:

  1. Initial state: the starting knowledge base
  2. Actions: the inference rules we can apply
  3. Transition model: the new knowledge base we get after applying an inference rule
  4. Goal test: checking whether the statement we are trying to prove is in the knowledge base
  5. Path cost function: the number of steps in the proof

However, this is just one approach among many. Next, we will delve into resolution, one of the most prevalent methods for applying these concepts.

Resolution

Resolution is based on another inference rule, a powerful one that will let us prove anything that can be proven about a knowledge base.

Unit resolution

Unit resolution is a straightforward yet powerful rule:

P ∨ Q
¬P
─────────
Q

If P or Q is true, and P is not true, then Q must be true.

Q doesn’t actually need to be a single propositional symbol; it can be multiple symbols chained together.

P ∨ Q1 ∨ Q2 ∨ … ∨ Qn
¬P
─────────
Q1 ∨ Q2 ∨ … ∨ Qn

We can generalize this idea even further:

P ∨ Q
¬P ∨ R
─────────
Q ∨ R

Here, if P or Q is true and not P or R is true, then Q or R must also be true, effectively cancelling out P.

This logic extends to clauses with multiple symbols:

P ∨ Q1 ∨ Q2 ∨ … ∨ Qn
¬P ∨ R1 ∨ R2 ∨ … ∨ Rm
─────────
Q1 ∨ Q2 ∨ … ∨ Qn ∨ R1 ∨ R2 ∨ … ∨ Rm

In this context, each line represents a clause: a disjunction of literals (a literal being either a propositional symbol or its negation), such as P ∨ Q ∨ R.
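One way to implement the resolution rule is to represent each clause as a frozenset of literal strings, writing ¬ as a leading ~. This is a sketch of my own, not the course’s code:

```python
def negate(literal):
    """Return ~P for P, and P for ~P."""
    return literal[1:] if literal.startswith("~") else "~" + literal

def resolve(clause_a, clause_b):
    """Apply the resolution rule: for every complementary pair of literals,
    cancel the pair and merge the remaining literals into a new clause."""
    resolvents = set()
    for literal in clause_a:
        if negate(literal) in clause_b:
            resolvents.add((clause_a - {literal}) | (clause_b - {negate(literal)}))
    return resolvents

# (P ∨ Q) and (¬P ∨ R) resolve to (Q ∨ R)
print(resolve(frozenset({"P", "Q"}), frozenset({"~P", "R"})))
# {frozenset({'Q', 'R'})}
```

Frozensets are a natural fit here: a clause has no duplicate literals and no ordering, and frozensets are hashable, so resolvent clauses can themselves be collected into sets.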

With these principles, we can restructure any logical statement into conjunctive normal form (CNF), laying the groundwork for advanced theorem proving through resolution.

Conjunctive normal form

Conjunctive Normal Form (CNF) is a logical sentence structured as a conjunction (connected with ‘and’) of clauses.

Example: (A ∨ B ∨ C) ∧ (D ∨ ¬E) ∧ (F ∨ G)

This standardized form makes logical sentences easier to work with: any sentence in propositional logic can be converted to CNF through the application of specific inference rules and transformations.

To convert a sentence into CNF, we follow these standardized steps, systematically modifying the sentence to align more closely with CNF structure:

  1. Eliminate biconditionals using the biconditional elimination rule: turn (⍺ ↔ β) into (⍺ → β) ∧ (β → ⍺).
  2. Eliminate implications using the implication elimination rule: turn (⍺ → β) into ¬⍺ ∨ β.
  3. Move any ¬ inwards using De Morgan’s Laws: turn ¬(⍺ ∧ β) into ¬⍺ ∨ ¬β, and ¬(⍺ ∨ β) into ¬⍺ ∧ ¬β.
  4. Use the distributive law to distribute ‘or’ (∨) wherever needed.

Let’s walk through a conversion example:

(P ∨ Q) → R

Let’s first remove the implication, using the implication elimination rule.

¬(P ∨ Q) ∨ R

Next, we apply De Morgan’s law to move the negation inwards:

(¬P ∧ ¬Q) ∨ R

Finally, we distribute the ‘or’ (∨) inward using the distributive law:

(¬P ∨ R) ∧ (¬Q ∨ R)

And we have a sentence in conjunctive normal form.
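If you want to sanity-check a conversion like this, the sympy library ships a to_cnf helper that performs the same kind of transformation:

```python
from sympy import symbols, Or, Implies
from sympy.logic.boolalg import to_cnf

P, Q, R = symbols("P Q R")

sentence = Implies(Or(P, Q), R)  # (P ∨ Q) → R
print(to_cnf(sentence))          # (R | ~P) & (R | ~Q)
```

The printed form may order the terms differently, but it is the same sentence we derived by hand.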

Why do we even want to do this? Because when we have the sentences in conjunctive normal form, the clauses are the inputs to the resolution inference rule we looked at earlier.

Inference by resolution

With the resolution rule, we can perform inference. Recall the basic form:

P ∨ Q
¬P ∨ R
─────────
Q ∨ R

Before proceeding, let’s consider additional rules for clarity:

P ∨ Q ∨ S
¬P ∨ R ∨ S
─────────
Q ∨ R ∨ S

Note that we don’t need to list S twice in the resulting clause.

And crucially:

P
¬P
─────────
( )

The result is the empty clause.

The empty clause always evaluates to false, since P and ¬P cannot simultaneously be true.

This concept of resolving contradictory terms to produce an empty clause forms the foundation of our inference by resolution algorithm.

This is what we will do at a high level: to determine if KB ⊨ ⍺, we will check whether (KB ∧ ¬⍺) leads to a contradiction. If it does, then KB ⊨ ⍺. Otherwise, there is no entailment.

Here is how we are going to do it in a little more detail:

  1. Convert (KB ∧ ¬⍺) to conjunctive normal form.
  2. Keep checking to see if we can use resolution to produce a new clause.
  3. If we ever produce the empty clause, we have a contradiction, and KB ⊨ ⍺.
  4. Otherwise, if we cannot add any new clauses, there is no entailment.

So let’s do it with an actual example.

Does the KB (A ∨ B) ∧ (¬B ∨ C) ∧ (¬C) entail the query A?

We want to prove by contradiction, so we will assume that A is false and add ¬A to our knowledge base.

(A ∨ B) ∧ (¬B ∨ C) ∧ (¬C) ∧ (¬A)

We now have four clauses we can try and resolve. We start with the two clauses containing C.

(A ∨ B) ∧ (¬B ∨ C) ∧ (¬C) ∧ (¬A)

In these two clauses there are complementary literals, C and ¬C. The resolution rule says we can cancel them out, and we are left with the clause (¬B).

This new clause (¬B) is in turn complementary with the B in the first clause:

(A ∨ B) ∧ (¬A) ∧ (¬B)

Resolving B and ¬B leaves us with the clause (A):

(A) ∧ (¬A)

When we resolve these two, we are left with only the empty clause, which is false. Because we were able to produce the empty clause, our assumption ¬A must be false, which in turn means our knowledge base entails A.

This is an entirely different approach from model checking. Instead of enumerating all the possible worlds, we use the resolution algorithm to see if we can reach a contradiction, and whether we reach one tells us whether our knowledge base entails the query.
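Putting the pieces together, here is a compact sketch in Python (my own, not the course’s implementation) of inference by resolution, run on the example above:

```python
def negate(literal):
    """Return ~P for P, and P for ~P."""
    return literal[1:] if literal.startswith("~") else "~" + literal

def resolve(a, b):
    """All resolvents of two clauses, represented as frozensets of literals."""
    return {(a - {lit}) | (b - {negate(lit)}) for lit in a if negate(lit) in b}

def entails(kb_clauses, query):
    """Does the knowledge base entail the query? Assume ¬query and
    resolve until we find the empty clause or run out of new clauses."""
    clauses = set(kb_clauses) | {frozenset({negate(query)})}
    while True:
        new = set()
        for a in clauses:
            for b in clauses:
                for resolvent in resolve(a, b):
                    if not resolvent:   # the empty clause: a contradiction
                        return True
                    new.add(resolvent)
        if new <= clauses:              # no new clauses: no entailment
            return False
        clauses |= new

# KB = (A ∨ B) ∧ (¬B ∨ C) ∧ (¬C), query = A
kb = {frozenset({"A", "B"}), frozenset({"~B", "C"}), frozenset({"~C"})}
print(entails(kb, "A"))  # True
```

Resolving (A) with (¬A) produces an empty frozenset, which is exactly the empty clause from the walkthrough above, so the function reports that the knowledge base entails A.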

First-order logic

In the final segment of this post, we pivot from inference by resolution to explore a distinct realm of logic. Until now, our discussions have revolved around propositional logic, where individual symbols are interconnected through logical connectives. However, propositional logic, for all its utility, has its constraints—for instance, representing complex ideas might require an unwieldy number of symbols.

We turn our attention to first-order logic (FOL), a more expressive form of logic that introduces two types of symbols: constant symbols, representing objects, and predicate symbols, representing relations and functions.

Consider an example related to the Harry Potter universe, aiming to encapsulate knowledge about characters and their respective houses. We could categorize the symbols as follows:

| Constant Symbols | Predicate Symbols |
| ---------------- | ----------------- |
| Minerva          | Person            |
| Pomona           | House             |
| Horace           | BelongsTo         |
| Gilderoy         |                   |
| Gryffindor       |                   |
| Hufflepuff       |                   |
| Ravenclaw        |                   |
| Slytherin        |                   |

In this framework, predicates are properties that may or may not apply to constant symbols. For instance, the predicate ‘Person’ would not apply to a ‘House’. ‘BelongsTo’ serves as a relational predicate, linking characters to their houses.

Here’s how sentences in first-order logic could be structured:

| Sentence                       | Explanation                   |
| ------------------------------ | ----------------------------- |
| Person(Minerva)                | Minerva is a person           |
| House(Gryffindor)              | Gryffindor is a house         |
| ¬House(Minerva)                | Minerva is not a house        |
| BelongsTo(Minerva, Gryffindor) | Minerva belongs to Gryffindor |

This example demonstrates that first-order logic is notably more expressive, requiring fewer symbols compared to propositional logic.

First-order logic gives us additional features that enable us to express more complex ideas. These are generally known as quantifiers, and there are two main ones in first-order logic: universal quantification and existential quantification.

Universal quantification

Universal quantification allows us to declare that a specific property or relation holds true for all instances within a certain domain. It introduces the symbol ∀, which stands for “for all.”

∀x. BelongsTo(x, Gryffindor) → ¬BelongsTo(x, Hufflepuff)

For all objects x, if x belongs to Gryffindor, then x does not belong to Hufflepuff.

Or in simple words: Anyone in Gryffindor is not in Hufflepuff.

Existential quantification

Conversely, existential quantification posits that a statement is true for at least one instance within a domain. It uses the symbol ∃, signifying “there exists.”

∃x. House(x) ∧ BelongsTo(Minerva, x)

There exists an object x such that x is a house and Minerva belongs to x.

Or in simpler words: Minerva belongs to a house.

With these tools we can create far more sophisticated logical statements than we could with propositional logic. We could create something like this:

∀x. Person(x) → (∃y. House(y) ∧ BelongsTo(x, y))

For all objects x, if x is a person, then there exists an object y such that y is a house and x belongs to y.

Simple: Every person belongs to a house.
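Over a finite domain, quantifiers reduce to iteration: ∀ becomes Python’s all and ∃ becomes any. Here is a sketch of my own (the house assignments are illustrative facts I am assuming, not something the course defines) that evaluates the three sentences above:

```python
people = {"Minerva", "Pomona", "Horace", "Gilderoy"}
houses = {"Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"}
domain = people | houses

# Illustrative facts, assumed for this sketch.
belongs_to = {("Minerva", "Gryffindor"), ("Pomona", "Hufflepuff"),
              ("Horace", "Slytherin"), ("Gilderoy", "Ravenclaw")}

def person(x): return x in people
def house(x): return x in houses

# ∀x. BelongsTo(x, Gryffindor) → ¬BelongsTo(x, Hufflepuff)
print(all((x, "Gryffindor") not in belongs_to or (x, "Hufflepuff") not in belongs_to
          for x in domain))  # True

# ∃x. House(x) ∧ BelongsTo(Minerva, x)
print(any(house(x) and ("Minerva", x) in belongs_to for x in domain))  # True

# ∀x. Person(x) → (∃y. House(y) ∧ BelongsTo(x, y))
print(all(not person(x) or any(house(y) and (x, y) in belongs_to for y in domain)
          for x in domain))  # True
```

Real first-order inference does not enumerate a domain like this, but for small, finite worlds it is a handy way to convince yourself of what a quantified sentence actually claims.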

This introduction to first-order logic merely scratches the surface of the diverse landscape of logic used to represent knowledge. Our goal is for AI agents to not only possess information but also to effectively represent and reason with that information through inference algorithms or model checking.

In an upcoming post, we’ll delve into enhancing AI’s capabilities by incorporating uncertainty. We’ll explore how AI systems handle probabilistic knowledge, aiming to enrich the intelligence of these systems further. Stay tuned as we continue our journey towards learning more about how sophisticated intelligent systems are created.

Previous posts about CS50AI:

  1. The basics of search - Depth first vs breadth first search
  2. Informed search algorithms - Greedy best-first search and A* search
  3. Adversarial search - Playing Tic Tac Toe with the Minimax algorithm
  4. Inference with Model Checking - How to Give AI Knowledge