Context-Free Grammars

Context-Free Grammar

Layman Explanation

Suppose the following CFG:

S \to a S b \\~\\ S \to \epsilon

S can render/derive/transform into aSb or \epsilon

This is the parse tree of S:

Doodle 17

Why: S \to a S b \to aa S bb \to aaa S bbb \to aaa \epsilon bbb

In other words, this grammar can render anything in { a^n b^n }, which is the language the grammar generates.

Formal Definition

Formal Definition of CFG: G: (V, \Sigma, R, S)

If u, v, and w are strings of variables and terminals, and A \to w is a rule of the grammar, we say that uAv yields uwv, written uAV \Rightarrow uwv

Say that u derives v, written u \Rightarrow v, if u = v or if a sequence u_1,u_2,...u_k exists for k \ge 0 and

u \Rightarrow u_1 \Rightarrow u_2 \Rightarrow ... \Rightarrow u_k \Rightarrow v

Language of the Grammar: { w \in \Sigma^* | S \overset{*}{\Rightarrow} w }

Q: What does the following CFG render?

S \to a S \\ S \to S b \\ S \to \epsilon

A:

This example doesn’t enforce the # of as to be equal to bs (not {a^n b^n})

Renders: \epsilon, a, b, ab, aa, bb, ...

Thus: L(G) = { a^* b^* }

Q: Define a CFG for {a^{3n}}

A:

We want to generate the terminal in threes at a time, so:

S \to S aaa \\ S \to \epsilon

shorter: S \to S a^3 | \epsilon

Designing CFGs

Linked Strings

If your language has two substrings that are “linked”:

R \to uRv

Linked Substrings?: When the machine needs to remember an unbounded amount of information to count the substrings to satisfy some relationship.

Ambiguity

Ambiguity: A string w is derived ambiguously by a CFG (G) if it has \ge 2 leftmost derivations.

Sometimes a grammar can generate the same string in many ways.

Example: (EXPR) \to (EXPR) + (EXPR) | (EXPR)\times(EXPR) | (EXPR) | a

Q: How to avoid ambiguity?

A: Separate by using different variables

e.g., this is an unambiguous grammar \begin{align*} (EXPR) &\to (EXPR) + (TERM) | (TERM) \\ (TERM) &\to (TERM) \times (FACTOR) | (FACTOR) \\ (FACTOR) &\to (EXPR) | a \end{align*}

Inherently Ambiguous: Some languages can only be generated by ambiguous grammars.

Conversion Examples

NFA to CFG

Q: Convert the NFA to CFG

A: Q1 \to a Q2 \\ Q2 \to a Q3 \\ Q3 \to a Q1 \\ Q1 \to \epsilon

Q: Convert the NFA to CFG

A: S \to aA \\ A \to aS | bA | F \\ F \to aA | bF | \epsilon

Language to CFG

Remember: Divide and conquer! (concatenate and union)

Q: { a^n ( b^3 )^n }

Q: { a^n ( bcd )^n }

Q: { a^n b^n | n <= 3 }

Q: { a^n b^n | n >= 2 }

Q: { w | w has n a’s and n b’s }

Q: { w | w has m a’s and n b’s and m+n = 2k for some integer k >= 0 }

Chomsky Normal Form (CNF)

A CFG is in Chomsky normal form if every rule is of the form:

\boxed{ \text{CNF: } \begin{cases} A &\to BC \\ A &\to a \end{cases} }

Converting to CNF

Steps:

  1. Start variable on left only.
  2. Only S_0 can (optionally) go to \epsilon.
  3. Ensure variables on RHS show up in pairs (A \to BC)
  4. Ensure single terminal on RHS (A \to a)

More On Rule 4:

Convention for Replacing Terminals: Suppose A \to bc BA

By convention, we can use T to organize terminals, e.g.,

A \to T_b T_c BA \\ T_b \to b \\ T_c \to c \\

Convention for Replacing Variables: Suppose A \to T_b T_c BA

By convention, we can use V to organize variables, e.g.,

A \to V_{BC} V_{BA} \\ V_{BC} \to T_b T_c \\ V_{BA} \to B A \\

Note: If something is in CNF, don’t waste time modifying it!

Purpose of CNF:

Look at this language and its empty productions.

S \to A \\ A \to B \\ B \to \epsilon

What about this loop of empty productions.

S \to A | \epsilon \\ A \to B \\ B \to S

Chomsky normal form is how we modify the grammar to eliminate such situations.

Example: Binary tree parse tree of CNF S \to AB | \epsilon \\ A \to a \\ B \to b \\

Example: CFG to CNF

\begin{aligned} S &\to ASA | aB \\ A &\to B|S \\ B &\to b|\epsilon \end{aligned}

  1. Add new start variable (S_0)

\begin{aligned} \textcolor{green}{S_0} &\textcolor{green}{\to S} \\ S &\to ASA | aB \\ A &\to B|S \\ B &\to b|\epsilon \end{aligned}

Observations:

  1. Remove all epsilon transitions by a non-start variable.
Sub-Example: Removing an Epsilon Transitions

Suppose:

\begin{aligned} S \to BAB \\ B \to \epsilon \\ B \to b \\ A \to a \end{aligned}

We can move the epsilon into S, like so:

\begin{aligned} S \to B A B \\ S \to \epsilon A B \\ S \to B A \epsilon \\ S \to \epsilon A \epsilon \\ B \to b \\ A \to a \end{aligned}

As we can see, this generated four rules, as we had to account for every possibility of B in the right hand side going to epsilon, or not.

Let’s simplify by removing the useless epsilons.

\begin{aligned} S \to B A B \\ S \to A B \\ S \to B A \\ S \to A \\ B \to b \\ A \to a \end{aligned}

Or, as one line:

\begin{aligned} S \to B A B | A B | B A | A \\ B \to b \\ A \to a \end{aligned}

If you had something like S \to BABAB, the possibilities would be quite alot: S \to BABAB | ABAB | BAAB | BABA | AAB | BAA | ABA | AA

Removing B \to \epsilon

\begin{aligned} S_0 &\to S \\ S &\to ASA | aB | \textcolor{green}{a} \\ A &\to B|S|\textcolor{green}{\epsilon} \\ B &\to b \end{aligned}

Now, we have to remove A \to \epsilon

\begin{aligned} S_0 &\to S \\ S &\to ASA | aB | a | \textcolor{green}{ SA | AS | S } \\ A &\to B|S \\ B &\to b \end{aligned}

Remember: We can’t remove epsilon from start variable, as that’d change the language.

  1. Remove unit rules

We can delete the useless S \to S.

\begin{aligned} S_0 &\to S \\ S &\to ASA | aB | a | SA | AS \\ A &\to B|S \\ B &\to b \end{aligned}

Removing useless S_0 \to S

\begin{aligned} S_0 &\to \textcolor{green}{ASA | aB | a | SA | AS} \\ S &\to ASA | aB | a | SA | AS \\ A &\to B|S \\ B &\to b \end{aligned}

Remove A \to B

\begin{aligned} S_0 &\to ASA | aB | a | SA | AS \\ S &\to ASA | aB | a | SA | AS \\ A &\to S | \textcolor{green}{b} \\ B &\to b \end{aligned}

Remove A \to S

\begin{aligned} S_0 &\to ASA | aB | a | SA | AS \\ S &\to ASA | aB | a | SA | AS \\ A &\to b | \textcolor{green}{ASA | aB | a | SA | AS} \\ B &\to b \end{aligned}

  1. Now, we just define new variables and rules to make the remaining rules follow CNF rules.

We’ll start by separating the terminals and variables by introducing T_a \to a

\begin{aligned} S_0 &\to ASA | \textcolor{green}{T_a B | T_a} | SA | AS \\ S &\to ASA | \textcolor{green}{T_a B | T_a} | SA | AS \\ A &\to b | ASA | \textcolor{green}{T_a B | T_a} | SA | AS \\ B &\to b \\ \textcolor{green}{T_a} &\textcolor{green}{\to a} \end{aligned}

Now, we’ll work on fixing the last non-CNF rule, ? \to ASA, by introducing a new variable (V_{SA} \to SA)

\begin{aligned} S_0 &\to \textcolor{green}{AV_{SA}} | T_a B | T_a | SA | AS \\ S &\to \textcolor{green}{AV_{SA}} | T_a B | T_a | SA | AS \\ A &\to b | \textcolor{green}{AV_{SA}} | T_a B | T_a | SA | AS \\ B &\to b \\ T_a &\to a \\ \textcolor{green}{$V_{SA}$} & \textcolor{green}{$\to SA$} \end{aligned}