Algorithm Analysis Fundamentals

Types of Analysis

Theoretical vs Empirical Analysis

1. Theoretical Analysis

Time efficiency is analyzed by determining the number of repetitions of the basic operation as a function of input size (n).

Basic Operation: The operation that contributes most towards the running time of the algorithm.

\boxed{ \text{Running Time: } T(n) \approx c_{op} C(n) } \\ \small \textit{where $T(n)$ is the running time,} \\ \textit{$n$ is the input size,} \\ \textit{$c_{op}$ is the execution time of the basic operation, and} \\ \textit{$C(n)$ is the \# of times the basic operation is executed}

In short, we care about:
How long the basic operation takes, and
How many times we do it.
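
As a rough illustration of the formula above, the sketch below multiplies an assumed per-operation time by an operation count. The function name `estimated_running_time` and the value of `C_OP_SECONDS` are made up for this example; real values of c_op depend on the machine and are usually unknown.

```python
def estimated_running_time(c_op_seconds: float, basic_op_count: int) -> float:
    """Estimate T(n) ~ c_op * C(n), in seconds."""
    return c_op_seconds * basic_op_count


# Assumed value for illustration only: 10 nanoseconds per basic operation.
C_OP_SECONDS = 1e-8

for n in (1_000, 1_000_000):
    # Assume an algorithm whose basic operation runs n times, i.e. C(n) = n.
    basic_op_count = n
    seconds = estimated_running_time(C_OP_SECONDS, basic_op_count)
    print(f"n={n}: about {seconds:.2e} s")
```

The estimate is only as good as the guess for c_op, which is why the count C(n) is usually the more useful part of the formula.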

Example: Input size & basic operation of some common problems

Problem	Input Size Measure	Basic Operation
Searching for key in a list of n items	# of list’s items (n)	Key comparison
Multiplying two matrices	Matrix dimensions or total # of elements	Multiplication of two numbers
Checking if a given integer n is prime	n’s size = # of digits (in binary representation)	Division
Typical graph problem	# of vertices and/or edges	Visiting a vertex or traversing an edge

2. Empirical Analysis

Select a specific (typical) sample of inputs.
Measure efficiency using either:
1. A physical unit of time (e.g., ms), or
2. A count of the # of basic operation executions
Analyze empirical data.
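
A minimal sketch of these steps, assuming sequential search as the algorithm under test and randomly chosen keys that are always present as the sample of inputs. The helper names `sequential_search_counted` and `measure` are hypothetical. It records both measures: wall-clock time and the count of key comparisons.

```python
import random
import time


def sequential_search_counted(items, key):
    """Return (index, comparisons); index is -1 if key is absent."""
    comparisons = 0
    for index, value in enumerate(items):
        comparisons += 1  # one key comparison: the basic operation
        if value == key:
            return index, comparisons
    return -1, comparisons


def measure(n, trials=200, seed=0):
    """Return (average ms per search, average comparisons per search)."""
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    items = list(range(n))
    total_comparisons = 0
    start = time.perf_counter()
    for _ in range(trials):
        key = rng.randrange(n)  # sample input: a key that is always present
        _, comparisons = sequential_search_counted(items, key)
        total_comparisons += comparisons
    elapsed_ms = (time.perf_counter() - start) * 1000
    return elapsed_ms / trials, total_comparisons / trials


for n in (1_000, 2_000, 4_000):
    avg_ms, avg_comparisons = measure(n)
    print(f"n={n}: {avg_ms:.4f} ms/search, {avg_comparisons:.1f} comparisons/search")
```

The timing includes the small overhead of picking each key and varies from machine to machine, while the comparison count is the same everywhere.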

Remember: The size and format of input may impact the time efficiency of an algorithm.

Best/Average/Worst Case

Some algorithms’ efficiency depends on the form of the input, not just its size.

\boxed{ \text{Worst Case: } C_{\text{worst}}(n) - \text{max over inputs of size $n$} } \\~\\ \boxed{ \text{Best Case: } C_{\text{best}}(n) - \text{min over inputs of size $n$} } \\~\\ \boxed{ \text{Average Case: } C_{\text{avg}}(n) - \text{avg over inputs of size $n$} }

Average case: the number of times the basic operation will be executed on typical input.
It is the expected # of basic operations, where that count is treated as a random variable under some assumption about the probability distribution of all possible inputs.

Remember: The average case is NOT the average of the worst and best case!

Example: Worst and Best Case of Sequential Search

Q: Find the worst and best case of this sequential search algorithm.

ALGORITHM SequentialSearch(A[0..n-1],k)
  // Searches for given value in given array
  // Input: Array (A) and search key (k)
  // Output: Index of first element that matches k, or -1 if no matches
  i \leftarrow 0
  while i \lt n and A[i] \ne k do
      i \leftarrow i + 1
  if i \lt n return i
  else return -1

Worst Case: C_\text{worst}(n) = n
- the key is the last element or is not in the array, so we have to traverse the whole array
Best Case: C_\text{best}(n) = 1
- the key is the first element
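
The counts above can be checked with a small Python sketch that mirrors the pseudocode and tallies key comparisons. The comparison counter and the sample list `data` are additions for illustration, not part of the original algorithm.

```python
def sequential_search(a, k):
    """Mirror of the SequentialSearch pseudocode.

    Returns (index, key_comparisons), where index is -1 if k is not found.
    """
    n = len(a)
    i = 0
    comparisons = 0
    while i < n:
        comparisons += 1  # the key comparison A[i] != k
        if a[i] == k:
            break
        i += 1
    return (i if i < n else -1), comparisons


data = [4, 8, 15, 16, 23, 42]  # n = 6

print(sequential_search(data, 4))   # best case: key is the first element
print(sequential_search(data, 42))  # worst case: key is the last element
print(sequential_search(data, 99))  # worst case: key is absent
```

Both worst-case inputs cost n comparisons; they differ only in the returned index.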

Example: Average Case of Sequential Search

Q: Find the average case of this sequential search algorithm.

ALGORITHM SequentialSearch(A[0..n-1],k)
  // Searches for given value in given array
  // Input: Array (A) and search key (k)
  // Output: Index of first element that matches k, or -1 if no matches
  i \leftarrow 0
  while i \lt n and A[i] \ne k do
      i \leftarrow i + 1
  if i \lt n return i
  else return -1

Standard Assumptions

Let p be the probability of a successful search.

Probabilities are in the range [0,1],
- so, 0 \le p \le 1

For every i, the probability of the first match occurring at the ith position is the same.

aka, the key has an equal chance of being at any position

Cases

Case 1

In the case of a successful search, the probability of the first match occurring in the ith position of the list is p/n for every i, and the algorithm makes i comparisons in that situation.

so, expected comparisons contributed by successful searches: \sum_{i=1}^{n} \frac{p}{n} \times i

Case 2

In the case of an unsuccessful search, the number of comparisons will be n with the probability of such a search being (1-p)

so, expected comparisons for unsuccessful search: n(1-p)
aka, # of comparisons (n) multiplied by the chance of an unsuccessful search (1-p).

Thus, the total expected comparisons is simply the addition of the above two expectations:

C_{\text{avg}}(n) = \textcolor{green}{ \left[ 1 \times \frac{p}{n} + 2 \times \frac{p}{n} + ... + i \times \frac{p}{n} + ... + n \times \frac{p}{n} \right] } + \textcolor{red}{ n (1-p) } \\~\\ \small\textit{Left term (green): Case 1; Right term (red): Case 2}

Simplifying

Now, we can simplify:

\begin{aligned} C_{\text{avg}}(n) &= [ 1 \times \frac{p}{n} + 2 \times \frac{p}{n} + ... + i \times \frac{p}{n} + ... + n \times \frac{p}{n} ] + n (1-p) \\ &= \frac{p}{n} [ 1 + 2 + ... + i + ... + n ] + n(1-p) \\ &= \frac{p}{n} \frac{n(n+1)}{2} + n(1-p) \\ &= \frac{p(n+1)}{2} + n(1-p) \end{aligned}

Therefore, the average case is:

C_\text{avg}(n) = \frac{p(n+1)}{2} + n(1-p)
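
As a sanity check on the derivation, the sketch below compares the closed form against a direct term-by-term sum of Case 1 and Case 2, using exact fractions so rounding cannot hide a mismatch. The function names are hypothetical.

```python
from fractions import Fraction


def average_comparisons_formula(n, p):
    """Closed form: p(n+1)/2 + n(1-p)."""
    return p * (n + 1) / 2 + n * (1 - p)


def average_comparisons_by_enumeration(n, p):
    """Sum Case 1 (match at position i, costing i) and Case 2 (no match, costing n)."""
    successful = sum(p / n * i for i in range(1, n + 1))
    unsuccessful = n * (1 - p)
    return successful + unsuccessful


n = 10
for p in (Fraction(0), Fraction(1, 2), Fraction(1)):
    closed = average_comparisons_formula(n, p)
    summed = average_comparisons_by_enumeration(n, p)
    print(f"p={p}: formula={closed}, enumeration={summed}")
```

Two special cases are worth reading off the formula: with p = 1 (the key is always present) the average is (n+1)/2, about half the list, and with p = 0 (the key is never present) it is n, the same as the worst case.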

Amortized Analysis

Sometimes it’s better to analyze multiple runs rather than a single run.

If the worst case is terrible, but extremely rare, the overall cost might be okay.

Etymology: "Amortization" comes from finance, where it means spreading a cost over time (e.g., paying off a loan in installments).
Here, it’s about spreading the cost of the rare worst case over the life of the algorithm.
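
A common illustration, which is not taken from these notes, is appending to an array that doubles its capacity whenever it fills up. The sketch below assumes a cost model of 1 unit per element written and 1 unit per element copied during a resize; `append_costs` is a hypothetical helper.

```python
def append_costs(num_appends):
    """Cost of each append to an array that doubles its capacity when full.

    Assumed cost model: 1 unit to write the new element, plus 1 unit per
    existing element copied when the array has to grow.
    """
    capacity = 1
    size = 0
    costs = []
    for _ in range(num_appends):
        cost = 1  # writing the new element
        if size == capacity:
            cost += size  # copy every existing element into the larger array
            capacity *= 2
        size += 1
        costs.append(cost)
    return costs


costs = append_costs(1_000)
print("worst single append:", max(costs))
print("average per append:", sum(costs) / len(costs))
```

A single append can cost about as much as the current array size, but under this model the total over n appends stays below 3n, so the average cost per append is bounded by a constant.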

Asymptotic Analysis

Order of Growth

Asymptotic Analysis: Analysis of order of growth.

Definition: Asymptotic

Asymptotic: The behavior of a graph/function as its input reaches very large values.

e.g., recall asymptotes from calculus: lines that the graph of some functions gets closer and closer to as the input approaches infinity

Order of Growth: The dominant behavior of an algorithm as n \to \infin.

Super-duper important!

Example: Typical questions about order of growth

How much faster will the algorithm run on a computer that’s twice as fast?
How much longer does it take to solve a problem of double input size?
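
Under the model T(n) ≈ c_op C(n), both questions can be explored numerically. If "twice as fast" is taken to mean that c_op is halved, the running time halves whatever C(n) is. For doubled input, c_op cancels out of T(2n)/T(n), leaving C(2n)/C(n). The sketch below prints that ratio for a few growth functions; the dictionary of functions and the name `doubling_ratio` are choices made for this example.

```python
import math

# A few common operation-count functions C(n), chosen for illustration.
GROWTH_FUNCTIONS = {
    "log2 n": lambda n: math.log2(n),
    "n": lambda n: n,
    "n log2 n": lambda n: n * math.log2(n),
    "n^2": lambda n: n ** 2,
    "n^3": lambda n: n ** 3,
}


def doubling_ratio(count, n):
    """Return C(2n) / C(n); c_op cancels, so this approximates T(2n) / T(n)."""
    return count(2 * n) / count(n)


n = 1_000
for name, count in GROWTH_FUNCTIONS.items():
    print(f"{name:>9}: doubling n multiplies the work by {doubling_ratio(count, n):.2f}")
```

The ratio for log2 n depends on n and shrinks toward 1 as n grows, while for n^2 and n^3 it is 4 and 8 at every input size.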

Example: Values of some important functions as $n \to \infin$

n	log_2 n	n	n log_2 n	n^2	n^3	2^n	n!
10^1	3.3	10^1	3.3 \times 10^1	10^2	10^3	10^3	3.6 \times 10^6
10^2	6.6	10^2	6.6 \times 10^2	10^4	10^6	1.3 \times 10^{30}	9.3 \times 10^{157}
10^3	10	10^3	1.0 \times 10^4	10^6	10^9
10^4	13	10^4	1.3 \times 10^5	10^8	10^{12}
10^5	17	10^5	1.7 \times 10^6	10^{10}	10^{15}
10^6	20	10^6	2.0 \times 10^7	10^{12}	10^{18}
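
The entries in the table can be reproduced with a few lines of Python. This is only a sketch for checking the values; the helper names `sci` and `table_row` are made up here, and 2^n and n! are computed only for the two smallest sizes, matching the cells the table fills in.

```python
import math


def sci(x):
    """Format a number in scientific notation with two significant digits."""
    return f"{x:.1e}"


def table_row(n):
    """Values of the polynomial and logarithmic columns for input size n."""
    return {
        "log2 n": round(math.log2(n), 1),
        "n log2 n": sci(n * math.log2(n)),
        "n^2": sci(n ** 2),
        "n^3": sci(n ** 3),
    }


for exponent in range(1, 7):
    n = 10 ** exponent
    print(f"n=10^{exponent}:", table_row(n))

# The exponential and factorial columns, for the two smallest sizes only.
for n in (10, 100):
    print(f"n={n}: 2^n = {sci(2 ** n)}, n! = {sci(math.factorial(n))}")
```

Python integers have arbitrary precision, so 2^n and n! are computed exactly before being rounded for display.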

Why?: Looking at the asymptotic order of growth lets us compare functions by ignoring constant factors and small input sizes.
e.g., Suppose you want to compare two array sorting algorithms. If all you analyze is how they perform on n = 5, any difference between them is negligible. It’s way more useful to see how each scales at very large n.

Bounds

\text{O}(g(n)): Upper bound (grows no faster than g(n))
\Theta(g(n)): Tight bound (grows at the same rate as g(n))
\Omega(g(n)): Lower bound (grows at least as fast as g(n))
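
One informal way to build intuition for these bounds is to watch the ratio f(n)/g(n) as n grows. This ratio test is not stated in these notes, and numeric evidence is not a proof. A ratio that settles at a positive constant suggests Θ(g(n)), a ratio heading to 0 suggests f grows strictly slower (O but not Ω), and a ratio growing without bound suggests f grows strictly faster (Ω but not O). The sample functions and the name `ratio_trend` are assumptions for illustration.

```python
def ratio_trend(f, g, sizes=(10, 1_000, 100_000, 10_000_000)):
    """Return f(n) / g(n) at increasing input sizes."""
    return [f(n) / g(n) for n in sizes]


def quadratic(n):
    return n ** 2


examples = {
    "n(n-1)/2 vs n^2": lambda n: n * (n - 1) / 2,  # ratio settles near 1/2
    "100n + 5 vs n^2": lambda n: 100 * n + 5,      # ratio shrinks toward 0
    "n^3 vs n^2": lambda n: n ** 3,                # ratio keeps growing
}

for label, f in examples.items():
    ratios = ratio_trend(f, quadratic)
    print(label, [f"{r:.3g}" for r in ratios])
```

The first example shows why constant factors are ignored: n(n-1)/2 is about half of n^2, yet both grow at the same rate.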