A programming language is a formal language for communicating instructions to a computer.
A language has two major components:
Syntax: The structural parts of a language (alphabet, rules, grammar).
Semantics: The meaning of a language (what is the action/effect of this command?)
In this Class: We’ll focus mostly on syntax, dipping only briefly into semantics (the sketch below illustrates the distinction).
Semantics is a very deep subject, though.
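As a small illustration (a hypothetical Java example of ours, not from the lecture): both statements below pass the syntax rules, but only semantics tells us what each one does when run.

```java
public class SyntaxVsSemantics {
    public static void main(String[] args) {
        int x = 10 / 2;  // syntactically valid; semantically, evaluates to 5
        int y = 10 / 0;  // also syntactically valid; semantically, throws ArithmeticException at run time
        System.out.println(x + " " + y);
    }
}
```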
Problem Domain Driven Languages
One of the driving forces behind the design of programming languages is the class (domain) of problems they want to solve.
Domains that have historically driven language design:
Scientific Computation
AI
Enterprise/Business
Management/Systems Programming
Web Application
Example: Important languages made to solve problems
John McCarthy wanted to formalize computer programs and work on AI, leading to LISP. This was at a time when FORTRAN was the programming language to use.
Dennis Ritchie was a systems programmer who made C to write systems software (notably Unix) for minicomputers such as the PDP-11.
John Backus made FORTRAN (FORmula TRANslation), the first widely used high-level programming language, at a time when assembly was the programming language to use.
A modern version is still used in scientific computation.
Perl was made for text processing and system-administration scripting.
COBOL was made for business.
Professor’s book recommendation: Out of Their Minds
Aside: The total number of programming languages in the world is probably in the thousands due to rarely-used domain-specific languages.
On Purpose:
Domain-specific languages are not general-purpose languages.
Thus, they tend to be simpler and smaller.
Paradigms
Different philosophies have emerged on how to approach the solution of a problem; the sketch after the list below contrasts two of them.
Some Mainstream Paradigms:
Imperative/Procedural
Functional/Declarative/Applicative
Object-Oriented
Logical/Constraint-based
There are plenty of other paradigms too!
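As a rough sketch (our own Java example, not from the lecture), the same task, summing the even numbers in a list, looks different under the imperative and functional styles:

```java
import java.util.List;

public class ParadigmDemo {
    public static void main(String[] args) {
        List<Integer> nums = List.of(1, 2, 3, 4, 5, 6);

        // Imperative/procedural: describe *how*, step by step, mutating state.
        int sum = 0;
        for (int n : nums) {
            if (n % 2 == 0) sum += n;
        }

        // Functional/declarative: describe *what*, as a composition of expressions.
        int sum2 = nums.stream().filter(n -> n % 2 == 0).mapToInt(Integer::intValue).sum();

        System.out.println(sum + " " + sum2);  // 12 12
    }
}
```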
What Makes a Good Language?
How do we choose one over the other?
We need a set of criteria that lets us evaluate the merits of a language and its suitability for a particular situation.
Main Criterion: Is it applicable to the problem?
Desirable Features in a Language:
Simplicity: A language should help a programmer think about the program in a clear manner.
Small feature set, with simple and clear rules.
i.e., there is less to learn about the language
Orthogonality: The degree to which constructs can be combined in consistent ways, with the combinations staying simple to understand and meaningful in the language.
There should be very few exceptional cases.
Makes the language much faster to learn and goes hand-in-hand with simplicity, but can reduce expressivity.
Level of Abstraction: Degree to which the language allows the definition of data abstractions that approximate the constructs in the problem domain.
Adds expressivity and complexity.
Portability: Degree to which a program can be moved (transported) to another computer.
Cost: The total cost of using a programming language.
Cost of development, maintenance, execution, etc.
Expressivity: How flexible the language is in providing concise, different ways to define algorithms and computations.
Often in tension with simplicity and/or orthogonality.
Making a program that merely works is easy. The difficulty is extensibility, maintainability, etc.
Example: Aspects of some languages
Simplicity:
Java is not simple. There are interfaces, generics, concurrency, reflection, and many, many other features. It takes years to understand Java.
C is very simple, but very hard to get good at because it is so simple.
Orthogonality:
In Java, arithmetic is complicated by mixed-mode expressions (expressions mixing multiple types); there are many exceptional cases because the complex memory model requires them (see the sketch after this list).
Abstraction:
FORTRAN was a massive step forward in abstraction.
Portability:
Java is pretty portable if you exclude the cost of porting the JVM.
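A small sketch of the mixed-mode issue mentioned above (our own example, not from the lecture): in Java, the result type of an arithmetic expression depends on implicit promotion rules, exactly the kind of exceptional case that hurts orthogonality.

```java
public class MixedMode {
    public static void main(String[] args) {
        System.out.println(5 / 2);    // 2   (int / int -> int, truncating)
        System.out.println(5 / 2.0);  // 2.5 (int implicitly widened to double)
        System.out.println('a' + 1);  // 98  (char promoted to int)
        byte b = 1;
        // byte c = b + b;            // won't compile: byte + byte promotes to int
        int c = b + b;
    }
}
```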
The computer architecture affects language design.
Example: Von Neumann Bottleneck
Every mainstream language is designed so that instructions run sequentially, one step after another, because this matches the Von Neumann architecture.
In the Von Neumann architecture, data and instructions are stored in main memory accessible over a shared bus, which introduces the Von Neumann Bottleneck.
Modern machines use a modified Von Neumann design, where the level-1 cache is split into a data cache and an instruction cache, which accelerates computation dramatically.
Implementation
Languages are usually implemented based on one of the following:
Compilers: Source code translated to machine code.
Interpreters: Source code parsed and executed in real time, line-by-line.
Interpreter: A program that can execute every instruction of the language it interprets (see the sketch after this list).
Virtual Machines: Source code translated to byte code, which is interpreted by a virtual machine.
A middle ground between compilers and interpreters.
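To make the interpreter idea concrete, here is a minimal sketch of ours (not from the lecture) for a toy stack-based language; each instruction is executed directly, one at a time:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TinyInterpreter {
    // A toy instruction set: PUSH <n>, ADD, MUL, PRINT.
    public static void run(String[] program) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String instr : program) {  // execute one instruction at a time
            String[] parts = instr.split(" ");
            switch (parts[0]) {
                case "PUSH"  -> stack.push(Integer.parseInt(parts[1]));
                case "ADD"   -> stack.push(stack.pop() + stack.pop());
                case "MUL"   -> stack.push(stack.pop() * stack.pop());
                case "PRINT" -> System.out.println(stack.peek());
                default      -> throw new IllegalArgumentException("unknown: " + instr);
            }
        }
    }

    public static void main(String[] args) {
        run(new String[] { "PUSH 2", "PUSH 3", "ADD", "PUSH 4", "MUL", "PRINT" });  // (2+3)*4 = 20
    }
}
```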
More on VMs:
Benefits:
Sandboxing (e.g., Java Applets)
Portability (compile once to byte code; specific CPU architectures are handled by the VM)
Syntax
Two parts of syntax:
Lexicon: Symbols, rules for putting them together, and pre-existing words.
Grammar: How you put words together to form valid expressions/statements/blocks/functions/programs.
Lexicon
Alphabet
Every symbol the language allows must be defined explicitly and thoroughly.
Don’t forget whitespaces.
We denote the alphabet (and token patterns) with regular expressions, for example:
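A hypothetical sketch using Java's java.util.regex (our own example, assuming a C-like lexicon): the patterns for identifiers and integer literals can be written directly as regular expressions.

```java
import java.util.regex.Pattern;

public class LexiconRegex {
    // Hypothetical token patterns for a C-like language.
    static final Pattern IDENTIFIER  = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");
    static final Pattern INT_LITERAL = Pattern.compile("0|[1-9][0-9]*");
    static final Pattern WHITESPACE  = Pattern.compile("[ \\t\\r\\n]+");  // don't forget whitespace!

    public static void main(String[] args) {
        System.out.println(IDENTIFIER.matcher("count_2").matches());  // true
        System.out.println(INT_LITERAL.matcher("042").matches());     // false: leading zero not allowed
    }
}
```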
Reserved Words
The pre-existing words in the language. There are two types:
Keywords: Reserved words that have a purpose (e.g., if, class, int, float, switch)
Reserved: Reserved words that have no meaning
Why?: Why would a language designer reserve a word but give it no function?
Maybe they want to use it in the future.
e.g., Java 14's switch expressions introduced yield, which was not previously reserved; it became a contextual keyword, so older code that used yield as a type name no longer compiles.
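For reference, here is how yield is used in a Java switch expression (our own example):

```java
public class YieldDemo {
    public static void main(String[] args) {
        int day = 6;
        String kind = switch (day) {
            case 6, 7 -> "weekend";
            default -> {
                // yield returns a value from a multi-statement case block
                String s = "weekday";
                yield s;
            }
        };
        System.out.println(kind);  // weekend
    }
}
```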
Case Sensitivity
Case Sensitivity: Whether a language differentiates between upper and lower case.
Example: Languages
Case Insensitive:
SQL
Fortran
Pascal
HTML
LISP
Case Sensitive:
Java
Python
Haskell
C++
Prolog
Professor’s Note: A lot of quirks of modern programming languages are influenced by C and Fortran.
e.g., using = for assignment even though assignment isn't symmetric and would be better written as := or <-
Grammar
Rules the language enforces.
Remember: Rules are enforced. Don’t confuse them with convention.
Rules for Words
What are the rules for making words?
Example: Real-World Rules
In Perl, variable names must start with a sigil: $ (scalar), @ (array), or % (hash).
Rules for Literals
What are the rules for making literals?
Literals: Fixed values written directly in source code (e.g., 42, true)
Example: Real-World Literals
In Java, you need to append f to a floating-point literal to define a float; without it, the literal is a double.
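A quick sketch of Java's literal rules in action (our own example):

```java
public class LiteralDemo {
    public static void main(String[] args) {
        float f = 3.14f;             // 'f' suffix required: 3.14 alone is a double
        // float g = 3.14;           // won't compile: possible lossy conversion from double
        double d = 3.14;             // floating-point literals default to double
        long big = 10_000_000_000L;  // 'L' suffix: too large for an int literal
        boolean flag = true;         // boolean literal
        System.out.println(f + " " + d + " " + big + " " + flag);
    }
}
```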
Lexical Scanner
The lexicon and its rules are used to build a lexical scanner.
This is one of many parts of a compiler or interpreter.
Implemented with Finite State Machines.
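A minimal sketch of ours (assuming a toy language with only identifiers and integers) of a scanner written as a hand-coded finite state machine:

```java
import java.util.ArrayList;
import java.util.List;

public class ToyScanner {
    enum State { START, IN_IDENT, IN_NUMBER }  // states of the finite state machine

    public static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        State state = State.START;
        StringBuilder current = new StringBuilder();
        String padded = input + " ";  // trailing space flushes the final token
        for (int i = 0; i < padded.length(); i++) {
            char c = padded.charAt(i);
            switch (state) {
                case START:
                    if (Character.isLetter(c)) { current.append(c); state = State.IN_IDENT; }
                    else if (Character.isDigit(c)) { current.append(c); state = State.IN_NUMBER; }
                    else if (!Character.isWhitespace(c)) throw new IllegalArgumentException("bad char: " + c);
                    break;
                case IN_IDENT:
                    if (Character.isLetterOrDigit(c)) { current.append(c); }
                    else { tokens.add("IDENT(" + current + ")"); current.setLength(0); state = State.START; i--; }
                    break;
                case IN_NUMBER:
                    if (Character.isDigit(c)) { current.append(c); }
                    else { tokens.add("NUM(" + current + ")"); current.setLength(0); state = State.START; i--; }
                    break;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("rate 60 x2"));  // [IDENT(rate), NUM(60), IDENT(x2)]
    }
}
```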
Architecture of a Compiler:
Lexical Analysis is done with DFAs; it outputs tokens in order.
Syntax Analysis is done with PDAs (pushdown automata) to build an AST (abstract syntax tree); see the parser sketch after this list.
Semantic Analysis decorates the tree with semantic information (e.g., types, type safety) and also builds a symbol table.
Translation: The easier part; it does a roughly 1:1 translation to object code.
Optimization: The object code is optimized and then emitted in the target form.
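As a small sketch of the syntax-analysis stage (our own example, assuming a toy grammar with just + and * over integers), a recursive-descent parser that turns a token list into an AST and evaluates it:

```java
import java.util.List;

public class TinyParser {
    // Grammar: expr -> term ('+' term)* ;  term -> factor ('*' factor)* ;  factor -> NUMBER
    interface Node { int eval(); }
    record Num(int value) implements Node { public int eval() { return value; } }
    record BinOp(char op, Node left, Node right) implements Node {
        public int eval() { return op == '+' ? left.eval() + right.eval() : left.eval() * right.eval(); }
    }

    private final List<String> tokens;
    private int pos = 0;
    TinyParser(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    Node expr() {   // expr -> term ('+' term)*
        Node node = term();
        while ("+".equals(peek())) { pos++; node = new BinOp('+', node, term()); }
        return node;
    }
    Node term() {   // term -> factor ('*' factor)*
        Node node = factor();
        while ("*".equals(peek())) { pos++; node = new BinOp('*', node, factor()); }
        return node;
    }
    Node factor() { // factor -> NUMBER
        return new Num(Integer.parseInt(tokens.get(pos++)));
    }

    public static void main(String[] args) {
        Node ast = new TinyParser(List.of("1", "+", "2", "*", "3")).expr();
        System.out.println(ast.eval());  // 7: '*' binds tighter than '+'
    }
}
```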
Professor’s Note: Knowing how to implement a compiler is a valuable skill worth learning.
“The toughest part is probably building the syntax analysis”