Main memory and registers are the only storage the CPU can directly access.
The memory unit only sees a stream of addresses and read requests, or address + data and write requests.
Cycles:
We define logical address space with pairs of base and limit registers.
TODO Diagram
CPU must check that every user mode memory access is within the logical address space for that user.
TODO Diagram of hardware access protection (check address limits)
Checking if Address is Valid:
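A minimal sketch of the base/limit check (the function name and the trap-as-exception behavior are my own illustration, not from the slides):

```python
# Hedged sketch: hardware-style base/limit protection. Every user-mode
# access must satisfy base <= addr < base + limit; otherwise the MMU
# traps to the OS (modeled here as raising an exception).

def check_address(addr: int, base: int, limit: int) -> int:
    """Return the address if it lies within [base, base + limit), else trap."""
    if base <= addr < base + limit:
        return addr
    raise MemoryError(f"addressing error: {addr} outside [{base}, {base + limit})")

# Illustrative values: program loaded at base 300040 with limit 120900
assert check_address(300050, base=300040, limit=120900) == 300050
```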
Physical Memory (RAM): Hardware structure consisting of a linear sequence of words that hold a program during execution.
Physical Address: Integer in range [0, n-1] that identifies a word in a RAM of size n. The address comprises a fixed number of bits.
Logical Address Space: Abstraction of physical memory.
Logical Address: Integer that identifies a word in logical address space.
Why?: This simplifies programming, and lets the operating system break up and rearrange (allocate) memory in efficient ways.
More on Program Transformation:
- Source: Program as written, using symbolic names and addresses.
- Object: Compiled module with relative (relocatable) addresses.
- Load: Linked module, ready to be placed in memory; addresses bound at load time.
Relocation: Once a program is loaded in memory, we can move it around.
- Static Relocation: Bind all logical addresses to physical addresses prior to execution.
- Done at compile-time.
- Dynamic Relocation: Postpone binding of logical address to physical address until it is accessed during execution.
- Done at runtime.
Relocation can be done multiple times, before and during execution.
Static v. Dynamic: There is no room for expanding or shrinking with static relocation. In fact, you can’t even move the program, as you’d need to change tons of addresses. With dynamic relocation, moving the program just means changing the relocation register.
TODO Relocation may need to be given a parent section.
Relocation Register: Contains physical starting address of program/program component.
Recall — Most programs have three parts: code, static data, and dynamic data.
Single Relocation Register: The 3 components form a single unit that occupies a contiguous area of memory.
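A minimal sketch of translation with a single relocation register (the load point and function name are illustrative, not from the slides):

```python
# Hedged sketch: dynamic relocation with a single relocation register.
# The program sees logical addresses starting at 0; the MMU adds the
# relocation register (the program's physical start) on every access.

RELOCATION_REGISTER = 14000  # assumed load point, for illustration only

def translate(logical_addr: int) -> int:
    return RELOCATION_REGISTER + logical_addr

assert translate(346) == 14346  # logical 346 -> physical 14346
```

Moving the program just means copying it and updating `RELOCATION_REGISTER`; no addresses inside the program change.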
TODO The other method is the one where we have one relocation register for each component (code, static data, dynamic).
Related to above.
TODO
TODO
External v. Internal Fragmentation:
- External: Happens when we allocate and deallocate memory contiguously, leaving holes between allocations.
- e.g., you have two 50kb holes that aren’t contiguous, but want to use 75kb.
- Internal: Happens when an allocated unit is larger than what was requested, wasting the leftover space inside the unit (e.g., at the end of a process’s last page).
First-fit: Allocate the first hole that is big enough.
Next-fit: Start each search at the point of the last allocation.
Best-fit: Allocate smallest hole possible.
Worst-fit: Allocate biggest hole possible.
More on Best and Worst — Requires us to search the entire list (unless it’s already sorted)
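The policies above can be sketched over a free list (the hole representation and function names are my own; sizes are illustrative):

```python
# Hedged sketch of hole-selection policies. Holes are (start, size) pairs;
# each function returns the index of the chosen hole, or None if no hole
# is big enough for the request.

def first_fit(holes, request):
    for i, (_, size) in enumerate(holes):
        if size >= request:
            return i          # first adequate hole wins
    return None

def best_fit(holes, request):
    fits = [(size, i) for i, (_, size) in enumerate(holes) if size >= request]
    return min(fits)[1] if fits else None   # smallest adequate hole

def worst_fit(holes, request):
    fits = [(size, i) for i, (_, size) in enumerate(holes) if size >= request]
    return max(fits)[1] if fits else None   # biggest hole

holes = [(0, 100), (200, 500), (900, 200), (1500, 300)]
assert first_fit(holes, 150) == 1   # 500 at offset 200 is first big enough
assert best_fit(holes, 150) == 2    # 200 at offset 900 is the tightest fit
assert worst_fit(holes, 150) == 1   # 500 at offset 200 is the largest hole
```

Note that best-fit and worst-fit both scan the whole list, as the notes say, unless the free list is kept sorted by size.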
(Bonus) Coalescing: Merge adjacent holes into a single larger hole (no relocation needed).
50% Rule: If the probability of finding an exact match for a request approaches 0, then one third of all memory partitions are holes and two thirds are occupied (i.e., the number of holes is about half the number of occupied blocks).
If there is no big-enough hole, we can:
Swap: Temporarily take a module out of memory and store it on disk.
Compaction: Systematically shift modules (generally to one side) to create a larger hole.
Page Table: Translates logical to physical addresses.
Pages and Frames:
Remember — Frames and pages must have the exact same size due to 1:1 mapping.
- We can have a different number of each, but their sizes must match!
Why? — As previously seen, contiguous allocation requires us to find large empty spaces in physical memory, which is very hard, and dealing with the resulting external fragmentation (e.g., by compaction) is very expensive.
Address generated by the CPU is divided into two parts: for a logical address space of size 2^m and page size 2^n, the high-order m - n bits are the page number and the low-order n bits are the page offset.
Q: An address consists of a 5-bit page number and 4-bit offset.
A: 2^5 = 32 pages of 2^4 = 16 words each; the full address is 9 bits, so the logical address space holds 2^9 = 512 words.
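The split can be sketched with bit operations (function name is my own):

```python
# Hedged sketch: splitting a logical address into page number and offset.
# With page size 2^n, the low n bits are the offset and the remaining
# high bits are the page number.

def split_address(addr: int, n: int) -> tuple[int, int]:
    return addr >> n, addr & ((1 << n) - 1)

# 5-bit page number + 4-bit offset (n = 4), as in the question above:
page, offset = split_address(0b10110_1101, n=4)
assert (page, offset) == (0b10110, 0b1101)
```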
Internal Fragmentation: The loss of usable memory space due to the mismatch between the page size and the size of a program, which creates a hole at the end of the program’s last page.
Example: Calculating internal fragmentation
- Page size: 2048 B; process size: 72766 B
- 72766 / 2048 = 35 full pages + 1086 B, so the process needs 36 pages
- Internal fragmentation: 2048 - 1086 = 962 B wasted on the last page
- Worst case: a process needs n pages plus 1 byte, wasting 1 frame - 1 byte on its last page
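A quick sketch of this calculation (helper name is my own; note 2048 - 1086 = 962 B):

```python
# Hedged sketch: pages needed and internal fragmentation for a process
# whose size is not an exact multiple of the page size.

def internal_fragmentation(process_size: int, page_size: int) -> tuple[int, int]:
    pages, remainder = divmod(process_size, page_size)
    if remainder == 0:
        return pages, 0                       # last page exactly full
    return pages + 1, page_size - remainder   # partial last page wastes the rest

pages, wasted = internal_fragmentation(72766, 2048)
assert pages == 36             # 35 full pages + 1 partial page
assert wasted == 2048 - 1086   # 962 B lost at the end of the last page
```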
Internal v. External Fragmentation: Internal fragmentation occurs when you allocate memory with page tables (memory wasted inside the last frame); external fragmentation occurs when you allocate memory contiguously (holes between allocations).
Q: Consider a logical address space of 2048 pages with a 4KB page size, mapped onto a physical memory of 512 frames.
A: Logical addresses need 11 + 12 = 23 bits (2048 = 2^11 pages, 4 KB = 2^12 B per page); physical addresses need 9 + 12 = 21 bits (512 = 2^9 frames).
Q: Assuming a 1KB page size, what are the page numbers and offsets for the following address references (provided as decimal numbers):
A:
All you need to do is divide the logical address by the page size: the quotient is the page number and the remainder is the offset. (Similarly, to count a process’s pages, divide its size by the page size; any remainder spills onto one more page.)
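Since the notes don’t record the question’s actual address references, the sketch below uses made-up decimal addresses to show the recipe:

```python
# Hedged sketch: with a 1 KB page size, page number and offset fall out
# of a single divmod on the decimal address.

PAGE_SIZE = 1024

def page_and_offset(addr: int) -> tuple[int, int]:
    return divmod(addr, PAGE_SIZE)

# Illustrative addresses (not the ones from the original question):
assert page_and_offset(3085) == (3, 13)     # 3 * 1024 + 13
assert page_and_offset(42095) == (41, 111)  # 41 * 1024 + 111
```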
Background — So far we’ve learned memory allocation via (1) contiguous and (2) page table allocation.
The third approach treats programs as a collection of segments.
Segments: We must know the size and starting address of each.
- Main Program
- Procedure
- Function
- Method
- Object
- Local Variables, Global Variables
- Common Block
- Stack
- Symbol Table
- Arrays
Note — We have to know the size of the segment because the sizes are variable.
Segment Table: Array that keeps track of which segment resides in which area of physical memory.
On Segments — Segments are identified by a single number (segment number)
- With pure segmentation (no paging), a segment would occupy a contiguous area of physical memory and be the smallest unit of data for memory management.
TODO Copy diagram from slides, “Logical Address Space <–> Segment Table <–> Physical Memory” on “Segmentation”
Similar to paging, segmentation breaks each logical address into two components:
Example: Segment Table TODO
| Base | Limit |
|---|---|
Differences from Paging: This is nearly identical to paging, except:
^ In a page table, translation just concatenates the frame number with the offset (a bit-pattern copy); with segments, the hardware must add the offset to the segment’s base, after checking it against the limit.
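A minimal sketch of segment translation (table values and function name are illustrative, not from the slides):

```python
# Hedged sketch: segment-table translation. Unlike paging, the offset is
# checked against the segment's limit and *added* to its base.

SEGMENT_TABLE = [   # (base, limit) per segment number -- illustrative values
    (1400, 1000),   # segment 0
    (6300, 400),    # segment 1
]

def translate(s: int, w: int) -> int:
    base, limit = SEGMENT_TABLE[s]
    if w >= limit:
        raise MemoryError(f"offset {w} beyond limit {limit} of segment {s}")
    return base + w     # addition, not bit concatenation

assert translate(0, 53) == 1453
assert translate(1, 399) == 6699
```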
Now, let’s combine the benefits of both methods.
- Segmentation: Ability to create variable-size address spaces.
- Paging: Ability to place any page into any frame.
Logical addresses consist of three components: (s, p, w) (segment #, page #, offset)
Note — This is a space-time tradeoff. We need to use memory to store the segment table, page table, and physical memory.
TODO Copy Diagram 6.4.5 from slides
Intuition: it is effectively a nested lookup, like a 2D structure — segment_table[s] selects that segment’s page table, [p] selects a frame, and w indexes within the frame.
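That nested-lookup intuition can be sketched directly (table contents, page size, and names are illustrative):

```python
# Hedged sketch: combined segmentation + paging. Each segment has its own
# page table; logical (s, p, w) resolves via segment table -> page table
# -> frame, then physical = frame * page_size + w.

PAGE_SIZE = 256
SEGMENTS = [          # segment table: each entry is that segment's page table
    [7, 3, 9],        # segment 0: pages 0..2 live in frames 7, 3, 9
    [1, 4],           # segment 1: pages 0..1 live in frames 1, 4
]

def translate(s: int, p: int, w: int) -> int:
    frame = SEGMENTS[s][p]          # two table lookups (the "2D" part)
    return frame * PAGE_SIZE + w

assert translate(0, 1, 20) == 3 * 256 + 20
assert translate(1, 0, 5) == 1 * 256 + 5
```

This is the space-time tradeoff from the note above: two extra memory lookups per access, in exchange for variable-size spaces whose pages can land in any frame.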
TLB: Fast associative memory buffer that maintains recent translations of logical addresses to frames.
In Other Words — It’s a frame & page table cache.
Suppose we’ve previously mapped logical (s,p,w) \to physical (f,w) and have it cached.
If we then access (s,p,w'), we can reuse the cached frame and compute (f, w') directly, skipping the table lookups.
\boxed{ \text{EAT} = \text{TLB hit time} \times \text{hit ratio} + \text{TLB miss time} \times (1 - \text{hit ratio}) }
Hit Ratio: Percentage of times a page number is found in the associative registers.
Q: Calculate EAT for hit ratio of 99% with 100ns memory access.
A:
\text{EAT} = 100 \times .99 + 200 \times .01 \\ = 101 \text{ns}
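The same worked example as code (function name is mine):

```python
# Hedged sketch of the EAT formula above, reproducing the worked example:
# 100 ns on a TLB hit, 200 ns on a miss (extra memory access), 99% hits.

def tlb_eat(hit_time, miss_time, hit_ratio):
    return hit_time * hit_ratio + miss_time * (1 - hit_ratio)

assert abs(tlb_eat(100, 200, 0.99) - 101) < 1e-9   # ~101 ns, as above
```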
GAP (missed beginning of class) (TODO)
Demand Paging: Principle of loading a page into memory only when it’s needed.
Present Bit: Flag that indicates whether a page is residing in memory.
Page Fault: Trap raised when a program references a page that is not currently in memory (present bit = 0); the OS must bring the page in from disk before the access can proceed.
As virtual memory is much larger than physical memory, not all pages can be resident at once; the rest are kept on disk and loaded on demand.
Page Replacement: When a page must be brought in and no frame is free, the OS selects a resident page to evict (writing it back to disk first if it was modified).
Modified Bit: Flag that indicates whether a page has been changed since it was loaded (a clean page can be evicted without writing it back to disk).
Recall how we needed three tables loaded in memory to do this.
Basically: page the page table itself, because the page table is too large to keep contiguous in memory.
This changes the address from (s, p, w) to (s, p1, p2, w), where p1 indexes the outer page table and p2 the inner one.
TODO this is about like limiting # of frames a process can have?
Reference String: Sequence of page numbers referenced by an executing program during a time interval; used to compare page replacement algorithms.
FIFO: Simplest to implement. Selects the page that has been resident in memory the longest.
TODO Example of FIFO page replacement (copy from slides)
Belady’s Anomaly: Adding more frames doesn’t always reduce page faults!
Why?: Algorithms that are not stack-based (e.g., FIFO, others) suffer from Belady’s Anomaly.
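A small simulation makes the anomaly concrete. The classic reference string 1,2,3,4,1,2,5,1,2,3,4,5 faults 9 times with 3 frames but 10 times with 4 (function name is mine):

```python
# Hedged sketch: FIFO page replacement, counting faults, then the classic
# reference string that exhibits Belady's anomaly.
from collections import deque

def fifo_faults(refs, n_frames):
    frames, queue, faults = set(), deque(), 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == n_frames:
                frames.remove(queue.popleft())  # evict oldest resident page
            frames.add(page)
            queue.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
assert fifo_faults(refs, 3) == 9
assert fifo_faults(refs, 4) == 10   # more frames, *more* faults
```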
Optimal (OPT): Selects the page that won’t be referenced for the longest time in the future.
Unrealizable in practice (requires future knowledge), so it is usually used as a baseline for comparison.
TODO Example of Optimal page replacement (copy from slides)
LRU: Selects the page that has not been referenced for the longest time.
Three Implementations:
Like optimal, LRU doesn’t suffer from Belady’s anomaly.
True LRU requires updating a queue/stack/counter on every memory reference, which makes the algorithm too inefficient in practice.
Aside — Stack Algorithms: Algorithm for which it can be shown that the set of pages in memory for n frames is always a subset of the set of pages that would be in memory with n + 1 frames.
- In other words, growing the frame count only ever adds to the resident set: the pages resident with n frames remain resident with n + 1.
- For LRU, the set of pages in memory would be the n most recently referenced pages. If the number of frames is increased, these n pages will still be the most recently referenced and so will still be in memory (ties into Second Chance).
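An LRU simulation over the same Belady string shows the stack property in action (function name and the OrderedDict trick are my own choices):

```python
# Hedged sketch: LRU replacement via an OrderedDict used as a recency
# queue (front = least recently used).
from collections import OrderedDict

def lru_faults(refs, n_frames):
    resident, faults = OrderedDict(), 0
    for page in refs:
        if page in resident:
            resident.move_to_end(page)          # refresh recency on a hit
        else:
            faults += 1
            if len(resident) == n_frames:
                resident.popitem(last=False)    # evict least recently used
            resident[page] = True
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
assert lru_faults(refs, 3) == 10
assert lru_faults(refs, 4) == 8    # no Belady's anomaly: more frames helps
```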
Rather than maintaining pages in exact LRU order, each page gets a referenced bit and an aging register. Periodically (e.g., on a timer interrupt, not on every reference), the register is shifted right by 1 bit and the referenced bit is inserted as the new most-significant bit.
Shifting right discards the least-significant bit, i.e., the oldest reference history. Everything is bit shifts, so it is very efficient in hardware.
tl;dr: we approximate LRU by tracking recency coarsely instead of maintaining exact order; the page with the smallest register value is the replacement victim.
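A small sketch of the aging mechanism (register width, names, and the `tick` helper are my own illustration):

```python
# Hedged sketch of aging: each resident page keeps an 8-bit register; on
# every clock tick the register shifts right and the page's referenced
# bit is inserted on the left. Smallest value ~= least recently used.

REGISTER_BITS = 8

def tick(aging, referenced):
    """One aging interval: shift each register right, insert referenced bit."""
    for page in aging:
        aging[page] >>= 1
        if page in referenced:
            aging[page] |= 1 << (REGISTER_BITS - 1)

aging = {"A": 0, "B": 0, "C": 0}
tick(aging, referenced={"A", "C"})   # A and C touched this interval
tick(aging, referenced={"C"})        # only C touched
assert aging["C"] > aging["A"] > aging["B"]   # C most recent, B never used
assert min(aging, key=aging.get) == "B"       # B would be evicted first
```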
Second-Chance: Coarse-grain approximation of LRU.
Like second-chance, but with an additional m-bit to detect modifications.
| (r, m) bits | become | meaning |
|---|---|---|
| 11 | 01 | modified |
| 01 | 00 | modified (cont.) |
| 10 | 00 | unmodified |
aka: Not-recently-used page replacement algorithm.
TODO PA 7.3.6 Diagram
Recall — If page fault becomes too high, efficiency drops (waiting for pages to move from disk to memory).
Optimal Working Set: Set of resident pages that will still be needed in the immediate future, and thus should remain.
Why? — In the first few memory operations, the number of distinct pages referenced increases rapidly, but the rate diminishes with time.
- Thus, the window size d must be chosen to cover the period of rapid growth, but not extend past the point where adding more pages is wasteful.
TODO example of the window trawling through the reference string, explaining at each time unit, the size of things referenced and working set.
To estimate the optimal working set, we exploit temporal locality by assuming recently referenced pages will be similar to the pages referenced in the immediate future.
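The window-based estimate can be sketched directly from the definition (function name and the sample reference string are mine):

```python
# Hedged sketch: the working set at time t is the set of distinct pages
# referenced in the last d time units of the reference string.

def working_set(refs, t, d):
    """Distinct pages referenced in the window ending at position t."""
    start = max(0, t - d + 1)
    return set(refs[start : t + 1])

refs = [2, 6, 1, 5, 7, 7, 7, 7, 5, 1]   # illustrative reference string
assert working_set(refs, t=4, d=5) == {2, 6, 1, 5, 7}   # rapid early growth
assert working_set(refs, t=7, d=5) == {5, 7}            # locality kicks in
```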
TODO
Page Fault Rate: Number of page faults (f) occurring during a number of memory references (t).
\boxed{ P = f/t \text{ where } 0 \le P \le 1 }
P=1 means every memory reference causes a page fault; P=0 means no page faults occur.
Effective Access Time:
\boxed{ E = (1-P) \times m + P \times S } \\~\\ \text{where } m \text{ is the time to access physical memory and } S \text{ is the time to process a page fault}
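A sketch of this formula with illustrative numbers (a 100 ns memory access and an 8 ms fault service time are my assumptions, not values from the slides):

```python
# Hedged sketch of E = (1 - P) * m + P * S: even a tiny fault rate
# dominates the effective access time, because S >> m.

def effective_access_time(P, m, S):
    """P: page-fault rate, m: memory access time, S: page-fault service time."""
    return (1 - P) * m + P * S

assert effective_access_time(0, 100, 8_000_000) == 100          # no faults
assert abs(effective_access_time(0.001, 100, 8_000_000) - 8099.9) < 1e-6
```

Note how a 0.1% fault rate already makes the effective access ~80x slower than raw memory.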
Example: TODO PA : 7.5.2
Load Control: Activity of determining how many processes should be running concurrently to maximize performance.
Thrashing: When most time is spent moving pages between memory and disk while the CPU remains mostly idle, so processes make no progress.
So basically: we do a lot of context switching, and more processes means fewer frames per process, therefore more page faults.
Why? — Page size has a major impact on time and space overhead of virtual memory (and thus the entire system).
Larger: Fewer pages fit in memory, but each page gives a process more memory at once.
Smaller: More pages fit in memory, but you may get more page faults.
Most common page sizes range between 512 B and 4096 B, with sizes increasing as disk technology improves.