Understanding Cache Memory: Organization, Functioning, and Optimization – Prof. Alan Sussman (Computer Science study notes)

An in-depth exploration of cache memory, its organization, functioning, and optimization techniques. It covers topics such as cache blocks, cache hits and misses, cache replacement policies, and various cache optimization strategies. The document also discusses the differences between main memory and cache memory, and the impact of cache misses on machine performance.


Computer Systems Architecture – CMSC 411
Unit 5: Memory Hierarchy
Alan Sussman, April 15, 2003

Slide 2: Administrivia
• Quiz next Tuesday, April 22
• Homework 4b due Thursday
• Project out by tomorrow – due May 9
• Read Chapter 5 – concentrate on 5.1-5.12
• Midterm questions?

Slide 3: Last time
• Global code scheduling – to move instructions across branches while preserving data and control dependences and exception behavior
– trace scheduling – selection and compaction
– superblocks – like traces, but with only 1 entry point
• Hardware support
– predicated instructions – convert control dependences into data dependences
– help with exception handling for global code motion
– address conflict resolution for reordering loads/stores
• speculative loads

Cache Memory

Slide 5: Issues to consider
• How big should the fastest memory (cache memory) be?
• How do we decide what to put in cache memory?
• If the cache is full, how do we decide what to remove?
• How do we find something in cache?
• How do we handle writes?

Slide 6: First, there is main memory
• Jargon:
– frame address – which page?
– block number – which cache block?
– contents – the data

Slide 7: Then add a cache
• Jargon: Each address of a memory location is partitioned into
– block address
• tag
• index
– block offset
(Fig. 5.5)

Slide 8: How does cache memory work?
• The following slides discuss:
– what cache memory is
– three organizations for cache memory
• direct mapped
• set associative
• fully associative
– how the bookkeeping is done
• Important note: All addresses shown are in octal. Addresses in the book are usually decimal.

Slide 9: What is cache memory? Main memory first
• Main memory is divided into (cache) blocks. Each block contains many words (16-64 is common now).

Slide 10: Main memory
• Blocks are grouped into frames (pages) – 3 frames in the slide's picture.

Slide 11: Main memory (cont.)
• Blocks are addressed by their frame number and their block number within the frame.
(Diagram: blocks numbered 00-07, 10-17, 20-22 in octal, grouped by frame.)

Slide 12: Cache memory
• Cache has many, MANY fewer blocks than main memory, each with a block number, a memory address, data, a valid bit, and a dirty bit.
(Diagram: an 8-entry cache, locations 0-7 holding blocks 10, 21, 42, 53, 74, 25, 16, 77, all valid and dirty bits 0.)

[Slides 13-24 are not in this preview.]

Slide 25: Managing cache (cont.)
(Diagram: the same cache, with block 14 now in location 4 and its valid bit set.)
• If all other memory references involved block 14, no other blocks would need to be fetched from memory.
• But suppose we eventually need to fetch blocks 10, 31 and 66. We need to fetch all three, because we don't have valid versions of them.

Computer Systems Architecture – CMSC 411
Unit 5: Memory Hierarchy
Alan Sussman, April 17, 2003

Slide 27: Administrivia
• Quiz Tuesday, April 22
• Homework 4b due today
• HW 5 out soon
• Project out – due May 9
– questions?

Slide 28: Last time
• Cache memory
– Partition an address into a block address (tag and index) and a block offset
– Cache holds blocks, located via the tag, with a valid and a dirty bit for each block
– Direct mapped cache has exactly one cache location for each block
– Set associative cache has a set of possible locations for a block (N for N-way set associative – the index part of the address determines which set)
• replacement choices: random, LRU, FIFO
– Fully associative cache allows a block to go into any cache location
(A sketch of the address partitioning follows.)
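As a concrete illustration of the partitioning recapped on slide 28, here is a minimal C sketch that splits a 32-bit address into tag, index, and block offset. The geometry (64-byte blocks, 256 sets, hence 6 offset bits and 8 index bits) is an assumption chosen for the example, not taken from the slides:

    #include <stdint.h>
    #include <stdio.h>

    /* Assumed cache geometry: 64-byte blocks, 256 sets. */
    #define OFFSET_BITS 6
    #define INDEX_BITS  8

    typedef struct {
        uint32_t tag;
        uint32_t index;
        uint32_t offset;
    } addr_parts;

    addr_parts split_address(uint32_t addr)
    {
        addr_parts p;
        p.offset = addr & ((1u << OFFSET_BITS) - 1);                  /* block offset */
        p.index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);  /* which set   */
        p.tag    = addr >> (OFFSET_BITS + INDEX_BITS);                /* the rest     */
        return p;
    }

    int main(void)
    {
        addr_parts p = split_address(0x12345678u);
        printf("tag=0x%x index=0x%x offset=0x%x\n", p.tag, p.index, p.offset);
        return 0;
    }

In a direct mapped cache the index picks the single candidate location; in an N-way set associative cache it picks a set of N candidates whose tags are compared in parallel.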
Slide 29: Managing cache
• Use a direct mapped cache as the example. After the first read operation, cache memory looked like this.
(Diagram: locations 0-7 holding blocks 10, 21, 42, 53, 14, 25, 16, 77; only location 4, holding block 14, is valid.)

Slide 30: Managing cache (cont.)
• If all other memory references involved block 14, no other blocks would need to be fetched from memory.
• But suppose we eventually need to fetch blocks 10, 31 and 66. We need to fetch all three, because we don't have valid versions of them.

Slide 31: Managing cache (cont.)
• The result looks like this: blocks 10, 31, and 66 are now valid in locations 0, 1, and 6 (all block numbers are octal, so each block's low digit picks its location).
• Now suppose we write to block 66.

Slide 32: Managing cache (cont.)
• The block is valid in cache, so we don't need to fetch it. But the write operation sets the dirty bit for that block.

Slide 33: Managing cache (cont.)
• The write operation sets the dirty bit for that block. That means the cached block now differs from the memory block, so it must eventually be written back.

Slide 34: Managing cache (cont.)
• Now suppose we need to read from a block not in cache. If it is block 41, we must overwrite block 31, since blocks 31 and 41 both map to location 1.

Slide 35: Write-through cache
• In write-through caches, every write causes an immediate change both to cache and to main memory. So the read just involves fetching block 41.

Slide 36: Write-back cache
• In write-back caches, every write changes only the cache. So the read involves writing block 31 back to memory if its dirty bit is set, then fetching block 41.

Slide 37: Reads easy, writes are not
• Most memory accesses are reads, not writes, because we read both data and instructions but write only data
• If the data requested is not in cache, call that a cache miss
• It's easy to make most reads from cache fast: just pull the data into a register as soon as it is accessed, while checking whether the address matches the tag. If it doesn't, that is a cache miss, so load a block from main memory into cache.
• Can't do this with writes:
– must verify the address before changing the value of the cache location

Slide 38: Write-through vs. write-back
• Which is better?
– Write-back gives faster writes, since we don't have to wait for main memory
– Write-back is very efficient if we want to modify many bytes in a given block
– But write-back can slow down some reads, since a cache miss might cause a write-back
– In multiprocessors, write-through might be the only correct solution. Why?
(A sketch of the dirty-bit bookkeeping follows.)
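To make the dirty-bit bookkeeping concrete, here is a minimal, hypothetical sketch of a byte write in a direct-mapped write-back cache. The struct, names, toy memory, and write-allocate policy are all invented for illustration; the point is that a write touches only the cache and sets the dirty bit, and eviction writes the block back only if that bit is set:

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    #define NLINES     8     /* a tiny 8-line cache, like the slides' diagrams */
    #define BLOCK_SIZE 64    /* assumed block size in bytes */

    typedef struct {
        int      valid, dirty;
        uint32_t tag;
        uint8_t  data[BLOCK_SIZE];
    } cache_line;

    static cache_line cache[NLINES];
    static uint8_t main_memory[1 << 16];   /* 64 KB toy main memory */

    static void mem_read_block(uint32_t addr, uint8_t *buf)
    { memcpy(buf, &main_memory[addr], BLOCK_SIZE); }
    static void mem_write_block(uint32_t addr, const uint8_t *buf)
    { memcpy(&main_memory[addr], buf, BLOCK_SIZE); }

    void cache_write_byte(uint32_t addr, uint8_t value)
    {
        uint32_t offset = addr % BLOCK_SIZE;
        uint32_t block  = addr / BLOCK_SIZE;   /* block number            */
        uint32_t index  = block % NLINES;      /* direct-mapped location  */
        uint32_t tag    = block / NLINES;
        cache_line *l   = &cache[index];

        if (!l->valid || l->tag != tag) {      /* write miss              */
            if (l->valid && l->dirty)          /* evict: write back first */
                mem_write_block((l->tag * NLINES + index) * BLOCK_SIZE, l->data);
            mem_read_block(block * BLOCK_SIZE, l->data);   /* write-allocate */
            l->valid = 1;
            l->tag   = tag;
        }
        l->data[offset] = value;   /* change only the cache ...           */
        l->dirty = 1;              /* ... and mark the block dirty        */
    }

    int main(void)
    {
        cache_write_byte(0x1234, 7);   /* miss: fetch block, then write   */
        cache_write_byte(0x1235, 8);   /* hit: cache only, stays dirty    */
        printf("dirty=%d\n", cache[(0x1234 / BLOCK_SIZE) % NLINES].dirty);
        return 0;
    }

A write-through version would replace the dirty-bit line with an immediate mem_write_block of the affected block (or word), trading slower writes for a memory image that is always current.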
Slide 39: Cache summary
• Cache memory can be organized as direct mapped, set associative, or fully associative
• Can be write-through or write-back
• Extra bits such as valid and dirty bits help keep track of the status of the cache

Slide 40: How much do memory stalls slow down a machine?
• Suppose that on pipelined MIPS, each instruction takes, on average, 2 clock cycles, not counting cache misses
• Suppose, on average, there are 1.33 memory references per instruction, memory access time is 50 cycles, and the miss rate is 2%
• Then each instruction takes, on average:
2 + (0 × .98) + (1.33 × .02 × 50) = 3.33 clock cycles
(hits add 0 cycles; the 2% of references that miss each cost 50 cycles)

Slide 41: Memory stalls (cont.)
• To reduce the impact of cache misses, we can reduce any of three parameters:
– main memory access time (miss penalty)
– miss rate
– cache access (hit) time

Slide 42: Reducing cache miss penalty
• 5 strategies:
– Give priority to read misses over write misses
– Don't wait for the whole block
– Use a nonblocking cache
– Multi-level caches
– Victim caches
• The first 4 are used in most desktop and server machines

[Slides 43-54 are not in this preview.]

Slide 55: Reducing the miss rate
• Sometimes cache misses are inevitable:
– The first time a block is used, we need to bring it into cache (a compulsory miss)
– If we need to use more blocks at once than can fit in the cache, some will bounce in and out (capacity misses)
– In direct mapped or set associative caches, certain combinations of addresses cannot be in cache at the same time (conflict misses)

Slide 56: Miss rate – Fig. 5.15 (SPEC2000, LRU replacement)

Slide 57: How to reduce the miss rate?
• Use larger blocks
• Use more associativity, to reduce conflict misses
• Victim cache
• Pseudo-associative caches
• Prefetch (hardware controlled)
• Prefetch (compiler controlled)
• Compiler optimizations

Slide 58: Increasing block size
• Want the block size large so we don't have to stop so often to load blocks
• Want the block size small so that blocks load quickly
(Fig. 5.16 – SPEC92)

Slide 59: Increasing block size (cont.)
• So a larger block size reduces the miss rate, but...
• Example:
– Suppose that loading a block takes 80 cycles (overhead) plus 2 clock cycles for each 16 bytes
– A block of size 64 bytes can be loaded in 80 + 2 × 64/16 = 88 cycles (miss penalty)
– If the miss rate is 7%, then the average memory access time is 1 + .07 × 88 = 7.16 cycles

Slide 60: Memory access times – Fig. 5.18
(Average memory access time in cycles, by block size and cache size)

Block size   Miss penalty   4K       16K     64K     256K
16           82             8.027    4.231   2.673   1.894
32           84             7.082    3.411   2.134   1.588
64           88             7.160    3.323   1.933   1.449
128          96             8.469    3.659   1.979   1.470
256          112            11.651   4.685   2.288   1.549

(A small program reproducing the slide-59 arithmetic follows.)
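As a check on the numbers, the miss-penalty column of Fig. 5.18 follows exactly the slide-59 model (80 cycles of overhead plus 2 cycles per 16 bytes), and the average-memory-access-time formula is hit time + miss rate × miss penalty. A minimal sketch, assuming a 1-cycle hit time as in slide 59:

    #include <stdio.h>

    /* Miss penalty model from slide 59: 80 cycles of overhead plus
     * 2 cycles per 16 bytes transferred. */
    static double miss_penalty(int block_bytes)
    {
        return 80.0 + 2.0 * block_bytes / 16.0;
    }

    /* AMAT = hit time + miss rate * miss penalty (1-cycle hit assumed). */
    static double amat(double miss_rate, int block_bytes)
    {
        return 1.0 + miss_rate * miss_penalty(block_bytes);
    }

    int main(void)
    {
        /* Reproduces the slide's example: 64-byte block, 7% miss rate. */
        printf("miss penalty(64B) = %.0f cycles\n", miss_penalty(64));  /* 88   */
        printf("AMAT = %.2f cycles\n", amat(0.07, 64));                 /* 7.16 */
        return 0;
    }

Reproducing the full table would additionally require the per-cache-size miss rates from Fig. 5.16, which are not in this preview.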
O'Leary) 11 CMSC 411 - Alan Sussman 61 Higher associativity • A direct-mapped cache of size N has about the same miss rate as a 2-way set-associative cache of size N/2 – 2:1 cache rule of thumb (seems to work up to 128KB caches) • But associative cache is slower than direct-mapped, so the clock may need to run slower • Example: – Suppose that the clock for 2-way memory needs to run at a factor of 1.1 times the clock for 1-way memory • the hit time increases with higher associativity – Then the average memory access time for 2-way is 1.10 + miss rate × 50 (assuming that the miss penalty is 50) CMSC 411 - Alan Sussman 62 Memory access time – Fig. 5.19 1.661.591.551.20512 1.821.741.661.32256 2.001.921.841.52128 2.252.182.141.9264 2.452.372.302.0632 2.532.462.402.2316 2.622.552.582.698 3.283.223.253.444 Eight-wayFour-wayTwo-wayOne-wayCache size (KB) Associativity CMSC 411 - Alan Sussman 63 Pseudo-associative cache • Uses the technique of chaining, with a series of cache locations to check if the block is not found in the first location – e.g., invert most significant bit of index part of address (as if it were a set associative cache) • The idea: – Check the direct mapped address – Until the block is found or the chain of addresses ends, check the next alternate address – If the block has not been found, bring it in from memory • Three different delays generated, depending on which step succeeds Computer Systems Architecture CMSC 411 Unit 5 – Memory Hierarchy Alan Sussman April 24, 2003 CMSC 411 - Alan Sussman 65 Administrivia • Quiz questions? • HW 5 • Project questions? CMSC 411 - Alan Sussman 66 Hardware prefetch • Idea: If read page k of a book, the next page read is most likely page k+1 • So, when a block is read from memory, read the next block too – maybe into a separate buffer that is accessed on a cache miss before going to memory • Advantage: – if use blocks sequentially, will need to fetch only half as often from memory • Disadvantages: – more information to move – may fill the cache with useless blocks – may compete with demand misses for memory bandwidth CMSC 411 - A. Sussman (from D. O'Leary) 12 CMSC 411 - Alan Sussman 67 Compiler-controlled prefetch • Idea: The compiler has a better idea than the hardware does of when blocks are being use sequentially • Want the prefetch to be nonblocking: – don't slow the pipeline waiting for it • Usually want the prefetch to fail quietly: – if ask for an illegal block (one that generates a page fault or protection exception), don't generate an exception; just continue as if the fetch wasn't requested – called a non-binding cache prefetch Compiler optimizations to reduce cache miss rate CMSC 411 - Alan Sussman 69 Four compiler techniques • 4 techniques to improve cache locality: – merging arrays – loop interchange – loop fusion – blocking CMSC 411 - Alan Sussman 70 Technique 1: merging arrays • Suppose have two arrays: int val[size]; int key[size]; • and that usually use both of them together CMSC 411 - Alan Sussman 71 Merging arrays (cont.) This is how they would be stored if cache blocksize is 64 words: val[0] val[1] val[2] val[3] . . . val[64] val[65] val[66] val[67] . . . . . val[size-1] key[0] key[1] key[2] key[3] . . CMSC 411 - Alan Sussman 72 Merging arrays (cont.) Means that at least 2 blocks must be in cache to begin using the arrays. val[0] val[1] val[2] val[3] . . . val[64] val[65] val[66] val[67] . . . . . val[size-1] key[0] key[1] key[2] key[3] . . CMSC 411 - A. Sussman (from D. 
O'Leary) 15 CMSC 411 - Alan Sussman 85 Blocking access to arrays (cont.) Example: Matrix-matrix multiplication = CMSC 411 - Alan Sussman 86 Blocking access to arrays (cont.) Trouble: Easy to get rows of A; not so efficient to get columns of B. = A B = C CMSC 411 - Alan Sussman 87 Blocking access to arrays (cont.) And if cycle through rows of A, end up loading all of B m times, where m is the number of rows of A. Computing the elements in the columns of C: = A B = C CMSC 411 - Alan Sussman 88 Blocking access to arrays (cont.) = A B = C CMSC 411 - Alan Sussman 89 Blocking access to arrays (cont.) = A B = C CMSC 411 - Alan Sussman 90 Blocking access to arrays (cont.) = A B = C CMSC 411 - A. Sussman (from D. O'Leary) 16 CMSC 411 - Alan Sussman 91 Blocking access to arrays (cont.) = A B = C CMSC 411 - Alan Sussman 92 Blocking access to arrays (cont.) = A B = C CMSC 411 - Alan Sussman 93 Blocking access to arrays (cont.) Instead, order the computation using rectangular blocks of A and B. = A B = C Partial answer! CMSC 411 - Alan Sussman 94 Blocking access to arrays (cont.) If the block of A has k rows, then only need to load B m/k times. = A B = C Partial answer! CMSC 411 - Alan Sussman 95 Blocking access to arrays (cont.) Improves temporal locality /* Before */ for (i=0; i<N; i++) for (j=0; j<N; j++) { r=0; for (k=0; k<N; k++) r=r+y[i][k]*z[k][j]; x[i][j]=r; } /* After */ for (jj=0; jj<N; jj=jj+B) for (kk=0; kk<N; kk=kk+B) for (i=0; i<N; i++) for (j=jj; j<min(jj+B,N); j++) { r=0; for (k=kk; k<min(kk+B,N); k++) r= r+y[i][k]*z[k][j]; x[i][j]=x[i][j]+r; } CMSC 411 - Alan Sussman 96 Reducing the time for cache hits • K.I.S.S. • Use virtual addresses rather than physical addresses in the cache. • Pipeline cache accesses • Trace caches CMSC 411 - A. Sussman (from D. O'Leary) 17 CMSC 411 - Alan Sussman 97 K.I.S.S. • Cache should be small enough to fit on the processor chip • Direct mapped is faster than associative, especially on read – overlap tag check with transmitting data • For current processors, small L1 caches to keep fast clock cycle time, hide L1 misses with dynamic scheduling, and use L2 caches to avoid main memory accesses CMSC 411 - Alan Sussman 98 Simulated cache access times Fig. 5.24 – 0.8-micron feature size, 1 R/W port, 32 address & 64 data bits, 32-byte blocks CMSC 411 - Alan Sussman 99 Use virtual addresses • Each user has his/her own address space, and no addresses outside that space can be accessed • To keep address length small, each user addresses by offsets relative to some physical address in memory (pages) • For example: 1005500 125412 005400 Virtual addressPhysical address CMSC 411 - Alan Sussman 100 Virtual addresses (cont.) • Since instructions use virtual addresses, use them for index and tag in cache, to save the time of translating to physical address space (the subject of the next part of this unit) • Note that it is important to flush the cache and set all blocks invalid when switch to a new user in the OS (a context switch), since the same virtual address then may refer to a different physical address – or use the process/user ID as part of the tag in cache • Aliases are another problem – when two different virtual addresses map to the same physical address – can get 2 copies in cache • what happens when one copy is modified? 
Slide 96: Reducing the time for cache hits
• K.I.S.S. (keep it simple)
• Use virtual addresses rather than physical addresses in the cache
• Pipeline cache accesses
• Trace caches

Slide 97: K.I.S.S.
• The cache should be small enough to fit on the processor chip
• Direct mapped is faster than associative, especially on reads
– overlap the tag check with transmitting the data
• Current processors use small L1 caches to keep the clock cycle time fast, hide L1 misses with dynamic scheduling, and use L2 caches to avoid main memory accesses

Slide 98: Simulated cache access times
(Fig. 5.24 – 0.8-micron feature size, 1 R/W port, 32 address & 64 data bits, 32-byte blocks)

Slide 99: Use virtual addresses
• Each user has his/her own address space, and no addresses outside that space can be accessed
• To keep addresses short, each user addresses memory by offsets relative to some physical base address in memory (pages)
• (Table of example virtual-to-physical address pairs.)

Slide 100: Virtual addresses (cont.)
• Since instructions use virtual addresses, use them for the index and tag in the cache, to save the time of translating to physical addresses (the subject of the next part of this unit)
• Note that it is important to flush the cache and set all blocks invalid when the OS switches to a new user (a context switch), since the same virtual address may then refer to a different physical address
– or use the process/user ID as part of the tag in the cache
• Aliases are another problem – when two different virtual addresses map to the same physical address, we can get 2 copies in the cache
• what happens when one copy is modified?

Slide 101: Pipelined cache access
• Latency to the first level cache is more than one cycle – we've already seen this in Units 3 & 4
• The benefit is a fast cycle time
• The penalty is slower hits – also more clock cycles between a load and the use of its data (maybe more pipeline stalls)

Slide 102: Trace cache
• Find a dynamic sequence of instructions to load into a cache block, including taken branches
– instead of statically, from how the instructions are laid out in memory
– branch prediction is needed for loading the cache
• One penalty is complicated address mapping, since addresses are no longer always aligned to the cache block size
– can also end up storing the same instructions multiple times
• The benefit is caching only instructions that will actually be used (if the branch prediction is right), not all instructions that happen to be in the same cache block

[Slides 103-114 are not in this preview.]

Slide 115: Interleaved memory (cont.)
• Note how nice interleaving is for write-through
• It also helps speed reads and write-backs
• Note: Interleaved memory acts like wide memory, except that words are transmitted through the bus sequentially, not in parallel

Slide 116: Independent memory banks
• Each bank of memory has its own address lines and (usually) its own bus
• Can have several independent banks, perhaps
– one for instructions
– one for data
• Banks can operate independently without slowing the others

Slide 117: Avoid memory bank conflicts
• ... by having a prime number of memory banks
• Since arrays frequently have even dimension sizes – and often dimension sizes that are a power of 2 – strides that match the number of banks (or a multiple of it) give very slow access

Slide 118: Example

    int x[256][512];
    for (j = 0; j < 512; j = j + 1)
        for (i = 0; i < 256; i = i + 1)
            x[i][j] = 2 * x[i][j];

• First access the first column of x:
– x[0][0], x[1][0], x[2][0], ..., x[255][0]
• with addresses
– K, K+512*4, K+512*8, ..., K+512*something
• With 4 memory banks, all of these elements live in the same memory bank, so the CPU will stall in the worst possible way

Slide 119: Number theory to the rescue!
• Subtitle: One reason why computer scientists need math
• Fact 1: It is easy to compute mod if the base is a prime number that is one less than a power of 2
• Fact 2: The Chinese remainder theorem says that it is safe to do a rather bizarre, but convenient, mapping of words to memory banks
• Idea: Suppose we have 3 memory banks with 8 words each
– Map word k to memory bank k mod 3, address k mod 8
– For example, word 17 (decimal) goes to bank 2, address 1
– Word 9 goes to bank 0, address 1

Slide 120: Number theory (cont.)
• That mapping gives the following arrangement of words:

Address       Sequentially interleaved   Modulo interleaved
within bank   bank 0   bank 1   bank 2   bank 0   bank 1   bank 2
0             0        1        2        0        16       8
1             3        4        5        9        1        17
2             6        7        8        18       10       2
3             9        10       11       3        19       11
4             12       13       14       12       4        20
5             15       16       17       21       13       5
6             18       19       20       6        22       14
7             21       22       23       15       7        23

(A sketch computing this mapping follows.)
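A few lines of C reproduce the modulo-interleaved columns of the table, using the bank and word counts from the slide (3 banks of 8 words); the Chinese remainder theorem guarantees that no two of the 24 words collide, since 3 and 8 are relatively prime:

    #include <stdio.h>

    #define NBANKS     3   /* prime number of banks, as in the slide   */
    #define BANK_WORDS 8   /* words per bank (a power of 2)            */

    /* Modulo interleaving: word k lives in bank (k mod 3), at
       address (k mod 8) within that bank. */
    int main(void)
    {
        for (int k = 0; k < NBANKS * BANK_WORDS; k++)
            printf("word %2d -> bank %d, address %d\n",
                   k, k % NBANKS, k % BANK_WORDS);
        return 0;   /* e.g., word 17 -> bank 2, address 1 */
    }

With this mapping, a stride that is a multiple of a power of 2 (like the 512-int column stride of slide 118) no longer lands every access in the same bank, because the bank number depends on k mod 3 rather than on k's low-order bits.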
Slide 121: How much good do these techniques do?
• Example: Assume a cache block of 4 words, and
– 4 cycles to send an address to main memory
– 24 cycles to access a word, once the address arrives
– 4 cycles to send a word back to cache
• Basic miss penalty: 4 × 32 = 128 cycles, since each of the 4 words pays the full 32-cycle penalty
• Memory with a 2-word width: 2 × 32 = 64 cycle miss penalty
• Simple interleaved memory: the address can be sent to each bank simultaneously, so the miss penalty is 4 + 24 + 4 × 4 (for sending the words) = 44 cycles
• Independent memory banks: 32-cycle miss penalty, as long as the words are in different banks, since each bank has its own address lines and bus

Slide 122: A confession: we've been lying
• The lie: Users don't really use physical addresses in their programs
• Instead, they use virtual addresses, where virtual means just what it does in virtual reality!
• This idea is over 40 years old, invented by the designers of the Atlas computer (Section 5.18)
– Cache memory is just a little newer, first discussed in print in 1965
• So when the user addresses word 450, the computer provides the illusion that the data is actually at this address, when really the data is somewhere else
• This address translation or memory mapping is invisible to the user

Slide 123: Why virtual addressing
• Computers are designed so that multiple users can be active at the same time
• At the time a program is compiled, the compiler has to assign addresses to each data item. But how can it know what memory addresses are being used by other users?
• Instead, the compiler assigns virtual addresses, and expects the loader to provide the means to map these into physical addresses

Slide 124: In the olden days ...
• The loader would locate an unused set of main memory addresses and load the program and data there
• There would be a special register called the relocation register, and all addresses that the program used would be interpreted as addresses relative to the base address in that register
• So if the program jumped to location 54, the jump would really be to 54 + the contents of the relocation register. A similar thing, perhaps with a second register, would happen for data references

Slide 125: In the less-olden days ...
• It became difficult to find a contiguous segment of memory big enough to hold a program and its data, so the program was divided into pages, with each page stored contiguously but different pages placed in any available spot, either in main memory or on disk
• This is the virtual addressing scheme – to the user, memory looks like a contiguous segment, but actually the data is scattered in main memory and perhaps on disk

Slide 126: But we know all about this!
• We already know that a program and its data can be scattered between cache memory and main memory
• Now add the reality that their location in main memory is also determined in a scattered way, and some pages may also be located on disk
• So each page has its own relocation value (see the sketch below)

Slide 127: Virtual Memory – Fig. 5.31
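Per-page relocation is just a table lookup: the virtual page number indexes a table of per-page base addresses, and the offset within the page carries over unchanged. A minimal sketch, with the page size and table contents invented for illustration (real page-table entries would also carry valid and protection bits):

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SIZE 4096u   /* assumed page size */

    /* Hypothetical page table: the physical base address of each
       virtual page -- one relocation value per page. */
    static uint32_t page_table[8] = {
        0x20000, 0x8000, 0x15000, 0x3000, 0x9000, 0x41000, 0x7000, 0x1000
    };

    uint32_t translate(uint32_t vaddr)
    {
        uint32_t vpage  = vaddr / PAGE_SIZE;   /* which page            */
        uint32_t offset = vaddr % PAGE_SIZE;   /* where within the page */
        return page_table[vpage] + offset;
    }

    int main(void)
    {
        /* Virtual page 2 starts at physical 0x15000 in this example. */
        printf("virtual 0x%x -> physical 0x%x\n", 0x2054u, translate(0x2054u));
        return 0;
    }

The single relocation register of slide 124 is the degenerate case of this table with one entry covering the whole program.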
Slide 128: Parameters – Fig. 5.32
(cc = clock cycles)

Parameter           First-level cache            Virtual memory
Block (page) size   16-128 bytes                 4096-65,536 bytes
Hit time            1-3 cc                       50-150 cc
Miss penalty        8-150 cc                     10^6-10^7 cc
  (access time)     (6-130 cc)                   (0.8-8 × 10^6 cc)
  (transfer time)   (2-20 cc)                    (0.2-2 × 10^6 cc)
Miss rate           0.1-10%                      0.00001-0.001%
Address mapping     25-45 bit physical address   32-64 bit virtual address
                    to 14-20 bit cache address   to 25-45 bit physical address

Slide 129: Cache vs. virtual memory

Cache                                   Virtual memory
Cache miss handled by hardware          Page fault handled by operating system
Cache size fixed for a particular       Virtual memory size fixed for a
machine                                 particular program
"cache fault" (miss)                    "page fault"
Fundamental unit is a block             Fundamental unit is a fixed-length page
                                        or a variable-length segment

Slide 130: Paging vs. segmentation – Fig. 5.34

                          Page                         Segment
Words per address         One                          Two (segment/offset)
Programmer visible?       Invisible to application     May be visible to
                          programmer                   application programmer
Replacing a block         Trivial (all blocks are      Hard (must find a contiguous,
                          the same size)               variable-sized chunk)
Memory use inefficiency   Internal fragmentation       External fragmentation
                          (within a page)              (in unused memory)
Efficient disk traffic    Yes (can adjust page size)   Not always (small-segment
                                                       problem)

Slide 131: Managing main memory
• Fully-associative mapping is used, because page faults are really, really expensive
• A page is located using a page table, with one entry per page in the virtual address space
– Its size is sometimes reduced by hashing, to make one entry per physical page in main memory – an inverted page table
• Since locality says that a page will be used multiple times, address translation usually tests the address of the last-referenced page before looking in other places
• So address translation information is held in the translation look-aside buffer (TLB) – a sketch follows below

Computer Systems Architecture – CMSC 411
Unit 5: Memory Hierarchy
Alan Sussman, May 1, 2003
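A minimal sketch of the check-the-last-referenced-page idea from slide 131: a one-entry "TLB" that remembers the most recent translation and falls back to the page table on a mismatch. All sizes, names, and the one-entry design are illustrative assumptions, not a real TLB organization:

    #include <stdint.h>

    #define PAGE_SIZE 4096u
    #define NPAGES    1024u

    /* Page table: physical frame base for each virtual page (assume the
       OS fills this in; real entries also hold valid/dirty bits). */
    static uint32_t page_table[NPAGES];

    /* One-entry "TLB": the last translation performed. */
    static uint32_t tlb_vpage = (uint32_t)-1;
    static uint32_t tlb_frame;

    uint32_t translate_tlb(uint32_t vaddr)
    {
        uint32_t vpage  = vaddr / PAGE_SIZE;
        uint32_t offset = vaddr % PAGE_SIZE;

        if (vpage != tlb_vpage) {          /* TLB miss: walk the page table */
            tlb_vpage = vpage;
            tlb_frame = page_table[vpage];
        }
        return tlb_frame + offset;         /* TLB hit costs only this add   */
    }

Because of locality, most translations hit the remembered entry; real TLBs simply generalize this to a small associative array of recent page translations.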