Introduction to IA-32e hardware paging

In this article, we explore the complexities and concepts behind Intel's 64-bit paging scheme, why we need paging in the first place, and some practical analysis of paging structures.

Why do we need paging?

In any application, whether it's a student's first program or a complicated operating system, instructions that touch memory use a virtual address. In fact, even when the CPU fetches the next instruction to execute, it uses a virtual address. A virtual address represents a specific location in the application's view of memory; however, it does not represent a location within physical RAM. Paging, or linear address translation, is the mechanism by which the memory management unit (MMU) converts a linear address used by the CPU into a physical address that can be used to access physical memory.

Technically, a linear address and a virtual address are not the same. For the purposes of this article, though, we will consider them to be the same, since we do not need to consider segmentation. Older architectures would first need to convert a virtual address to a linear address using segmentation.

Figure 1: An application with different parts of virtual memory mapping to different parts of physical memory.

Paging modes

In this article, we will focus on IA-32e 4-level paging (64-bit paging) on Intel architectures. It is worth noting, though, that there are other paging modes supported by Intel.

There are three mechanisms which control paging and the currently enabled paging mode. The first is the PG flag (bit 31) in control register 0 (CR0). If this bit is set, paging is enabled on the processor. If this bit is not set, no paging is enabled. In the latter case, the virtual address and physical address are considered equivalent and no translation is necessary.

If paging is enabled on the processor, then control register 4 (CR4) is checked for the Physical Address Extension (PAE) bit (bit 5). If it is not set, 32-bit paging is used. If it is set, the final condition checked is the Long Mode Enable (LME) bit (bit 8) of the Extended Feature Enable Register, or IA32_EFER MSR. If the LME bit is not set, the processor is in PAE paging mode. If the LME bit is set, the processor is in 4-level paging mode, which is the 64-bit mode that we plan to explore in this article. This mode translates 48-bit virtual addresses into physical addresses of up to 52 bits; because virtual addresses are limited to 48 bits, the addressable virtual space is limited to 256TB.
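The decision chain above can be sketched in C++ as follows. This is only an illustration of the order of the checks: the names `PagingMode` and `paging_mode` are made up for this example, and the function takes raw register values that something else has already read (reading CR0, CR4, or an MSR requires privileged code that is not shown here).

```cpp
#include <cstdint>

// Paging modes discussed in the text (hypothetical names for this sketch).
enum class PagingMode { None, ThirtyTwoBit, Pae, FourLevel };

constexpr std::uint64_t kCr0Pg   = 1ull << 31;  // CR0.PG, bit 31
constexpr std::uint64_t kCr4Pae  = 1ull << 5;   // CR4.PAE, bit 5
constexpr std::uint64_t kEferLme = 1ull << 8;   // IA32_EFER.LME, bit 8

// Applies the three checks in the same order as the text.
// The arguments are raw register values supplied by the caller.
constexpr PagingMode paging_mode(std::uint64_t cr0, std::uint64_t cr4,
                                 std::uint64_t efer) {
    return !(cr0 & kCr0Pg)    ? PagingMode::None
         : !(cr4 & kCr4Pae)   ? PagingMode::ThirtyTwoBit
         : !(efer & kEferLme) ? PagingMode::Pae
                              : PagingMode::FourLevel;
}

// All three bits set: the 4-level mode explored in this article.
static_assert(paging_mode(kCr0Pg, kCr4Pae, kEferLme) == PagingMode::FourLevel,
              "PG + PAE + LME selects 4-level paging");
```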

Paging structures

Regardless of which paging mode is enabled, a series of paging structures is used to translate a virtual address to a physical address. The format and depth of these paging structures depend on the paging mode chosen. Generally speaking, each entry in a paging structure is the size of a pointer and contains a series of control bits, as well as a page frame number.

In our case, 64-bit mode structures are 4,096 bytes in size (the size of the smallest architecture page - we will touch more on that later), containing 512 entries each. Every entry is 8 bytes.

Figure 2: A paging structure containing 512 pointer-size entries in 64-bit mode.

The first paging structure is always located at the physical address specified in control register 3 (CR3). As an aside, this is also the only place that stores the fully qualified physical address of a paging structure - in all other cases, we need to multiply a page frame number by the size of a page to get the real physical address. Each entry within a paging structure contains a page frame number which either references a child paging structure for that region of memory, or maps directly to the page of physical memory that the original virtual address translates to. Again, in both cases, the page frame number is simply an index of a physical page in memory, and needs to be multiplied by the size of a page to get a meaningful physical address. Each paging structure entry also describes the memory access protections that apply to the memory region it covers - whether the memory is writable, executable, etc. - as well as some more interesting properties, such as whether or not that specific entry has previously been used for a translation.

While the nested paging structures are being walked, the translation completes either when a page frame is identified at the lowest level of the hierarchy or when the configuration of an entry terminates the walk early. For example, if an entry is marked as not present (bit 0 of the entry is not set) or if a reserved bit is set, the translation fails and the virtual address is considered invalid. Additionally, an entry can set its Page Size bit to indicate that it is the lowest paging structure entry for that region of memory, which we will touch more on later.

Figure 3: Some paging structures may not map to a physical page because the virtual address range they represent is invalid.

Anatomy of a virtual address

Information is encoded in a virtual address that makes the translation to a physical address possible. In 64-bit mode, we use 4-level paging, which means that any given virtual address can be divided into 6 sections with 4 of them associated with the different paging structures.

The different paging structures are as follows: a PML4 table (whose physical address is stored in CR3), a Page Directory Pointer Table (PDPT), a Page Directory (PD), and a Page Table (PT). The figure below illustrates which bits of a given virtual address map to these different paging structures.

A single entry in the PML4 table (a PML4E) can address up to 512GB of memory, while an entry in the PDPT (a PDPTE) can address 1GB (parent granularity divided by 512) of memory, and so on. This is how we get the granularity of the paging structures down to 4KB at the lowest level.
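The arithmetic behind those numbers is short enough to write out. The sketch below starts from the 4KB page and multiplies by 512 at each level; the constant names are invented for this example.

```cpp
#include <cstdint>

constexpr std::uint64_t kPageSize        = 4096;  // smallest page, 4KB
constexpr std::uint64_t kEntriesPerTable = 512;   // 4,096 bytes / 8 bytes per entry

// Bytes of virtual address space covered by a single entry at each level.
constexpr std::uint64_t kPteCoverage   = kPageSize;                          // 4KB
constexpr std::uint64_t kPdeCoverage   = kPteCoverage   * kEntriesPerTable;  // 2MB
constexpr std::uint64_t kPdpteCoverage = kPdeCoverage   * kEntriesPerTable;  // 1GB
constexpr std::uint64_t kPml4eCoverage = kPdpteCoverage * kEntriesPerTable;  // 512GB

// All 512 PML4 entries together span the full 48-bit space (256TB).
constexpr std::uint64_t kTotalCoverage = kPml4eCoverage * kEntriesPerTable;

static_assert(kPdeCoverage   == (1ull << 21), "a PDE covers 2MB");
static_assert(kPdpteCoverage == (1ull << 30), "a PDPTE covers 1GB");
static_assert(kPml4eCoverage == (1ull << 39), "a PML4E covers 512GB");
static_assert(kTotalCoverage == (1ull << 48), "the PML4 table covers 256TB");
```

Each multiplication by 512 corresponds to one 9-bit index field in the virtual address, which is why the shift grows by 9 at every level.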

Figure 4: The anatomy of a virtual address in 64-bit mode.

In the example above, we see that the highest bits (bits 63-48) are reserved. They do not take part in the address translation itself, although in 4-level paging they must all be copies of bit 47 for the address to be valid. We will talk more about these bits in a future article.

The next 9 bits (bits 47-39) are used to identify the index into the PML4 table that contains the entry (PML4E) that's next in our paging structure walk. For example, if these 9 bits evaluate to the number 16, then the entry at index 16 (PML4[16], the 17th entry in the table) is selected to be used for the address translation.
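Extracting the fields of a virtual address comes down to shifts and masks. Here is a minimal sketch; `VirtualAddressParts` and `split_virtual_address` are names invented for this example, and the sample address is constructed by hand from known indices rather than taken from a real system.

```cpp
#include <cstdint>

// The five fields of a 48-bit virtual address that take part in translation.
struct VirtualAddressParts {
    std::uint16_t pml4_index;   // bits 47-39
    std::uint16_t pdpt_index;   // bits 38-30
    std::uint16_t pd_index;     // bits 29-21
    std::uint16_t pt_index;     // bits 20-12
    std::uint16_t page_offset;  // bits 11-0
};

constexpr VirtualAddressParts split_virtual_address(std::uint64_t va) {
    return VirtualAddressParts{
        static_cast<std::uint16_t>((va >> 39) & 0x1ff),
        static_cast<std::uint16_t>((va >> 30) & 0x1ff),
        static_cast<std::uint16_t>((va >> 21) & 0x1ff),
        static_cast<std::uint16_t>((va >> 12) & 0x1ff),
        static_cast<std::uint16_t>(va & 0xfff),
    };
}

// A hand-built address: PML4 index 16, PDPT index 1, PD index 2,
// PT index 3, page offset 0x123.
constexpr std::uint64_t kSampleVa =
    (16ull << 39) | (1ull << 30) | (2ull << 21) | (3ull << 12) | 0x123;

static_assert(split_virtual_address(kSampleVa).pml4_index == 16, "PML4 index");
static_assert(split_virtual_address(kSampleVa).page_offset == 0x123, "offset");
```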

Once we have the PML4E entry from the given index, we can use that entry to provide us the address of the start of the next paging structure to walk to. Here is an example of what a PML4E structure would look like in C++.
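One way to model such an entry is sketched below. It is an assumption-laden illustration rather than a canonical definition: the type name `Pml4Entry` and its accessors are invented here, only a handful of the control bits are exposed, and the page frame number is taken from bits 47-12 as in the rest of this article. Accessor functions over the raw 64-bit value are used instead of C++ bit-fields because bit-field layout is left to the compiler.

```cpp
#include <cstdint>

// Hypothetical wrapper around the raw 64-bit value of a PML4E.
struct Pml4Entry {
    std::uint64_t raw;

    constexpr bool bit(unsigned n) const { return ((raw >> n) & 1u) != 0; }

    constexpr bool present()          const { return bit(0); }   // P
    constexpr bool writable()         const { return bit(1); }   // R/W
    constexpr bool user_accessible()  const { return bit(2); }   // U/S
    constexpr bool accessed()         const { return bit(5); }   // A
    constexpr bool execute_disabled() const { return bit(63); }  // XD

    // Bits 47-12: the frame that holds the next paging structure.
    constexpr std::uint64_t page_frame_number() const {
        return (raw >> 12) & 0xFFFFFFFFFull;  // 36-bit field
    }

    // Physical address of the PDPT this entry points to.
    constexpr std::uint64_t next_table_address() const {
        return page_frame_number() * 0x1000;
    }
};

// An arbitrary made-up value: PFN 0x12345 with control bits 0x067.
constexpr Pml4Entry kSampleEntry{0x0000000012345067ull};
static_assert(kSampleEntry.present(), "bit 0 is set in 0x067");
static_assert(kSampleEntry.next_table_address() == 0x12345000ull, "PFN * 0x1000");
```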

Using the page frame number (PFN) member of the structure (in this case, it actually refers to the page frame where the next structure is located), we can now walk to the next structure in the hierarchy by multiplying that number by the size of a page (0x1000). The result of that multiplication is the physical address where the next paging structure is located. The PML4E points to a Page Directory Pointer Table (PDPT). We use the next 9 bits of our original virtual address (bits 38-30) to determine the index in the PDPT that we want to look at. At that index, we will find a PDPTE structure, like the one defined below.

It's worth noting at this point that PDPTEs and PDEs contain a Page Size (PS) bit (bit 7). If this bit is set, the current entry maps a physical page directly instead of pointing to another paging structure. This means that pages as large as 1GB are supported, if the associated PDPTE sets the PS bit. Likewise, a PDE that sets the PS bit maps a 2MB page. (In a PML4E this bit is reserved, and a PTE always maps a 4KB page.) Not all processors support the PS bit being set in a PDPTE; therefore, not all processors support 1GB pages.
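The effect of the PS bit on the page size can be summarized in a small function. This is a sketch under a few assumptions: the entry is present, the processor supports 1GB pages, and the names `Level` and `mapped_page_size` are invented for this example.

```cpp
#include <cstdint>

enum class Level { Pml4, Pdpt, Pd, Pt };

constexpr std::uint64_t kPageSizeBit = 1ull << 7;  // PS, bit 7

// Returns the size in bytes of the page that an entry maps, or 0 if the
// entry points at another paging structure instead of at a page.
constexpr std::uint64_t mapped_page_size(Level level, std::uint64_t entry) {
    return level == Level::Pt                                ? 0x1000ull      // 4KB
         : (level == Level::Pd   && (entry & kPageSizeBit))  ? 0x200000ull    // 2MB
         : (level == Level::Pdpt && (entry & kPageSizeBit))  ? 0x40000000ull  // 1GB
                                                             : 0ull;
}

static_assert(mapped_page_size(Level::Pd, 0x83) == 0x200000ull, "PDE with PS set");
static_assert(mapped_page_size(Level::Pd, 0x03) == 0, "PDE pointing at a Page Table");
```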

Moving along in our example, we can assume that the PS bit is not set in the PDPTE that we just referenced. So, we will look at the page frame number of this structure and multiply by the page size again to get the physical address of the next paging structure root.

Figure 5: Our walk so far, from the CR3 register, through the PML4 and PDPT structures.

Using the PFN stored in the PDPTE structure, we're able to locate the Page Directory paging structure, which is next in the hierarchy. As before, we use the next 9 bits (bits 29-21) of the original virtual address to get the index into this structure where our entry of interest (a PDE, in this case) resides. The PDE structure is defined similarly to the previous structures, as shown below.

Again, we can use the PFN member of this structure multiplied by the size of a page to locate the next, and final, paging structure that facilitates the translation - the Page Table. The next 9 bits (bits 20-12) of our original virtual address are the index into the Page Table where the associated entry (PTE) is located. This PTE structure is defined below, and once again has similar characteristics to its predecessors.

The PFN member of this structure indicates the real page frame of the backing physical memory. Because our example went the full depth of the paging structures, the size of a page frame is 4KB, or 0x1000. Thus, in order to get the location in physical memory where the backing page begins, we multiply the page frame number from the PTE by 0x1000 as we had been doing previously. The remaining 12 bits (bits 11-0) of the original virtual address are the offset into the physical page where the actual data resides. Had our example not used the full depth of the paging structures, and had instead used 2MB page sizes (stopping at the Page Directory level), that PDE would have contained the page frame number of interest, and we would have multiplied that number by the size of a page frame, which in that case would be 2MB or 0x200000 (the frame number of a 2MB page starts at bit 21 of the entry rather than bit 12). We would then add the offset into the page, which would be the remaining bits (bits 20-0) of the original virtual address since we did not need to use the usual 9 bits for indexing into a Page Table structure.
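Putting the whole walk together, here is a minimal sketch that runs against a toy model of physical memory instead of real hardware. Everything in it is an assumption made for illustration: the `Cell`, `Translation`, `read_entry`, and `translate` names, the tiny hand-built tables, and the decision to skip reserved-bit checks and permission checks entirely.

```cpp
#include <cstddef>
#include <cstdint>

// One 8-byte paging-structure entry in a toy model of physical memory.
struct Cell { std::uint64_t phys_addr; std::uint64_t value; };
struct Translation { bool ok; std::uint64_t phys_addr; };

constexpr std::uint64_t kPresent     = 1ull << 0;
constexpr std::uint64_t kPageSizeBit = 1ull << 7;
constexpr std::uint64_t kAddrMask    = 0x0000FFFFFFFFF000ull;  // bits 47-12 (PFN * 0x1000)

// Reads the entry stored at phys_addr; anything not in the model reads as
// 0, which the walk below treats as "not present".
constexpr std::uint64_t read_entry(const Cell* mem, std::size_t count,
                                   std::uint64_t phys_addr) {
    for (std::size_t i = 0; i < count; ++i)
        if (mem[i].phys_addr == phys_addr) return mem[i].value;
    return 0;
}

constexpr Translation translate(const Cell* mem, std::size_t count,
                                std::uint64_t cr3, std::uint64_t va) {
    std::uint64_t table = cr3 & kAddrMask;
    // shift = 39 (PML4), 30 (PDPT), 21 (PD), 12 (PT)
    for (int shift = 39; shift >= 12; shift -= 9) {
        const std::uint64_t index = (va >> shift) & 0x1ff;
        const std::uint64_t entry = read_entry(mem, count, table + index * 8);
        if (!(entry & kPresent)) return {false, 0};  // walk terminates early
        if (shift == 12)                             // PTE: 4KB page
            return {true, (entry & kAddrMask) | (va & 0xfff)};
        if (shift == 21 && (entry & kPageSizeBit))   // PDE with PS: 2MB page
            return {true, (entry & 0x0000FFFFFFE00000ull) | (va & 0x1fffff)};
        if (shift == 30 && (entry & kPageSizeBit))   // PDPTE with PS: 1GB page
            return {true, (entry & 0x0000FFFFC0000000ull) | (va & 0x3fffffff)};
        table = entry & kAddrMask;                   // descend one level
    }
    return {false, 0};  // not reached: the PTE case always returns
}

// Toy tables: PML4 at 0x1000, PDPT at 0x2000, PD at 0x3000, PT at 0x4000.
constexpr Cell kToyMemory[] = {
    {0x1000, 0x2003},    // PML4[0] -> PDPT
    {0x2000, 0x3003},    // PDPT[0] -> PD
    {0x3010, 0x4003},    // PD[2]   -> PT
    {0x3018, 0x800083},  // PD[3]   -> 2MB page at 0x800000 (PS set)
    {0x4018, 0x5003},    // PT[3]   -> 4KB page at 0x5000
};
constexpr std::size_t kToyCount = sizeof(kToyMemory) / sizeof(kToyMemory[0]);

static_assert(translate(kToyMemory, kToyCount, 0x1000, 0x403123).phys_addr == 0x5123, "4KB");
static_assert(translate(kToyMemory, kToyCount, 0x1000, 0x612345).phys_addr == 0x812345, "2MB");
static_assert(!translate(kToyMemory, kToyCount, 0x1000, 0x200000).ok, "unmapped");
```

A linear search over a handful of cells is obviously not how memory works; it simply keeps the example small enough to be checked at compile time.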

Figure 6: Here we have a full traversal of the paging structures from CR3 all the way to the final PTE. We use the PFN from the PTE to calculate the backing physical page.

Practical exploration with WinDbg

We can use WinDbg to explore what this structure hierarchy looks like in practice. Windows does some things differently (such as per-process CR3 to keep the virtual address spaces of processes separate) and there are certain complexities that we will cover in a future article, but we will choose a simple example that demonstrates what we've just learned. 

Attach an instance of WinDbg as a kernel debugger to the virtual machine or physical box of your choice to get started. Check out this article for instructions on how to do so.

Once we've broken in, use the lm command to list the modules that have been loaded by the current process.

We'll use the image base of ntdll.dll as our example. It's located at 0x00000000`771d0000. We can view the memory at that virtual address by using db (or dX, where X is your desired format specifier).

Here we can see the signature 'MZ' as we would expect from a DOS header. But where are these bytes located in physical memory? There are two ways we can find out.

The first way is the hard way - we can get the value stored in CR3 which gives us the beginning of our PML4 paging structure, and begin our manual walk like we described above.

This means that the start of our PML4 table is located at physical address 0x187000. We can take a look at the physical memory at that location using !dq (or !dX, again where X is the format specifier you want to use). We're aligning on a quad-word because the size of each entry in any paging structure in 64-bit mode is 8 bytes.

Here we see that we have one PML4E structure, with 0x00700007`ddc82867 as the value. For a paging structure entry, we know that bits 47-12 represent the page frame number of the next paging structure. So we extract those bits to get 0x7ddc82, then multiply it by the size of a page frame on this architecture (4KB) to get a physical address of 0x00000007`ddc82000.
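That extraction can be double-checked with a couple of lines of C++. The helper names below are invented for this sketch; the entry value is the one shown above.

```cpp
#include <cstdint>

// Bits 47-12 of a paging-structure entry.
constexpr std::uint64_t page_frame_number(std::uint64_t entry) {
    return (entry >> 12) & 0xFFFFFFFFFull;  // 36-bit field
}

// Physical address of the frame: PFN multiplied by the 4KB page size.
constexpr std::uint64_t frame_address(std::uint64_t entry) {
    return page_frame_number(entry) * 0x1000;
}

constexpr std::uint64_t kPml4e = 0x00700007ddc82867ull;

static_assert(page_frame_number(kPml4e) == 0x7ddc82, "PFN from bits 47-12");
static_assert(frame_address(kPml4e) == 0x00000007ddc82000ull, "PFN * 4KB");
```

Note that the mask matters here: the entry has bits set above bit 47, and shifting alone would leave them in the result.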

If we navigate to that physical address, let's see what we get.

Sure enough, there are two PDPTE entries (or potentially more, off-screen, since there can be up to 512 listed) here in this PDPT that we've walked to. In order to figure out which PDPTE we need to reference, we'd need to refer to the 9 bits in the original virtual address that map to the PDPT (bits 38-30), which in the case of our example works out to 0x1. That means we want the second entry of the PDPT structure, at index 1.

We can extract the page frame number from that PDPTE entry using the same bits we used in the last example (bits 47-12), resulting in 0x7d96b8. Let's multiply that number by 4KB, and see what we've got at that physical address.

You may be wondering at this point: what's going on? Why is there nothing in the PD structure that was referenced by our PDPTE? Remember, not all memory is valid and mapped, so the fact that we are seeing a bunch of zero-value PDE entries isn't a surprise. It just means that those regions of virtual memory aren't currently mapped to a physical page. In order to get to the PDE we care about, we need to take the next 9 bits of the original virtual address as we did before, this time getting a value of 0x1b8 after extracting the bits. That will get us the index into the PD structure where our PDE of interest is located. We can navigate to that memory location now, remembering to multiply the index by the size of a paging structure entry, which is 8 bytes.

That gets us 0x67e00007`d96b9867 as our PDE value. Once again, we extract the bits that are relevant to the page frame number, and we come up with 0x7d96b9.

We can repeat the steps we've taken previously: multiply that page frame number by 4KB, take the next 9 bits of the original virtual address as the PT index (0x1d0 in this case), add that index multiplied by 8 bytes, then navigate to the resulting physical address.

We've gotten the value 0xe7d00007`d9cc0025 for our PTE entry. We're almost there! We just need to do the same steps we've been doing one more time - extract the PFN from that value (0x7d9cc0), multiply by the size of a page (0x1000), but this time, we need to add the page offset (bits 11-0) from our original virtual address to the result. This should get us to 0x00000007`d9cc0000 since our page offset in this example was actually zero. Let's look at the memory!
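For reference, the arithmetic of the entire manual walk can be replayed at compile time. The sketch below uses only the values quoted in this walkthrough (the PML4 base, the entry values, and the PDPTE's page frame number); the helper names are invented for this example, and nothing here reads real memory.

```cpp
#include <cstdint>

constexpr std::uint64_t frame_address(std::uint64_t entry) {
    return ((entry >> 12) & 0xFFFFFFFFFull) * 0x1000;  // PFN (bits 47-12) * 4KB
}
constexpr std::uint64_t table_index(std::uint64_t va, unsigned shift) {
    return (va >> shift) & 0x1ff;                      // one 9-bit index field
}
constexpr std::uint64_t entry_address(std::uint64_t table, std::uint64_t index) {
    return table + index * 8;                          // 8 bytes per entry
}

constexpr std::uint64_t kVa       = 0x00000000771d0000ull;  // ntdll.dll image base
constexpr std::uint64_t kPml4Base = 0x187000;               // from CR3
constexpr std::uint64_t kPml4e    = 0x00700007ddc82867ull;
constexpr std::uint64_t kPdptePfn = 0x7d96b8;               // PFN read from the PDPTE
constexpr std::uint64_t kPde      = 0x67e00007d96b9867ull;
constexpr std::uint64_t kPte      = 0xe7d00007d9cc0025ull;

// Indices taken from the virtual address.
static_assert(table_index(kVa, 39) == 0x0,   "PML4 index");
static_assert(table_index(kVa, 30) == 0x1,   "PDPT index");
static_assert(table_index(kVa, 21) == 0x1b8, "PD index");
static_assert(table_index(kVa, 12) == 0x1d0, "PT index");

// Physical address of the entry consulted at each level.
static_assert(entry_address(kPml4Base, 0x0) == 0x187000, "PML4E");
static_assert(entry_address(frame_address(kPml4e), 0x1) == 0x7ddc82008ull, "PDPTE");
static_assert(entry_address(kPdptePfn * 0x1000, 0x1b8) == 0x7d96b8dc0ull, "PDE");
static_assert(entry_address(frame_address(kPde), 0x1d0) == 0x7d96b9e80ull, "PTE");

// Final physical address: frame from the PTE plus the 12-bit page offset.
static_assert(frame_address(kPte) + (kVa & 0xfff) == 0x00000007d9cc0000ull,
              "physical address of the DOS header");
```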

There's the header, just like we expected. That's a cumbersome amount of work, though, and we don't want to have to be doing that manually every time we try to translate an address. Luckily, there's an easier way.

WinDbg provides the !pte command to illustrate the entire walk down the paging structures and what each entry contains. It is important to note, though, that the addresses of the paging structures are converted to virtual addresses before being displayed, so they will look different from the physical addresses we extrapolated on our own, but they point to the same memory.

You can see that WinDbg gives us the address of the paging structure used, what it contained, and the page frame number for each. You can verify that the PFN on the PXE (PML4E) entry matches up with what we calculated, too. The most important part of all of this information is the PFN that's within the lowermost entry, the PTE. In our case it's 0x7d9cc0.

So, we can multiply that page frame number by 0x1000 to get 0x00000007`d9cc0000, and that should be the physical address of the DOS header of ntdll.dll! This checks out based on the manual calculations we did previously, but let's take a look again to make sure.

And there it is! We can test this by editing the DOS header in WinDbg and seeing if those changes are reflected on the physical page.

Let's check it out using the virtual address...

...and the physical address...

And there you have it! We now know how to successfully walk the IA-32e paging structures to convert a virtual address into a physical address.

