Tuesday, September 25, 2012

The page buffer

In this post I would like to talk a little bit about page buffering. We remember that the previous posts had subjects like File System Abstration where the focus was on reading and writing blocks from or to the disk. So what is a page? A page is just a block of bytes, so it is quite similar to the block which was explained previously. But it's not only a new name for the same thing, but it differs regarding it's use case. A block is something which is addressed physically on disc and which accessed in an abstract way by the DBMS, whereby a page is a block of bytes which is kept inside the main memory. So a page is managed within the page buffer. For this purpose it has some additional information attached

So the purpose of the buffer is to keep pages available within the main memory. The main memory access is a lot faster than the hard disc access. A buffer can be cold or hot. A hot buffer contains almost every typically used page. So the target is to reach a cache hit ratio which is more than 90%. The cache hit ratio is the ratio between cache hits and cache misses. A cache hit means that a required page was available in the buffer. The opposite is a cache miss, which means that the page was gathered from disc.

You may ask why we explicitely implement a buffer by not using the operating system one. The answer is that the characteristics of an OS buffer are not everytime sufficient. Not every OS buffer knows e.g. of the pin-operation. This operation allows to tell the buffer that a specific page should not moved out of the buffer. Let's assume a nested loop join. Which means two nested loops those are accessing pages. A column-value of a row of a page within the buffer should be compared with other values of other pages. It's easy to see that we need to keep the outer page inside the buffer whereby comparing with the other ones.

A buffer has a maximum size. So it can only keep a specified number of pages. If the buffer is full, which means that every buffer frame is used, it then is required to replace an existing page with the requested one. So strongly related to the buffer is the scheduler. A scheduler decides which page should become replaced. Schedulers can be very simple or more complex. Here some possible scheduling algorithms:
  • F(irst) (I)n F(irst) O(ut): Replaces just the oldest page
  • L(east) F(requently) U(sed): Replaces the page which was less often referenced 
  • D(ynamic) G(erneralized) CLOCK: Goes clockwise through the frames by counting the local references and by having also one global counter. If the global counter reaches a limit, then a normalization happens.
jCoreDB does at first implemented LFU. Afterwards we will implement DGCLOCK. LFU has the disadvantage that it only takes the number of references into account but not the age of the page within the buffer. So the most often used pages are kept inside the buffer. We can easily see that this is not fair enough for just added pages those have also a higher probability to be used the next time. So we also should take the age into account to get a better scheduler. DGCLOCK does this by combining a reference counter with an age counter.

Besides the scheduling strategy, a buffer also realizes a write strategy. The following strategies are available:
  • Write Through: Just write every change (Update, Insert) from the buffer directly to the underlying file system.
  • Write Behind: Perform the change at first in the buffer. If a page has another state within the buffer than on disk it is marked as 'dirty'. One or multiple writers are used to afterwards write the dirty pages to the underlying file system.
jCoreDB does at first implement Write Through, just because the implementation effort is less than for Write Behind. Later we will implement Write Behind. Write Behind is necessary to realize a multi threaded buffer. It will also become important related to transaction management.

Currently the following classes were implemented in order to realize the Page Buffer requirements:
  • IPageBuffer: Defines a wrapper for the file system functions by providing additional buffering
  • IScheduler: Defines which operations a scheduler should support
  • Page: A page extends a block. A page has a state within the buffer (pinned, dirty,...) and a  type (data, index, overflow)
  • SchedulerFactory: Returns a scheduler implementation based on the configuration
  • LFUScheduler: The scheduler which realizes the LFU algorithm
  • PageIndex: A hash index which maps the block id to the buffer frame index
  • WriteThroughPageBuffer: The buffer which performs a Write Through