I have just started building a file system abstraction layer for jCoreDB. This is the lowest layer of a database system; it is used to interact with the operating system's file system. DBMSs often also implement raw access, which means that the OS file system is bypassed and the bytes are written directly to an unformatted partition. Other abstractions are possible as well: an abstracted file system could even use a web-service-based cloud store in the background. For now, however, jCoreDB will only abstract the traditional file system provided by the underlying OS.
Here is some terminology:
- A container contains one or more segments
- A segment is a file with a preallocated number of blocks. We distinguish between data segments and header segments. A header segment belongs to a data segment and contains the size of the segment, the block size inside the segment and a free memory bitmap.
- A block has a fixed number of bytes
- A block id is a tuple which contains the container id, the segment id and the position within the segment
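The terminology above could be sketched in Java roughly as follows. This is only an illustration under assumptions: the class names `BlockId` and `SegmentHeader`, their fields and their methods are hypothetical and not taken from the jCoreDB source, and the free memory bitmap is modelled with a `java.util.BitSet`.

```java
import java.util.BitSet;

/** Hypothetical block id: (container id, segment id, position within the segment). */
final class BlockId {
    final int containerId;
    final int segmentId;
    final int position;

    BlockId(int containerId, int segmentId, int position) {
        this.containerId = containerId;
        this.segmentId = segmentId;
        this.position = position;
    }

    @Override
    public String toString() {
        return containerId + ":" + segmentId + ":" + position;
    }
}

/** Hypothetical header segment: segment size, block size and a free memory bitmap. */
final class SegmentHeader {
    final int blockCount;      // number of preallocated blocks in the data segment
    final int blockSize;       // fixed number of bytes per block
    private final BitSet used; // bit set = block in use, bit clear = block free

    SegmentHeader(int blockCount, int blockSize) {
        this.blockCount = blockCount;
        this.blockSize = blockSize;
        this.used = new BitSet(blockCount);
    }

    /** Marks the first free block as used and returns its position, or -1 if the segment is full. */
    int allocate() {
        int position = used.nextClearBit(0);
        if (position >= blockCount) {
            return -1;
        }
        used.set(position);
        return position;
    }

    /** Marks a block as free again. */
    void free(int position) {
        used.clear(position);
    }

    boolean isFree(int position) {
        return !used.get(position);
    }

    /** Byte offset of a block inside the data segment file. */
    long offsetOf(int position) {
        return (long) position * blockSize;
    }
}
```

Because every block has the same size, the byte offset of a block follows directly from its position, which is why the position is enough to complete the block id.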
What a file system has to abstract:
- Create, open and delete containers
- Write a specific block
- Read a specific block
- Append a specific block
- Delete a specific block
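The operations listed above might translate into an interface like the one below. This is a minimal sketch, not the jCoreDB implementation: `BlockFileSystem`, `InMemoryFileSystem` and all method signatures are assumptions made for illustration. To keep it short, the in-memory stand-in uses one implicit segment per container and does not model preallocation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical interface; names and signatures are assumptions for illustration. */
interface BlockFileSystem {
    void createContainer(int containerId);
    void openContainer(int containerId);
    void deleteContainer(int containerId);
    void write(int containerId, int position, byte[] block);
    byte[] read(int containerId, int position);
    int append(int containerId, byte[] block); // returns the position of the new block
    void delete(int containerId, int position);
}

/** In-memory stand-in with one implicit segment per container. */
final class InMemoryFileSystem implements BlockFileSystem {
    private final int blockSize;
    private final Map<Integer, List<byte[]>> containers = new HashMap<>();

    InMemoryFileSystem(int blockSize) {
        this.blockSize = blockSize;
    }

    public void createContainer(int containerId) {
        if (containers.putIfAbsent(containerId, new ArrayList<>()) != null) {
            throw new IllegalStateException("container exists: " + containerId);
        }
    }

    public void openContainer(int containerId) {
        blocks(containerId); // only checks that the container exists
    }

    public void deleteContainer(int containerId) {
        containers.remove(containerId);
    }

    public void write(int containerId, int position, byte[] block) {
        blocks(containerId).set(position, checked(block));
    }

    public byte[] read(int containerId, int position) {
        byte[] block = blocks(containerId).get(position);
        if (block == null) {
            throw new IllegalStateException("block was deleted: " + position);
        }
        return block.clone();
    }

    public int append(int containerId, byte[] block) {
        List<byte[]> blocks = blocks(containerId);
        blocks.add(checked(block));
        return blocks.size() - 1;
    }

    public void delete(int containerId, int position) {
        blocks(containerId).set(position, null); // a null entry marks a freed block
    }

    private List<byte[]> blocks(int containerId) {
        List<byte[]> blocks = containers.get(containerId);
        if (blocks == null) {
            throw new IllegalStateException("no such container: " + containerId);
        }
        return blocks;
    }

    /** Enforces the fixed block size and copies the caller's array. */
    private byte[] checked(byte[] block) {
        if (block.length != blockSize) {
            throw new IllegalArgumentException("block must have " + blockSize + " bytes");
        }
        return block.clone();
    }
}
```

A caller would create a container, `append` a block to obtain its position, and then use that position for later `read`, `write` and `delete` calls. A file-backed version would replace the list with segment files and a free memory bitmap.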
I will soon provide the first lines of source code for the file system implementation. The following topics should be taken into account:
- File System Concurrency
- Distributed File Systems
Distribution is somewhat related to concurrency, but here we will focus on data distribution. A file system contains multiple containers, and a container may be bound to a path. If you put container #I on a path which belongs to disk #1 and container #II on a path which belongs to disk #2, you have already achieved a simple distribution of data. So the idea is to use a container as a partition. For now we are not interested in why data is stored inside a particular container; this will be covered by the layers above.

Another distribution approach could be a more service-based one. Today everything is available in the cloud ;-) , so it would also be possible (but not with the same performance) to build a web service on top of the file system. There could then be a file system load balancer and a registry. When a file system starts up, it registers with the registry and is from then on taken into account by the load balancer. Load balancer rules determine which block should be written to which file system. In distributed mode, file system #1 holds only containers #1 and #2, whereas file system #2 holds containers #3 and #4, and so on, so each file system has two partitions. In failover mode, each write request is forwarded to every registered file system. The read requests could be scheduled by using real load information or, as a first step, just Round Robin.
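The registry and load balancer idea could look roughly like this. It is a sketch under stated assumptions: the class `FileSystemRegistry` and its methods are hypothetical, file systems are represented by plain names, containers are numbered from 1, and the distributed-mode rule simply assigns a fixed number of consecutive containers to each file system in registration order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical registry plus load balancer; file systems are identified by name only. */
final class FileSystemRegistry {
    private final List<String> fileSystems = new ArrayList<>();
    private final AtomicInteger nextRead = new AtomicInteger();

    /** Called by a file system when it starts up. */
    void register(String fileSystem) {
        fileSystems.add(fileSystem);
    }

    /**
     * Distributed mode: each file system owns containersPerFileSystem
     * consecutive containers, with container ids starting at 1.
     */
    String ownerOf(int containerId, int containersPerFileSystem) {
        if (containerId < 1) {
            throw new IllegalArgumentException("container ids start at 1");
        }
        int index = (containerId - 1) / containersPerFileSystem;
        if (index >= fileSystems.size()) {
            throw new IllegalArgumentException("no file system for container " + containerId);
        }
        return fileSystems.get(index);
    }

    /** Failover mode: every registered file system receives each write. */
    List<String> writeTargets() {
        return new ArrayList<>(fileSystems);
    }

    /** Failover mode: reads rotate over the registered file systems (Round Robin). */
    String nextReadTarget() {
        if (fileSystems.isEmpty()) {
            throw new IllegalStateException("no file system registered");
        }
        int index = Math.floorMod(nextRead.getAndIncrement(), fileSystems.size());
        return fileSystems.get(index);
    }
}
```

With two registered file systems and two containers each, containers #1 and #2 map to the first file system and containers #3 and #4 to the second. Replacing `nextReadTarget` with a rule based on real load information would not change the rest of the sketch.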
I will soon publish some first (not yet tested) source code. Once this code is tested and evaluated, the next layer will be the page buffer. So I am really looking forward to writing the next blog post about page buffering and scheduling. Another important topic will be indexing and storage structures.