What is the main ingredient behind the Exadata Database Machine’s stellar performance? It is a suite of software known as Exadata Storage Server, which runs on the storage cells and is the primary reason for Exadata’s superior performance. In this article, we will address each of the storage server’s components.
Storage in the Exadata Database Machine is not just “dumb” storage, for lack of a better term. The storage cells are smart enough to handle a portion of the workload inside them, thus saving database nodes critical resources. This process is called cell offloading. We will talk about this feature in the following section.
In a traditional Oracle database, when a user selects a row or even a single column in a row, the whole block containing that row is retrieved from the disk into the buffer cache, and the selected row is then extracted from the block and returned to the user’s session. In the Exadata Database Machine, this process holds true for most types of access, with a few very important exceptions. Direct path accesses – including full table scans and fast full index scans – are handled differently and far more efficiently. With direct path access, the storage cells can pick out the specific rows on disk and send only those to the database nodes. This is known as Smart Scan and results in huge savings.
For instance, your search may match only 1,000 rows out of 1 billion, but a full table scan in a traditional database fetches all the blocks and then filters out the unnecessary rows. Smart Scan, by contrast, extracts only those 1,000 rows on the cells, potentially cutting I/O drastically and greatly improving Exadata Database Machine performance relative to competitors. Cell offloading plays a huge role in allowing this to happen.
Not all queries can take advantage of Smart Scan; only those that use direct path reads can. A full table scan is one example. An ordinary index scan, such as a range scan, looks into the index blocks first and then the table blocks, so Smart Scan is not used.
How can storage cells tell which columns and rows to filter from the data? This is handled by another component built into the Exadata storage software. Communication between nodes and cells uses a specially developed protocol called iDB, short for Intelligent Database. This protocol can not only request blocks but can also send other pertinent information. In cases where Smart Scan is possible, iDB sends the table names, columns, predicates, and other information about the query. This gives the cell knowledge of the query itself, not just the addresses of the blocks to access. Using iDB, the cells can also send back row and column data instead of the less precise Oracle blocks.
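To make the difference between shipping blocks and shipping rows concrete, here is a minimal Python sketch. It is not the real iDB protocol: the `ScanRequest` class, the `block_read` and `smart_scan` functions, and the toy `reviews` data are all hypothetical names invented for illustration. The only idea taken from the text is that the request carries the table, columns, and predicate, so the cell can filter and project before sending anything back.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]
Block = List[Row]


@dataclass
class ScanRequest:
    """Hypothetical stand-in for the query metadata sent along with a scan."""
    table: str
    columns: Sequence[str]            # columns the query actually needs
    predicate: Callable[[Row], bool]  # filter taken from the WHERE clause


def block_read(blocks: List[Block]) -> List[Row]:
    """Traditional path: every row of every block travels to the database node."""
    return [row for block in blocks for row in block]


def smart_scan(blocks: List[Block], request: ScanRequest) -> List[Row]:
    """Offloaded path: the cell filters rows and keeps only the requested columns."""
    return [
        {col: row[col] for col in request.columns}
        for block in blocks
        for row in block
        if request.predicate(row)
    ]


# Toy data: 4 blocks of 5 rows each, with rating = id modulo 10.
blocks = [
    [{"id": b * 5 + i, "rating": (b * 5 + i) % 10, "comment": "x" * 20}
     for i in range(5)]
    for b in range(4)
]
request = ScanRequest(
    table="reviews",
    columns=["id"],
    predicate=lambda row: row["rating"] == 3,
)

shipped_traditional = block_read(blocks)
shipped_offloaded = smart_scan(blocks, request)
print(len(shipped_traditional), "rows shipped without offloading")
print(len(shipped_offloaded), "rows shipped with offloading")
```

In this toy run the traditional path ships all 20 rows with every column, while the offloaded path ships just the two matching `id` values.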
How does Exadata’s Smart Scan send only the relevant rows and columns instead of blocks? A data structure built from the pattern of the data within the storage cells makes this possible. For a given segment, it records, for each region of the disk, the minimum and maximum values of the segment’s columns and whether nulls are present. This data structure is referred to as a storage index. When a cell receives a Smart Scan-enabled query from the database node via iDB, it checks which regions of the storage cannot contain the data. For example, if the search predicate states where rating = 3, a region on the disk where the minimum and maximum values of the column RATING are 4 and 10 respectively cannot contain any row that matches the predicate. Therefore, the cell skips reading that portion of the disk. By checking the storage index, the cell can exclude many regions that will not contain that value.
Despite having the word “index” in its name, a storage index is not like a normal index. Normal indexes are used to zero in on the locations where the rows are going to be found; storage indexes do the opposite and identify where the rows are not going to be found. Also, unlike other segments, they reside in memory and not on disk.
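The region-skipping idea can be sketched in a few lines of Python. This is an illustration under assumptions, not Oracle's implementation: `RegionSummary`, `summarize`, `may_contain`, and `scan_equal` are hypothetical names, each "region" is just a short list of RATING values, and the summaries are rebuilt on every call for brevity, whereas a real cell would keep them in memory.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RegionSummary:
    """Per-region facts of the kind a storage index keeps for one column."""
    min_value: Optional[int]
    max_value: Optional[int]
    has_nulls: bool


def summarize(values: List[Optional[int]]) -> RegionSummary:
    """Build the min/max/null summary for one region."""
    non_null = [v for v in values if v is not None]
    return RegionSummary(
        min_value=min(non_null) if non_null else None,
        max_value=max(non_null) if non_null else None,
        has_nulls=len(non_null) < len(values),
    )


def may_contain(summary: RegionSummary, target: int) -> bool:
    """False means the region provably holds no row equal to target."""
    if summary.min_value is None or summary.max_value is None:
        return False  # region holds only nulls; equality never matches null
    return summary.min_value <= target <= summary.max_value


def scan_equal(regions: List[List[Optional[int]]], target: int) -> Tuple[List[int], int]:
    """Return matching values and the number of regions actually read."""
    summaries = [summarize(region) for region in regions]
    hits: List[int] = []
    regions_read = 0
    for region, summary in zip(regions, summaries):
        if not may_contain(summary, target):
            continue  # skipped without touching the region's data
        regions_read += 1
        hits.extend(v for v in region if v == target)
    return hits, regions_read


# Four regions of RATING values; the first spans 4..10, as in the example above.
regions = [[4, 7, 10, 5], [1, 3, 3, 2], [6, None, 9, 8], [3, 4, 5, 6]]
hits, regions_read = scan_equal(regions, 3)
print(f"matches: {hits}, regions read: {regions_read} of {len(regions)}")
```

Note that the summary can only rule regions out. A region whose range includes 3 still has to be read, and it may turn out to contain no matching row.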
The database buffer cache is where data blocks land before being passed to the requesting session. If the data is found there, accessing the storage is not needed. However, if it is not found, which is likely in large databases, an I/O to storage is required. In Exadata Database Machines, a secondary cache sits between the database buffer cache and the disks, called Smart Flash Cache (referred to here as the smart cache). The smart cache holds frequently accessed data and may satisfy a request from the database node without going to the disks, improving performance in much the same way that temporary internet files on your computer make websites you visit often load faster.
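The lookup order described above (buffer cache first, then the secondary cache, then disk) can be sketched as follows. This is a toy model under stated assumptions: `TieredReader` and its methods are hypothetical, the secondary cache uses a simple least-recently-used eviction policy chosen purely for illustration, and the `age_out` method stands in for whatever causes a block to leave the buffer cache.

```python
from collections import OrderedDict
from typing import Dict, Tuple


class TieredReader:
    """Toy model: buffer cache, then secondary cache, then disk."""

    def __init__(self, disk: Dict[int, str], secondary_capacity: int = 2) -> None:
        self.disk = disk
        self.buffer_cache: Dict[int, str] = {}
        self.secondary: "OrderedDict[int, str]" = OrderedDict()
        self.secondary_capacity = secondary_capacity
        self.disk_reads = 0

    def read(self, block_id: int) -> Tuple[str, str]:
        """Return (data, tier) where tier names the layer that served the read."""
        if block_id in self.buffer_cache:
            return self.buffer_cache[block_id], "buffer cache"
        if block_id in self.secondary:
            self.secondary.move_to_end(block_id)  # mark as recently used
            data, tier = self.secondary[block_id], "secondary cache"
        else:
            data, tier = self.disk[block_id], "disk"
            self.disk_reads += 1
            self.secondary[block_id] = data
            if len(self.secondary) > self.secondary_capacity:
                self.secondary.popitem(last=False)  # evict least recently used
        self.buffer_cache[block_id] = data
        return data, tier

    def age_out(self, block_id: int) -> None:
        """Simulate the block leaving the database buffer cache."""
        self.buffer_cache.pop(block_id, None)


reader = TieredReader(disk={1: "block-1", 2: "block-2", 3: "block-3"})
first = reader.read(1)    # cold read, served from disk
second = reader.read(1)   # served from the buffer cache
reader.age_out(1)
third = reader.read(1)    # buffer cache miss, served from the secondary cache
print(first[1], "->", second[1], "->", third[1])
print("disk reads:", reader.disk_reads)
```

Only the first of the three reads touches the disk, which is the saving the secondary cache is meant to provide.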
InfiniBand is the network inside the Exadata Database Machine: the nervous system through which components such as database nodes and storage cells communicate. InfiniBand is a hardware medium running a protocol called RDS, or Reliable Datagram Sockets, which offers high bandwidth and low latency, making the transfer of data very fast.
The Exadata disk layout requires some added explanation because that is where most of the action is. As I mentioned before, the disks are attached to the storage cells and presented as logical units on which physical volumes are built.
Each cell has 12 physical disks. In a high capacity configuration they are 2TB each; in a high performance configuration they are 600GB each. The disks are used for the database storage. Two of the 12 disks are also used for the home directory and other Linux operating system files. These two disks are divided into different partitions.
The physical disks are separated into several partitions. Each partition is then presented as a Logical Unit (LUN) to the cell. Some LUNs are used to create a file system for the Exadata operating system. The others are presented as storage to the cell. These are called cell disks. The cell disks are in turn divided into grid disks, so named because the grid infrastructure uses them. These grid disks are used to build ASM diskgroups, so they serve as ASM disks. An ASM diskgroup is made up of many ASM disks from several storage cells. If the diskgroup is created with normal or high redundancy, the failure groups are placed in different cells. As a result, the data is still available on other cells if one cell fails.
Two of the 12 disks hold the operating system (Oracle Exadata Storage Server software), as well as other operating system related file systems such as /home. For protection, this area is mirrored as RAID1 onto the other disk. The file systems are mounted on that RAID1 volume.
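The effect of placing failure groups in different cells can be illustrated with a small Python sketch. It is a simplification under assumptions: the cell names, `place_extent`, and `survives` are hypothetical, and the round-robin placement shown here is only a stand-in for ASM's actual extent distribution, which is not described in this article.

```python
from typing import Dict, List


def place_extent(extent_no: int, failure_groups: List[str], copies: int = 2) -> List[str]:
    """Pick `copies` distinct failure groups (cells) for one extent, round-robin."""
    if copies > len(failure_groups):
        raise ValueError("not enough failure groups for the requested redundancy")
    start = extent_no % len(failure_groups)
    return [failure_groups[(start + k) % len(failure_groups)] for k in range(copies)]


def survives(placement: Dict[int, List[str]], failed_cell: str) -> bool:
    """True if every extent still has at least one copy outside the failed cell."""
    return all(
        any(cell != failed_cell for cell in cells)
        for cells in placement.values()
    )


# One failure group per storage cell (names are illustrative).
cells = ["cell01", "cell02", "cell03"]

# Normal redundancy: two copies of each extent, always in different cells.
mirrored = {extent: place_extent(extent, cells, copies=2) for extent in range(6)}

# No redundancy, for comparison: a single copy of each extent.
unprotected = {extent: place_extent(extent, cells, copies=1) for extent in range(6)}

for cell in cells:
    print(cell, "fails -> mirrored data available:", survives(mirrored, cell),
          "| unprotected data available:", survives(unprotected, cell))
```

Because the two copies of an extent never share a cell, losing any single cell leaves one copy of every extent reachable.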
As you can see, this leaves two of the 12 cell disks with less space for data than the others. If we create an ASM diskgroup across all 12 disks, those two smaller disks will cause an imbalance. To combat this, your installer will create another diskgroup using 29GB from each of the other 10 cell disks. This leaves same-sized ASM disks for the other diskgroups. This “compensatory” diskgroup is named DBFS_DG. Because this diskgroup is built on the inner tracks of the disks, its performance will be lower than that of the outer tracks. Therefore, you may want to use it for some other purpose, such as ETL files, instead of creating database files there.
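The balancing arithmetic can be worked through with a short Python sketch. The figures are assumptions for illustration: 600GB high performance disks as mentioned earlier, a reserved system area of about 29GB on each of the two system disks, and no allowance for any other overhead. The function `plan_cell` and its field names are hypothetical.

```python
from typing import Dict, List


def plan_cell(disk_gb: int, n_disks: int = 12, n_system_disks: int = 2,
              system_area_gb: int = 29) -> List[Dict[str, int]]:
    """Split each disk so that the space left for data is the same on all disks.

    System disks give up `system_area_gb` to the operating system; every other
    disk gives up the same amount to the compensatory diskgroup instead.
    """
    layout = []
    for disk in range(n_disks):
        is_system_disk = disk < n_system_disks
        layout.append({
            "disk": disk,
            "system_gb": system_area_gb if is_system_disk else 0,
            "dbfs_dg_gb": 0 if is_system_disk else system_area_gb,
            "data_gb": disk_gb - system_area_gb,
        })
    return layout


layout = plan_cell(disk_gb=600)
data_sizes = {entry["data_gb"] for entry in layout}
dbfs_dg_total = sum(entry["dbfs_dg_gb"] for entry in layout)

print("space left for data on each disk (GB):", sorted(data_sizes))
print("compensatory diskgroup space per cell (GB):", dbfs_dg_total)
```

Under these assumptions every disk is left with 571GB for the other diskgroups, and the compensatory diskgroup gets 290GB of raw space per cell.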
In these three article installments, you have learned what Oracle Exadata is, what hardware and software components make up the Exadata Database Machine, what enables Exadata’s superior performance, and what you should be managing.
Please contact Pebble IT for the latest on Oracle Exadata support, Exadata pricing, and local Exadata installers.