•  

    SQL Server Physical and logical storage

    Do you know how SQL server physically stores data internally? As a software engineer this knowledge is very important, especially when you want to troubleshoot and fix sql queries that are not performing very well from a performance standpoint. 

    There are few technical jargon terms which are very important to understand, especially when you’re doing something related to SQL server performance tuning and those terms are:

    1. Data Pages

    2. Root Node

    3. Leaf Nodes

    4. B-Tree

    5. Clustered Index Structure

    Now, we all know that data in tables is stored in row and column format at the logical level but physically it stores data as data pages. A data page is the fundamental unit of data storage in an sql server and it is eight kilobytes in size. When we insert any data into the sql server database table, it saves that data to a series of 8 kilobytes data pages. So table data in sql server is actually stored in a tree-like structure. 

     

    Let’s understand this with a simple example. Consider this Employees table.

    Employees

    EmployeeId

    Name

    Enail

    Department

    1

    Mark

    mark@testemail.com

    IT

    2

    John

    john@testemail.com

    HR

    3

    Sara

    sara@testemail.com

    IT

    …..

    ……….

    ….

    …..

    ……….

    ….

    1200

    Steve

    steve@testemail.com

    IT

    In this table, EmployeeId is the primary key column. So by default, a clustered index on this EmployeeId column is created. This means, the data physically stored in the database is sorted by EmployeeId column. 

    Now where the data is actually stored? The answer is, It is stored in a series of data pages in a tree-like structure that looks like the following. 

    This tree-like structure is called B-tree, Index B-tree or clustered index structure. The nodes that you see at the bottom of the tree are called data pages or leaf nodes of the tree and it is these leaf nodes that contain our table data. The size of each data page is 8 kilobytes, this means the number of rows stored in each data page really depends on the size of each row. 

    In our example, let’s say in this Employees table, we have 1200 rows and let’s assume in each data page we have 200 rows but in reality depending on the actual row size, we may have more or less rows, but for this example sake, let’s assume each data page has 200 rows. 

    Remember, the important point to keep in mind is, the rows in these data pages are sorted by EmployeeId column, because EmployeeId is the primary key of our table and hence the clustered key. So in the first data page, we have 1 to 200 rows, in the second 201 to 400, in the third 401 to 600 and so on. The node at the top of the tree is called root node. The nodes between the root node and the leaf nodes are called intermediate levels. There can be multiple intermediate levels. 

    In our example, we have just 1200 rows and to keep this example simple, I included just one intermediate level. 

    But in reality, the number of intermediate levels depends on the number of rows you have in the underlying database table. 

    The root and the intermediate level nodes contain index rows and the leaf nodes contain the actual data rows. Each index row contains a key value.In our case, EmployeeId and a pointer to either an intermediate level page in the B-tree or a data row in the leaf node.

    So the important point is, this tree-like structure has a series of pointers that helps the database engine find the data quickly. 

     

    Let’s say we want to find employee row with EmployeeId = 1120. So the query is very simple. 

    Select * from Employees where EmployeeId = 1120

    So the database engine starts at the root node and from there it picks the index node on the right because the database engine knows it is this node that contains EmployeeIds from 801 to 1200. From there, it picks the leaf node that is present on the extreme right because employee data rows from 1001 to 1200 are present in this leaf node. The data rows in the leaf node are sorted by EmployeeId, so it’s easy for the database engine to find the employee row with Id = 1120. 

     

    Notice, in just three operations sql server is able to find the data we are looking for. So the point is, if there are thousands or millions of records, sql server can easily and quickly find the data we are looking for provided there is an index that can help the query find data. 

    In this specific example, there is a clustered index on the EmployeeId column. So when we search by EmployeeId, sql server can easily and quickly find the data we are looking for. 

    But what if we search by Employee Name? At the moment, there is no index on the Name column. So there is no easy way for sql server to find the data we are looking for. SQL server has to read every record in the table which is extremely inefficient from a performance standpoint. This is when we create a non clustered index on the Name column. 

    Let’s understand how a non-clustered index is stored in the database. 

    In a non-clustered index, we do not have table data. We have key values row locators. 

    We created a non clustered index on the Name column so the key value, in this case employee names are sorted and stored in alphabetical order. The row locators that are present at the bottom of the tree contain employee names and a cluster key of the row. In our example, EmployeeId is the cluster key. 

    Now if we look at one of the row locators, notice the names of the employees are sorted in alphabetical order and we also have their respective employeeIds. So when we search an employee by name, both these indexes, non-clustered index on the Name column and clustered index on the EmployeeId column are going to work together to find the employee we are looking for. 

    Let’s look at the steps involved. 

    First, sql server uses the non-clustered index on the Name column to quickly find this employee entry in the index. In a non-clustered index along with the employee name, we also have the cluster key. The database engine knows there is a clustered index on the EmployeeId so this clustered index is then used to find the respective employee record. 

     

    Reference URL: How is data stored in sql database

0 Years in
Operation
0 Loyal
Clients
0 Successful
Projects

Words from our clients

 

Tell Us About Your Project

We’ve done lot’s of work, Let’s Check some from here