🚀 Exploring SQL Server Query Optimization: A Tale of EmpFirst_Name in India 🚀

Are you ready to dive into the fascinating world of SQL Server query optimization? Let's talk about the RID Lookup Operator, a key player in the SQL Server execution plan! 

The RID Lookup Operator comes into play when we query a table that doesn't have a clustered index, unlike the commonly used primary key. Instead, SQL Server uses the Row ID (RID) to uniquely identify rows in such tables.

Here's the deal: RID Lookup can be a performance challenge. Each RID Lookup operation translates into a separate I/O operation, which can slow down your query execution. So, it's essential to keep an eye on it, especially in larger datasets.

Let us first create the below table and fill it with 3K records to use it in the below examples . The table can be created and filled with data using the T-SQL script below:


CREATE TABLE EmployeeData (

ID INT IDENTITY(1, 1)

,EmployeeFirstName VARCHAR(50)

,EmployeeLastName VARCHAR(50)

,EmployeeAddress VARCHAR(MAX)

,EmployeePhoneNumber VARCHAR(50)

)

GO

INSERT INTO EmployeeData

VALUES (

'John'

,'Doe'

,'123 Main St'

,'555-123-4567'

) GO 1000

INSERT INTO EmployeeData

VALUES (

'Jane'

,'Smith'

,'456 Elm St'

,'555-987-6543'

) GO 1000

INSERT INTO EmployeeData

VALUES (

'Robert'

,'Johnson'

,'789 Oak St'

,'555-789-1234'

) GO 1000

Assume that we have a Non-Clustered index on the EmpFirst_Name column of 

the EmployeeData table, that is created using the CREATE INDEX T-SQL statement below:

CREATE INDEX IX_EmployeeData_P2_Employee_First_Name ON EmployeeData(EmployeeFirstName);


If you try to run the below SELECT statement to retrieve information about all employees with a specific EmpFirst_Name values, after including the Actual SQL Server execution plan of that query:

             SELECT * FROM EmployeeData WHERE EmployeeFirstName = 'Tony';


Checking the SQL Server explain plan generated after executing the query, you will see that the SQL Server Query Optimizer will use the Non-Clustered index to seek for all the employees with the EmployeeFirstName values equal to 'Tony', without the need to scan the overall table for these values. On the other hand, the SQL Server Engine will not be able to retrieve all the requested values from that Non-Clustered index, as the query requests all columns for that filtered records. Recall that the Non-Clustered index contains only the key column values and a pointer to the rest of the columns for that key in the base table.

Due to the fact that this table contains no Clustered index, the table will be considered a heap table that has no criteria to sort its pages and sort the data within the pages.Because of this, the SQL Server Engine will use the pointers from the Non-Clustered index, that points to the location of the rest of the columns on the base table, to locate and retrieve the rest of columns from the underlying table, using a Nested Loops operator to join the Index Seek data with the set of data retrieved from the RID Lookup operator, also known as a Row Identifier operator, as shown in the SQL Server execution plan below:



A RID is a row locator that includes information about the location of that record such as the database file, the page, the slot numbers that helps to identify the location of the row quickly. If you move the mouse to point to the RID Lookup on the generated SQL Server execution plan to view the tooltip of that operator, you will see in the Output List, the list of columns that are requested by the query and returned by this operator, as these columns are not located in the Non-Clustered index, as shown below:

Execution plan


If you look at the RID Lookup operator in the SQL Server execution plan, you will see that the cost of that operator is high related to the overall weight of the plan, which is 50% in our example. This is due to the additional I/O overhead of the two different operations that are performed instead of a single one, before combining it with a Nested Loops operation. This overhead can be neglected when processing small number of rows. But if you are working with huge number of records, it is better to tune that query, rewrite the query by limiting the retrieved columns or create a covering index for that query. If the RID Lookup is eliminated by creating a covering index, the Nested Loops operator would not be needed in this SQL Server execution plan.

How can you optimize your queries and minimize RID Lookups?

1️⃣ Clustered Index: Consider adding a clustered index to the table. It can significantly reduce RID Lookups by organizing data efficiently.

2️⃣ Covering Index: Create non-clustered indexes that cover the columns you frequently query. This can help SQL Server avoid RID Lookups by retrieving all necessary data directly from the index.

3️⃣ Query Tuning: Review your queries. Make sure you're only selecting the columns you need and applying appropriate filtering conditions.
















Comments