Querying a database with a non-covering index

When writing a simple query, we often check if there’s an index for all the fields we filter on. The query performs well in both local and production environments, and we don’t think twice about it. However, certain conditions can cause system failure. Let’s explore this further.

The Setup

For this demonstration, I'm using code similar to the code from the Scaling a notification service series.

We have a table created as follows:

CREATE TABLE [dbo].[Notifications](  
[Id] [bigint] NOT NULL,  
[Payload] [nvarchar](1000) NOT NULL,  
[SendDate] [datetime2](7) NOT NULL,  
PRIMARY KEY CLUSTERED([Id] ASC))

CREATE NONCLUSTERED INDEX [IX_SendDate] ON [dbo].[Notifications]([SendDate] ASC)

We’re using Entity Framework Core to get records where SendDate is between two dates. Note the index IX_SendDate on this column.

Task<Notification[]> GetNotifications(DateTime startDate, DateTime endDate)
{  
    return dbContext.Notifications  
        .Where(x => x.SendDate > startDate && x.SendDate <= endDate)  
        .ToArrayAsync();  
}

The SQL query produced by EF Core is as follows:

exec sp_executesql N'SELECT [n].[Id], [n].[Payload], [n].[SendDate] FROM [Notifications] AS [n] 
WHERE [n].[SendDate] > @__startDate_0 AND [n].[SendDate] <= @__endDate_1',N'@__startDate_0 datetime2(7),@__endDate_1 datetime2(7)',@__startDate_0='2024-03-11 17:22:05',@__endDate_1='2024-03-12 17:22:05'

In my dataset, only a few records will match the query, regardless of the dataset size.

The Issue

The IX_SendDate index is created on the SendDate column and, because Id is the clustered primary key, implicitly includes the Id column. The EF Core query selects three columns: Id, Payload, and SendDate. Our index lacks the Payload column, so SQL Server must perform a Key Lookup and read that data from the clustered index on disk, which is slow.

Consider these scenarios:

Scenario 1: Local Environment with 1,000 Records

The database has 1,000 records. To execute the query, the database doesn’t use the IX_SendDate index. Instead, it finds it cheaper to:

  • Read each row from the disk
  • Filter the data in memory
  • Select rows that match the query predicate

Query execution plan when SQL Server does not use the IX_SendDate index

The problem is the first point: reading each row from the disk. In my case, to return around 20 records, the database had to read all 1,000 records. This is hardly optimal, although it performs acceptably for now.

Scenario 2: Production Environment with 10,000 Records

The database has 10,000 records. This time, the database uses the IX_SendDate index to get the Ids of the matching notifications, and then performs a Key Lookup to fetch each whole row. In this case, the database reads only the data it needs, offering the best performance.

Query execution plan when the database uses the IX_SendDate index

Ok, what exactly is the problem?

In both scenarios, EF Core generated the same query with the same arguments. The only difference is the number of records in the database.

The problem is that the query execution plan changes based on the dataset size: the database selects a plan based on its statistics, and we have no reliable way of knowing which plan it will pick, or when it will switch.

To make matters worse, while we’re developing, we usually have a small dataset, and the performance problem is unlikely to show itself.

Real-World Occurrence

A table in the system I am working on has hundreds of millions of records. During normal operations, SQL Server decided to start using the first execution plan (the scan) instead of the second (the index seek with a Key Lookup), effectively performing a table scan every time data was requested.

Why did SQL Server start using a different query execution plan? I’m not sure. There were no changes during that time. Scaling up the Azure SQL Server instance reset some internal cache, and the correct query execution plan was used once again.

To demonstrate the difference in query execution, the same query was run against different dataset sizes with each of the two plans: Query No 1 does not use the index (the scan), while Query No 2 uses the IX_SendDate index plus a Key Lookup:

No. of Records | Query No 1, CPU time (ms) | Query No 1, logical reads | Query No 2, CPU time (ms) | Query No 2, logical reads
1,000          | 0                         | 8                         | 0                         | 9
10,000         | 2                         | 44                        | 0                         | 9
100,000        | 56                        | 437                       | 0                         | 9
1,000,000      | 96                        | 4135                      | 0                         | 9
10,000,000     | 948                       | 41043                     | 0                         | 9

The execution time of the first query increases linearly with the number of records, while it stays the same for the second one. Note on query 2: depending on where the data is in the index, the number of logical reads might vary a bit, but it will be in the same ballpark.

In my case, these queries started taking a few seconds each – when you are handling thousands of requests per second, this will bring your system down.

The Solution

The root of the problem is that we need data that is not in the index. Such an index is called a non-covering index.

Preferred solution: Split the query into 2 queries

Query 1: Read only the primary key values where the criteria match. As noted above, the clustered primary key is always included in a nonclustered index, so we can read only the Id column while filtering on SendDate. The existing IX_SendDate is a covering index for this query and will perform consistently well.

Query 2: Read the records using the primary key values returned by query 1. This is a Key Lookup, and while it involves going to the disk, we are reading only the rows we actually want; there is no further filtering on those values.

Code sample:

async Task<Notification[]> GetNotifications(DateTime startDate, DateTime endDate)  
{  
    // IX_SendDate is a covering index for this query; we are getting only the Id, not the whole record.
    var notificationIds = await dbContext.Notifications  
        .Where(x => x.SendDate > startDate && x.SendDate <= endDate)  
        .Select(x => x.Id)  
        .ToArrayAsync();  

    // Get records by Id. WhereIn is a custom extension.
    return await dbContext.Notifications  
        .WhereIn(x => x.Id, notificationIds)  
        .ToArrayAsync();  
}
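
The WhereIn extension is not shown in the original code; below is a minimal sketch of what such an extension might look like (my assumption, not the article's implementation). It builds a values.Contains(...) predicate, which EF Core translates into a SQL IN clause, so for this particular query you could equally write .Where(x => notificationIds.Contains(x.Id)). For very large Id lists you would also want to chunk the values.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class QueryableExtensions
{
    // Hypothetical WhereIn: keeps only rows whose selected value is contained in 'values'.
    public static IQueryable<T> WhereIn<T, TValue>(
        this IQueryable<T> source,
        Expression<Func<T, TValue>> selector,
        IEnumerable<TValue> values)
    {
        // Build the expression: x => values.Contains(selector(x))
        var contains = Expression.Call(
            typeof(Enumerable),
            nameof(Enumerable.Contains),
            new[] { typeof(TValue) },
            Expression.Constant(values),
            selector.Body);

        var predicate = Expression.Lambda<Func<T, bool>>(contains, selector.Parameters);
        return source.Where(predicate);
    }
}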

For 10 million records, the two queries produced take 0 milliseconds of CPU time and perform only 24 logical reads in total. And most importantly, there is no risk of the execution plan changing in the middle of the night.

You might notice that the two queries we have now do exactly what SQL Server does in scenario 2 described above. That is correct. At the cost of a little extra I/O and a second round trip, we ensure a covering index is always used, and the risk of a table scan is eliminated.

Other solutions:

Solution 1: Make the index a Covering Index

If all required columns are either key columns of the index or included in it, the query can be served entirely from the index, and SQL Server does not need to read the table data from disk. This is useful when we need only bits of information. But when we need the whole row, as is the case when working with aggregate roots, it is not a good solution; the size of the index would grow too much.

And if we have multiple indexes and follow the same approach for each of them, the storage and write overhead multiplies.
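
As an illustration only (not from the original article), such a covering index could be declared in T-SQL with an INCLUDE clause (CREATE NONCLUSTERED INDEX ... INCLUDE ([Payload])), or from the EF Core model using the SQL Server provider's IncludeProperties extension, assuming a reasonably recent EF Core version:

// Inside the DbContext. Requires the Microsoft.EntityFrameworkCore.SqlServer provider.
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Notification>()
        .HasIndex(x => x.SendDate)
        .HasDatabaseName("IX_SendDate")
        // Payload is copied into the index leaf pages, making the index covering
        // for the original query, at the cost of (almost) duplicating the table.
        .IncludeProperties(x => x.Payload);
}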

Solution 2: Use raw SQL and FORCESEEK to query the database

It is possible to change the original query and add WITH (FORCESEEK) so the database will be forced to use the index + Key Lookup:

SELECT [n].[Id], [n].[Payload], [n].[SendDate]
FROM [Notifications] AS [n] WITH(FORCESEEK)
WHERE [n].[SendDate] > @__startDate_0 AND [n].[SendDate] <= @__endDate_1

This solution will perform better than the preferred one: there is no back and forth to the server with a list of Ids, and SQL Server returns only the needed rows.

Even if that is the case, I am not a fan of the approach. Writing queries by hand is tedious, prone to errors, and the queries have to be changed whenever the table structure changes. This is exactly the reason ORMs were invented in the first place.
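
If you do go this route, here is a rough sketch (my own, not from the article) of issuing the hinted query through EF Core with FromSqlInterpolated, so the dates still arrive as SQL parameters:

async Task<Notification[]> GetNotificationsWithForceSeek(DateTime startDate, DateTime endDate)
{
    // The interpolated values are turned into SQL parameters by EF Core.
    // The column list and the WITH (FORCESEEK) hint must be kept in sync
    // with the table structure by hand.
    return await dbContext.Notifications
        .FromSqlInterpolated($@"
            SELECT [n].[Id], [n].[Payload], [n].[SendDate]
            FROM [Notifications] AS [n] WITH (FORCESEEK)
            WHERE [n].[SendDate] > {startDate} AND [n].[SendDate] <= {endDate}")
        .ToArrayAsync();
}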


FAQ

What is a non-covering index?

  • A non-covering index is an index that does not include all the columns that are needed for a particular query. This means that the database has to perform an additional step, known as a key lookup, to retrieve the necessary data from the disk, which can be slow.

What is a covering index?

  • A covering index is an index that includes all the columns that are needed for a particular query. This means that the database can retrieve all the necessary data directly from the index without having to perform a key lookup, which can significantly improve query performance.

Why does the query execution plan change based on the dataset size?

  • The database’s query optimizer chooses the execution plan that it estimates will be the most efficient based on the current statistics about the data. These statistics include information about the distribution and density of the data, the number of rows in the table, and the number of pages that the data occupies on disk. As the data changes, these statistics change, and the query optimizer might choose a different execution plan.

Fast Data Seeding

-- Seed the Notifications table by repeatedly doubling its contents,
-- then trim the excess rows above the target count.
declare @numberOfRows int = 10000000;
declare @count int = (select COUNT(0) from Notifications)

-- Insert the first row if the table is empty.
if @count = 0
    begin
        insert into Notifications values (1, '{}', '2024-03-11 16:30:00')
    end

while @count < @numberOfRows
    begin
        declare @max int = (select MAX(Id) from Notifications)

        -- Copy every existing row with a new Id and a shifted SendDate,
        -- doubling the row count on each iteration.
        insert into Notifications
        select Id + @max, Payload, DATEADD(MINUTE, (Id + @max), SendDate) from Notifications

        set @count = (select COUNT(0) from Notifications)
    end

-- Remove the overshoot from the last doubling.
delete from Notifications
where Id > @numberOfRows