Entity Framework - inserting large number of rows
From time to time we have to insert a large number of rows into the database, usually because of data import. There are many ways to do it, but the simplest one is to take DbContext, add into the collection, and call SaveChanges. When we use Entity Framework 6 or less, there are a few important rules to follow:
Example code for this approach:
Batch method source:
In the next entry, I will check if these rules apply to Entity Framework Core too.
- Setting context.Configuration.AutoDetectChangesEnabled to false. By default Entity Framework uses change tracker to find all changes made to entities, to generate SQL scripts later. But all we do is make inserts so we know there are no changes, only new rows. There is no need to compare with existing items so change tracker can be disabled.
- Splitting inserts into batches. If you want to insert 10000 rows into the database, it is better to split them into small batches and call context.SaveChanges() after every n rows where n should be determined experimentally. 100 isn't bad value, but it depends on the size of data.
- Recreating DbContext after every batch. Even if SaveChanges is called after every batch, it is a good idea to create a new context after. If change tracking is enabled, items are still being stored in context so a collection of items grows. When we create a new context, the cost of inserting the entity into the collection is just smaller. Memory usage will also be smaller.
- Using DbSet.AddRange (documentation). New in EF 6. This is another approach to change detection. Instead of adding items one by one, we are adding the collection. Change detection is called only once.
- Inserting data in multiple threads. This one depends on the performance of our hard drive and won't scale exactly with the number of processors, but can give a large performance increase if done properly.
- Disabling context.Configuration.ValidateOnSaveEnabled. This is optional because it disables built-in validation. But if we are sure our entities are correct, it can be disabled to spare some processor cycles.
Example code for this approach:
Batch method source:
In the next entry, I will check if these rules apply to Entity Framework Core too.
Comments
Post a Comment