top of page

Bulk Load

Bulk Load Into a Clustered Columnstore

Different from bulk loading data into a rowstore table, you can load bulk amounts of data into a clustered columnstore index by using an INSERT...SELECT ... FROM <TableName> WITH (TABLOCK) statement.

TRUNCATE TABLE Fact.SaleLimited;

INSERT INTO [Fact].[SaleLimited] WITH (TABLOCK) ([City Key], [Customer Key],[Bill To Customer Key],

[Stock Item Key], [Invoice Date Key], [Delivery Date

Key],[Salesperson Key], [WWI Invoice ID], [Description],

[Package], [Quantity]) SELECT TOP (5000)

[City Key], [Customer Key], [Bill To Customer Key],

[Stock Item Key], [Invoice Date Key], [Delivery Date

Key], [Salesperson Key], [WWI Invoice ID],[Description],

[Package], [Quantity] FROM Fact.Sale;

Bulk insert is closely related to Bulk Update (BU)

The BU lock mode is used for bulk copy operations to allow multiple threads to bulk load data into the same table at the same time and to prevent other transactions that are not bulk loading data from accessing the table. SQL Server acquires it when the table lock on bulk load table option is set by using sp_tableoption or when you use a TABLOCK hint like this: INSERT INTO Examples.TestParent WITH (TABLOCK) SELECT <columns> FROM <table>;

 

If you are loading data only to stage it before running more transformations, loading the table to heap table will be much faster than loading the data to a clustered columnstore table.

In addition, loading data to a [temporary table][Temporary] will also load much faster than loading a table to permanent storage. A common pattern for data load is to load the data into a staging table, do some transformation and then load it into the target table using the following command:

INSERT INTO <columnstore index> SELECT <list of columns>

FROM <Staging Table>

This command loads the data into the columnstore index in similar ways to BCP or Bulk Insert but in a single batch.

If the number of rows in the staging table fewer than 102400, the rows are loaded into a delta rowgroup otherwise the rows are directly loaded into compressed rowgroup.

One key limitation was that this INSERT operation was single threaded. To load data in parallel, you could create multiple staging table or issue INSERT/SELECT with non-overlapping ranges of rows from the staging table. This limitation goes away with SQL Server 2016 (13.x). The command below loads the data from staging table in parallel but you will need to specify TABLOCK.

References: https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-data-loading-guidance?view=sql-server-2017#plan-bulk-load-sizes-to- minimize-delta-rowgroups

 

Example:

Answer: A


bottom of page