Emulating identity columns in Fabric
Identity columns are present in practically every SQL Server database. But if you are modernizing your on-premises OLAP system to Fabric Warehouse, identity columns are still not supported.
SQL Tran fixes this issue by analyzing your code and writing the equivalent. Of course, it is not as efficient as in SQL Server but we do what is possible using Fabric capabilities until Fabric releases identity columns support.
Let's look at an example:
CREATE TABLE [dbo].[ErrorLog] (
[ErrorLogID] [int] IDENTITY(1, 1) PRIMARY KEY NOT NULL,
[ErrorTime] [datetime] NOT NULL,
[UserName] [sysname] NOT NULL,
[ErrorNumber] [int] NOT NULL,
[ErrorSeverity] [int] NULL,
[ErrorState] [int] NULL,
[ErrorProcedure] [nvarchar](126) NULL,
[ErrorLine] [int] NULL,
[ErrorMessage] [nvarchar](4000) NOT NULL
);
BEGIN
INSERT [dbo].[ErrorLog] (
[UserName], [ErrorNumber], [ErrorSeverity],
[ErrorState], [ErrorProcedure],
[ErrorLine], [ErrorMessage]
)
VALUES
(
'username1', 1, 0, 1, 0, 1, 'message1'
),
(
'username2', 1, 1, 1, 1, 1, 'message2'
);
END
Let's see how does SQL Tran translate it to Fabric:
CREATE TABLE [dbo].[ErrorLog]
(
[ErrorLogID] [int] NOT NULL,
[ErrorTime] DATETIME2(6) NOT NULL,
[UserName] VARCHAR(128) NOT NULL,
[ErrorNumber] [int] NOT NULL,
[ErrorSeverity] [int] NULL,
[ErrorState] [int] NULL,
[ErrorProcedure] VARCHAR(252) NULL,
[ErrorLine] [int] NULL,
[ErrorMessage] VARCHAR(8000) NOT NULL
)
BEGIN
DECLARE @maxIdSqlTran_ErrorLog INT;
IF EXISTS (SELECT *
FROM
[dbo].[ErrorLog])
SET @maxIdSqlTran_ErrorLog = (SELECT MAX(ErrorLogID)
FROM
[dbo].[ErrorLog])
ELSE
SET @maxIdSqlTran_ErrorLog = 0;
INSERT
INTO [dbo].[ErrorLog] ([ErrorLogID], [UserName], [ErrorNumber], [ErrorSeverity], [ErrorState], [ErrorProcedure], [ErrorLine], [ErrorMessage])
VALUES
(@maxIdSqlTran_ErrorLog + 1, 'username1', 1, 0, 1, 0, 1, 'message1'),
(@maxIdSqlTran_ErrorLog + 2, 'username2', 1, 1, 1, 1, 1, 'message2');
END
We have adjusted the data types automatically, flattening out UDTs and multiplying the nvarchar sizes by two. But the real meat is the emulation and the code transformation. We first fetch the maximum value of the identity column and place it in a variable. Then each row in the VALUES section has the monotonically increasing expression inserted at the index of the identity column.
But what about the INSERT/SELECT? Not a problem at all. Here is a short simple example (yes, we can handle basically the most complex queries as well).
Source:
BEGIN
INSERT INTO [Person].[Address]
(
[AddressLine1],
[AddressLine2],
[City],
[PostalCode],
[SpatialLocation],
[rowguid],
[ModifiedDate]
)
SELECT AddressLine1,
AddressLine2, City,
PostalCode,
SpatialLocation,
rowguid,
ModifiedDate
FROM [Person].Address
END
And the transformed code:
BEGIN
DECLARE @maxIdSqlTran_Address INT;
IF EXISTS (SELECT *
FROM
[Person].[Address])
SET @maxIdSqlTran_Address = (SELECT MAX(AddressID)
FROM
[Person].[Address])
ELSE
SET @maxIdSqlTran_Address = 0;
INSERT
INTO [Person].[Address] ([AddressID], [AddressLine1], [AddressLine2], [City], [PostalCode], [SpatialLocation], [rowguid], [ModifiedDate])
SELECT @maxIdSqlTran_Address + ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS AddressID,
AddressLine1,
AddressLine2,
City,
PostalCode,
SpatialLocation,
rowguid,
ModifiedDate
FROM
[Person].Address;
END
The approach is equivalent and exactly what one would write by hand. We could extract the max(id) functionality fetching into a custom function but we felt there is no need. This transformation won't be required forever as we fully expect the Fabric team will add the identity columns support fairly soon.