Emulating identity columns in Fabric

Emulating identity columns in Fabric

Identity columns are present in practically every SQL Server database. But if you are modernizing your on-premises OLAP system to Fabric Warehouse, identity columns are still not supported.

SQL Tran fixes this issue by analyzing your code and writing the equivalent. Of course, it is not as efficient as in SQL Server but we do what is possible using Fabric capabilities until Fabric releases identity columns support.

Let's look at an example:

CREATE TABLE [dbo].[ErrorLog] (
  [ErrorLogID] [int] IDENTITY(1, 1) PRIMARY KEY NOT NULL, 
  [ErrorTime] [datetime] NOT NULL, 
  [UserName] [sysname] NOT NULL, 
  [ErrorNumber] [int] NOT NULL, 
  [ErrorSeverity] [int] NULL, 
  [ErrorState] [int] NULL, 
  [ErrorProcedure] [nvarchar](126) NULL, 
  [ErrorLine] [int] NULL, 
  [ErrorMessage] [nvarchar](4000) NOT NULL
);

BEGIN
  INSERT [dbo].[ErrorLog] (
    [UserName], [ErrorNumber], [ErrorSeverity], 
    [ErrorState], [ErrorProcedure], 
    [ErrorLine], [ErrorMessage]
  ) 
  VALUES 
    (
      'username1', 1, 0, 1, 0, 1, 'message1'
    ), 
    (
      'username2', 1, 1, 1, 1, 1, 'message2'
    );
END

Let's see how does SQL Tran translate it to Fabric:

CREATE TABLE [dbo].[ErrorLog]
(
    [ErrorLogID] [int] NOT NULL,
    [ErrorTime] DATETIME2(6) NOT NULL,
    [UserName] VARCHAR(128) NOT NULL,
    [ErrorNumber] [int] NOT NULL,
    [ErrorSeverity] [int] NULL,
    [ErrorState] [int] NULL,
    [ErrorProcedure] VARCHAR(252) NULL,
    [ErrorLine] [int] NULL,
    [ErrorMessage] VARCHAR(8000) NOT NULL
)
BEGIN  
    DECLARE @maxIdSqlTran_ErrorLog INT;
    IF EXISTS (SELECT *
    FROM
    [dbo].[ErrorLog])
    SET @maxIdSqlTran_ErrorLog = (SELECT MAX(ErrorLogID)
    FROM
        [dbo].[ErrorLog])
    ELSE
      SET @maxIdSqlTran_ErrorLog = 0;
    INSERT
    INTO [dbo].[ErrorLog] ([ErrorLogID], [UserName], [ErrorNumber], [ErrorSeverity], [ErrorState], [ErrorProcedure], [ErrorLine], [ErrorMessage])
    VALUES
        (@maxIdSqlTran_ErrorLog + 1, 'username1', 1, 0, 1, 0, 1, 'message1'),
        (@maxIdSqlTran_ErrorLog + 2, 'username2', 1, 1, 1, 1, 1, 'message2');
END

We have adjusted the data types automatically, flattening out UDTs and multiplying the nvarchar sizes by two. But the real meat is the emulation and the code transformation. We first fetch the maximum value of the identity column and place it in a variable. Then each row in the VALUES section has the monotonically increasing expression inserted at the index of the identity column.

But what about the INSERT/SELECT? Not a problem at all. Here is a short simple example (yes, we can handle basically the most complex queries as well).

Source:

 BEGIN
     INSERT INTO [Person].[Address]
     (
         [AddressLine1],
         [AddressLine2],
         [City],
         [PostalCode],
         [SpatialLocation],
         [rowguid],
         [ModifiedDate]
     )
     SELECT AddressLine1,
         AddressLine2, City,
         PostalCode,
         SpatialLocation,
         rowguid,
         ModifiedDate
     FROM [Person].Address
 END 

And the transformed code:

BEGIN
   DECLARE @maxIdSqlTran_Address INT;
IF EXISTS (SELECT *
FROM
  [Person].[Address])
  SET @maxIdSqlTran_Address = (SELECT MAX(AddressID)
  FROM
      [Person].[Address])
ELSE
  SET @maxIdSqlTran_Address = 0;
INSERT
  INTO [Person].[Address] ([AddressID], [AddressLine1], [AddressLine2], [City], [PostalCode], [SpatialLocation], [rowguid], [ModifiedDate])
  SELECT @maxIdSqlTran_Address + ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS AddressID,
      AddressLine1,
      AddressLine2,
      City,
      PostalCode,
      SpatialLocation,
      rowguid,
      ModifiedDate
  FROM
      [Person].Address;
END

The approach is equivalent and exactly what one would write by hand. We could extract the max(id) functionality fetching into a custom function but we felt there is no need. This transformation won't be required forever as we fully expect the Fabric team will add the identity columns support fairly soon.