Sync tables in SQL databases using Row Version
For SQL Server database professionals a common task is synchronizing data between two tables. There are different ways of implementing a solution in SQL Server such as Change Data Capture or Change Data Tracking, replication, using triggers and so on. However, there are some situations when the above solutions are not applicable. For example source and destination tables are located in database systems from different vendors (for example Sybase and SQL Server).
Suppose, we have two tables in two different databases and the data in these tables needs to be fully synchronized. We can compare the data in these tables, update existing rows and add new rows. However, comparing all of the data in the tables is not an effective way and it would be better if we can detect changes in the source table and only apply these changes to the target. This problem can be solved by using the SQL Server rowversion data type.
In this article we'll look at the rowversion data type and see how it can be used in data synchronization.
The values for a rowversion data type are automatically generated binary numbers which take 8 bytes of storage and are unique within a database. A non-nullable rowversion column is semantically equivalent to a binary(8) column and a nullable rowversion column is semantically equivalent to a varbinary(8) column. Columns with a rowversion data type are generally used for detecting changes in the row. You can have only one rowversion column in a table. When data is inserted or updated in a table that has a rowversion column, the database counter called the database rowversion is incremented. This incremented value is set as the value for the rowversion column for updated or inserted rows.
SQL Server rowversion Overview
Let's create a sample database:
We can get the database's current rowversion value with this query:
The result is the following:
Here are a few things to note. A timestamp data type is a synonym for rowversion. When we create a table with a column using a rowversion data type using T-SQL and then look at that column in SQL Server Management Studio (SSMS), we can see that the data type of column is timestamp:
If we try to add a column with a rowversion data type in SSMS, we will notice that there is no rowversion in the data type list as shown below. The only choice is timestamp.
It is possible to create a table with a timestamp column without mentioning the column name and it will be automatically generated:
However, if we try this with rowversion we get an error:
Microsoft recommends using rowversion instead of timestamp in DDL statements, because the timestamp will be deprecated in future versions.
Example Using SQL Server rowversion
Let’s use the above created TestTableA to demonstrate the rowversion data type behavior. We will insert data into TestTableA table and we will see the values in the LastChange column:
As we can see, the values have automatically been generated and the last one is the same as the databases current rowversion:
Now when we update the row, we can see that the value in the "LastChange" column for that row has changed:
It is also important to note that after an update where the values have not been changed, the rowversion does still change:
Detecting changes and updating target table
Now it's is time to explore data synchronization using the rowversion data type using a simple example. Let’s assume that we have two tables - source and target - data on the target table needs to be periodically synchronized with data on the source table (it is assumed that the data in source table does not get deleted):
We can see that "LastChange" column in the source table is created as a rowversion data type and the corresponding column in the target table is created as a binary(8) data type. The following code inserts data into the source table:
Now we will synchronize data in the target and source tables. In order to not compare all of the data in the source and target tables, we will insert only changed data after the last synchronization from the source to the target table. To identify these changes, we will use the data in the "LastChange" column in the following way: first, we will find the maximum value in the "LastChange" column in the target table. After that we will choose only the rows from the source table where the value in the "LastChange" column is greater than the maximum value of the "LastChange" column in the target table and finally we will update the target table by using these rows from the source table. In this way we do not update rows on the target which have not changed in the source table after the last synchronization. We will use the MERGE command to implement this data synchronization process:
After running this command for the first time in our test environment, all of the data in the source table will be inserted into the target table, because the target table is empty:
Now let’s do some updates in the source table:
After that let’s run the merge command again:
We will see that all changes made in source table are applied to the target and now the SourceTable and TargetTable tables are fully synchronized:
To automate this data synchronization process you can create a synchronization script, stored procedure or SSIS package that runs on a predefined schedule.
All in all, the rowversion (timestamp) data type is very useful to detect data changes. They are synonymous data types, however as we can see in this tip, there are some differences between choosing one of them during implementation. Also, Microsoft recommends to use rowversion instead of timestamp, because it will be deprecated in future versions. Using the rowversion data type to track changes can be the best solution, especially when other solutions are not applicable to the given task. This method is often used in loading data warehouses.