Azure Synapse Link for Dataverse: Synapse and Fabric
This blog will look at the capabilities and configuration of the Power Apps Azure Synapse Link for Dataverse. It will:
- Review each of the destinations.
- Explore the resulting data structure and supporting architecture.
- Consider the options available for specifying the data structure.
- Look at any issues or considerations which should be addressed during implementation.
This is not a ‘How To’ document and will not describe the steps required to implement the solution unless that is pertinent to the explanation of a feature, issue or consideration being discussed. For how-to information, I’m including these links:
- Azure Data Lake Storage Gen2. See details here.
- Azure Synapse Workspace. See details here.
- Microsoft Fabric. See details here.
When I implemented it, the link to Microsoft Fabric was in preview and the documentation link above did not accurately describe the implementation steps I encountered. This may have changed, as the feature appears to have moved to GA.
Data Structure
The resulting data structure depends on how the link to Dataverse is configured.
CSV Data
Each Dataverse table is represented by a folder containing a CSV file for each partition, plus folders holding the files for each snapshot of the table. Partition file names are in the format YYYY.csv or YYYY-MM.csv. The files are headless CSV files (no header row).

Figure 1: CSV File Structure
A snapshot will only be created if the Append advanced option is selected. Selecting Append defaults the partitions to Year, and the partition and ingestion strategy cannot be changed once selected.
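The partition naming above is easy to reason about with a small helper. This is a hypothetical illustration, not part of the product; it maps a record's creation date to the partition file it would land in, assuming a Year or Month partition strategy:

```python
from datetime import datetime

def partition_file(created_on: datetime, strategy: str = "Year") -> str:
    """Return the CSV partition file a record would land in.

    Synapse Link partitions by Year (YYYY.csv) or Month (YYYY-MM.csv).
    """
    if strategy == "Year":
        return f"{created_on.year}.csv"
    if strategy == "Month":
        return f"{created_on.year}-{created_on.month:02d}.csv"
    raise ValueError(f"Unknown partition strategy: {strategy}")

print(partition_file(datetime(2003, 7, 14)))           # 2003.csv
print(partition_file(datetime(2003, 7, 14), "Month"))  # 2003-07.csv
```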
Delta Data
Each table is represented by a folder with the naming convention tablename_partitioned. Inside these table folders is a familiar partitioned delta table structure, with the _delta_log folder and a folder for each partition, e.g. PartitionId=2003.

Figure 2: Synapse Delta File Structure
The structure in Fabric differs from Synapse in a couple of ways.

Figure 3: Fabric Table File Structure

The first thing to note is that the change data feed is automatically enabled on the tables, so the _change_data folder is present. The second observation is that the partition files are stored in the default folder [msft_datastate=__HIVE_DEFAULT_PARTITION__].
The partition is always Year, and the ingestion strategy is
always Append for both Delta Destinations.
Database Deployment
Linking the data to a storage account will not create a lakehouse database.
Linking to Synapse or Fabric will create a database.
The metadata for the tables is also present.
| name | tableType | isTemporary |
| --- | --- | --- |
| 'email' | 'EXTERNAL' | False |
| 'account' | 'EXTERNAL' | False |
| 'contact' | 'EXTERNAL' | False |
| 'account_partitioned' | 'EXTERNAL' | False |
| 'email_partitioned' | 'EXTERNAL' | False |
| 'contact_partitioned' | 'EXTERNAL' | False |
| 'GlobalOptionsetMetadata' | 'EXTERNAL' | False |
| 'StateMetadata' | 'EXTERNAL' | False |
| 'OptionsetMetadata' | 'EXTERNAL' | False |
| 'StatusMetadata' | 'EXTERNAL' | False |
| 'TargetMetadata' | 'EXTERNAL' | False |
- GlobalOptionsetMetadata contains the data for any D365 global option set as it is applied against a given table.
- StateMetadata contains the state options for each table.
- OptionsetMetadata contains the options for fields within the table that are of the Choice data type.
- StatusMetadata contains the status options for each table by state/status.
- TargetMetadata contains the target table type for each table field of the Lookup data type.
Only those tables that have been selected for import will have values within the metadata tables.
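To show how the metadata tables are used, here is a minimal sketch of resolving a Choice field's label via OptionsetMetadata. It uses an in-memory SQLite database as a stand-in for the lake database, with toy data; the column names (EntityName, OptionSetName, Option, LocalizedLabel) follow the OptionsetMetadata shape but verify them against your own environment:

```python
import sqlite3

# Toy in-memory stand-in for the lake database. The OptionsetMetadata
# columns used here are assumptions for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE account (accountid TEXT, name TEXT, industrycode INTEGER);
CREATE TABLE OptionsetMetadata (
    EntityName TEXT, OptionSetName TEXT, Option INTEGER, LocalizedLabel TEXT);
INSERT INTO account VALUES ('A1', 'Contoso', 1);
INSERT INTO OptionsetMetadata VALUES ('account', 'industrycode', 1, 'Accounting');
""")

# Join the numeric choice value to its localized label.
rows = con.execute("""
SELECT a.name, m.LocalizedLabel AS industry
FROM account a
JOIN OptionsetMetadata m
  ON m.EntityName = 'account'
 AND m.OptionSetName = 'industrycode'
 AND m.Option = a.industrycode
""").fetchall()
print(rows)  # [('Contoso', 'Accounting')]
```

The same join shape works in Synapse serverless SQL or a Fabric notebook against the real tables.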
As the files are ingested as Lake Database tables, the headless state of the CSV files is accounted for, which makes onward querying of the data easy.
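If you read the headless CSV files directly rather than through the lake database, you have to supply the column list yourself. A minimal sketch; the column names here are invented for illustration (in practice they come from the export's metadata):

```python
import csv
import io

# Headless CSV content as exported (no header row). Column names must be
# supplied from the table's metadata; these names are illustrative only.
columns = ["Id", "SinkCreatedOn", "SinkModifiedOn", "name"]
raw = "A1,2024-01-02,2024-01-05,Contoso\r\nA2,2024-02-01,2024-02-01,Fabrikam\r\n"

reader = csv.DictReader(io.StringIO(raw), fieldnames=columns)
records = list(reader)
print(records[0]["name"])  # Contoso
```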
Each record has two timestamp values:
- SinkCreatedOn
- SinkModifiedOn

These go beyond the Dataverse fields that maintain the D365 record audit trail:
- createdOn
- modifiedOn
The former would be used for the Warehouse Effective Date and the latter for the Business Effective Date; having both provides a choice. One of D365's better qualities is its good audit control on the data, and extending it out to the lake is another huge benefit.
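One practical use of the two timestamp pairs is watermark-based incremental loading: filter on SinkModifiedOn (when the row reached the lake) to drive the warehouse load, while modifiedOn stays on each row as the business effective date. A minimal sketch with hypothetical rows:

```python
from datetime import datetime

# Hypothetical lake rows carrying both timestamp pairs.
rows = [
    {"id": "A1", "modifiedOn": datetime(2024, 1, 3), "SinkModifiedOn": datetime(2024, 1, 4)},
    {"id": "A2", "modifiedOn": datetime(2024, 2, 1), "SinkModifiedOn": datetime(2024, 2, 2)},
]

def changed_since(rows, watermark):
    """Rows that landed in the lake after the last warehouse load.

    SinkModifiedOn drives the incremental load; modifiedOn remains
    available as the business effective date.
    """
    return [r for r in rows if r["SinkModifiedOn"] > watermark]

print([r["id"] for r in changed_since(rows, datetime(2024, 1, 31))])  # ['A2']
```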
Observations
Microsoft’s Christmas Elves have been busy, so there are a few features which have emerged in the last month.
- The Fabric Link now appears to be out of preview, which offers a couple of additional features:
  - D365 Finance and Operations looks to be available.
  - Metadata tables were not ingested but now are.
- The Synapse Delta Table link previously required a Spark Pool with the Apache Spark 3.1 library. This has been changed to allow a Spark Pool with the Apache Spark 3.3 library, in line with the Synapse runtime for Apache Spark lifecycle policy.
  - The update can be implemented from the Power Apps Azure Synapse Link control panel.
  - This update allows Change Data Capture to be executed.
Delta Table Properties
Synapse Delta Table
Tables are External.

| key | value |
| --- | --- |
| filescontainallcolumns | TRUE |
| option.escape | " |
| option.ignoreLeadingWhiteSpace | TRUE |
| option.ignoreTrailingWhiteSpace | TRUE |
| option.multiLine | TRUE |
| option.quote | " |
| option.timestampFormat | M/dd/yyyy hh:mm:ss a |
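The option.timestampFormat value above is a Spark/Java date-time pattern. If you ever parse the underlying CSV values yourself outside Spark, the equivalent Python strptime pattern would be (a sketch; note the 12-hour clock with an AM/PM marker):

```python
from datetime import datetime

# Java/Spark pattern "M/dd/yyyy hh:mm:ss a" maps to strptime codes:
# %m month, %d day, %Y year, %I 12-hour clock, %p AM/PM marker.
ts = datetime.strptime("3/05/2024 02:15:30 PM", "%m/%d/%Y %I:%M:%S %p")
print(ts)  # 2024-03-05 14:15:30
```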
Fabric Delta Table
Tables are Managed.
TBLPROPERTIES

| key | value |
| --- | --- |
| delta.enableChangeDataFeed | TRUE |
| delta.minReaderVersion | 1 |
| delta.minWriterVersion | 4 |
Query Data
Figure 4: Query Data in Synapse Notebooks
Fabric queries the data more easily than Synapse as the lakehouse and notebook are more tightly coupled with each other.
Transitioning from the Data Export Service will probably take a bit of an adjustment, but it should be relatively simple to migrate. The big challenge in the Data Export Service was that the headless CSV files relied on a custom mechanism to strongly type the data and assign column values to the data. The inclusion of Lake Houses does the heavy lifting for you so extra transformations can be targeted at shaping the data for further analytical purposes.
Going into this with a fresh deployment, it's hard to see why you would not use the Azure Synapse Link for Dataverse. Whether you use Fabric or Azure Synapse Analytics depends on your choice of environment, but both offer similar capabilities.
One thing I noticed is that the database does not transfer as code in CI/CD: whichever mechanism is used to create the database does not include it in the same way that the Database builder tool would.