Azure Synapse Link for Dataverse: Synapse and Fabric

This blog will look at the capabilities and configuration of the Power Apps Azure Synapse Link for Dataverse. It will:

·        Review each of the destinations.

·        Explore the resulting data structure and supporting architecture.

·        Consider the options available for specifying the data structure.

·        Look at any issues or considerations which should be addressed during implementation.

This is not a ‘How To’ document and will not describe the steps required to implement the solution unless it is pertinent to the explanation of a feature, issue or consideration that is being discussed. For How-to information, I’m including these links.

·        Azure Data Lake Storage Gen2. See details here.

·        Azure Synapse Workspace. See details here.

·        Microsoft Fabric. See details here.

At the time of writing, the link to Microsoft Fabric was in preview, and the documentation above did not accurately describe the implementation steps I encountered. This may have changed, as the feature appears to have moved to GA.

Data Structure

The resulting data structure depends on how the link to Dataverse is configured.

CSV Data

Each Dataverse table is represented by a folder containing a file for each partition, plus snapshot data for the table: snapshots are created as files within folders holding the files for each snapshot. Partition file names are in the format YYYY.csv or YYYY-MM.csv. The files are headless CSV files (no header row).
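As a quick illustration, the YYYY.csv / YYYY-MM.csv naming convention can be parsed with a small helper (the helper itself is hypothetical, not part of the product):

```python
import re

# Partition files are named YYYY.csv (yearly) or YYYY-MM.csv (monthly).
# parse_partition_file is a hypothetical helper for illustration only.
PARTITION_NAME = re.compile(r"^(\d{4})(?:-(\d{2}))?\.csv$")

def parse_partition_file(name):
    """Return (year, month); month is None for yearly partition files."""
    m = PARTITION_NAME.match(name)
    if not m:
        raise ValueError("not a partition file: " + name)
    year = int(m.group(1))
    month = int(m.group(2)) if m.group(2) else None
    return year, month

print(parse_partition_file("2021.csv"))     # (2021, None)
print(parse_partition_file("2021-03.csv"))  # (2021, 3)
```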

Tree Structure of Data and Folders. Synapse Analytics Tables CSV Structure.
Figure 1 CSV File Structure


A snapshot will only be created if the Append advanced option is selected. Two further points to note:

·        Selecting Append will default the partitions to Year.

·        The partition and ingestion strategy cannot be changed once selected.

Delta Data

Each table is represented by a folder with the naming convention tablename_partitioned. Inside these table folders is a familiar partitioned delta table structure: the log folder and a folder for each partition, e.g. PartitionId=2003.
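To make the layout concrete, here is a minimal sketch that builds the folder paths described above (the function is made up for illustration; `_delta_log` is the standard Delta Lake log folder name):

```python
# Hypothetical sketch of the Synapse delta folder layout: a
# tablename_partitioned root containing the Delta log folder and one
# folder per yearly partition (e.g. PartitionId=2003).
def delta_table_layout(table, years):
    root = table + "_partitioned"
    paths = [root + "/_delta_log"]
    paths += ["%s/PartitionId=%d" % (root, y) for y in sorted(years)]
    return paths

for p in delta_table_layout("account", [2003, 2004]):
    print(p)
# account_partitioned/_delta_log
# account_partitioned/PartitionId=2003
# account_partitioned/PartitionId=2004
```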

Tree Structure of Data and Folders. Synapse Analytics Tables Delta Structure.
Figure 2 Synapse File Structure Delta


The structure in Fabric differs from Synapse in a couple of ways.

Tree Structure of Data and Folders. Fabric Tables Delta Structure.
Figure 3 Fabric Table File Structure



The first thing to state is that a change feed is automatically added to the tables, so the _change_data folder is present. The second observation is that the partition files are stored in the default folder [msft_datastate=__HIVE_DEFAULT_PARTITION__].

The partition is always Year, and the ingestion strategy is always Append for both Delta Destinations.

Database Deployment

Linking the data to a storage account alone will not create a lake database; linking to Synapse or Fabric will.

The metadata for the tables is also present:

name                       tableType   isTemporary
'email'                    'EXTERNAL'  False
'account'                  'EXTERNAL'  False
'contact'                  'EXTERNAL'  False
'account_partitioned'      'EXTERNAL'  False
'email_partitioned'        'EXTERNAL'  False
'contact_partitioned'      'EXTERNAL'  False
'GlobalOptionsetMetadata'  'EXTERNAL'  False
'StateMetadata'            'EXTERNAL'  False
'OptionsetMetadata'        'EXTERNAL'  False
'StatusMetadata'           'EXTERNAL'  False
'TargetMetadata'           'EXTERNAL'  False

 

  • GlobalOptionsetMetadata contains the values for any D365 global option set as it is applied against a given table.
  • StateMetadata contains the state options for each table.
  • OptionsetMetadata contains the options for fields within the table that are of the Choice data type.
  • StatusMetadata contains the status options for each table by state/status.
  • TargetMetadata contains the table type for each table field of the Lookup data type.
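As a sketch of how these metadata tables are used, the snippet below resolves a Choice column value to its label via OptionsetMetadata. The column names (EntityName, OptionSetName, Option, LocalizedLabel) follow the metadata tables, but the rows and the helper function here are made-up sample data for illustration:

```python
# Illustrative sample of OptionsetMetadata rows (not real exported data).
optionset_metadata = [
    {"EntityName": "account", "OptionSetName": "accountcategorycode",
     "Option": 1, "LocalizedLabel": "Preferred Customer"},
    {"EntityName": "account", "OptionSetName": "accountcategorycode",
     "Option": 2, "LocalizedLabel": "Standard"},
]

def resolve_label(entity, option_set, value):
    """Look up the label for a Choice column value, or None if unknown."""
    for row in optionset_metadata:
        if (row["EntityName"], row["OptionSetName"], row["Option"]) == \
                (entity, option_set, value):
            return row["LocalizedLabel"]
    return None

print(resolve_label("account", "accountcategorycode", 1))  # Preferred Customer
```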


Only those tables that have been selected for import will have values within the metadata tables.

As the files are ingested as lake database tables, the headless state of the CSV files is accounted for, which makes onward querying of the data easy.
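If you ever need to read the raw headless CSV files directly (rather than through the lake database), the column names have to be supplied yourself. A stdlib sketch, where both the column names and sample rows are illustrative:

```python
import csv
import io

# Headless CSV: no header row, so column names must come from the
# table's model metadata. Columns and rows below are made up.
columns = ["Id", "SinkCreatedOn", "SinkModifiedOn", "name"]
raw = (
    "guid-1,2023-01-05,2023-02-01,Contoso\n"
    "guid-2,2023-01-09,2023-01-09,Fabrikam\n"
)

rows = [dict(zip(columns, rec)) for rec in csv.reader(io.StringIO(raw))]
print(rows[0]["name"])            # Contoso
print(rows[1]["SinkModifiedOn"])  # 2023-01-09
```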

Each record has two timestamp values:

·        SinkCreatedOn

·        SinkModifiedOn

These are in addition to the standard Dataverse fields that maintain D365 record audit data:

  • createdOn
  • modifiedOn

The former pair would be used for the Warehouse Effective Date and the latter for the Business Effective Date; having both provides a choice. One of D365's better qualities is its good audit control on the data, and extending it out to the lake is another huge benefit.

Observations

Microsoft’s Christmas Elves have been busy, so there are a few features which have emerged in the last month.

·        The Fabric Link now appears to be out of preview, which brings a couple of additional features.

o   D365 Finance and Operations looks to be available.

o   Metadata tables were not ingested but now are.

·        The Synapse Delta Table link required a Spark Pool with the Apache Spark 3.1 library. This has been changed to allow a Spark Pool with the Apache Spark 3.3 library, in line with the Synapse runtime for Apache Spark lifecycle policy.

o   The update can be implemented from the Power Apps Azure Synapse Link Control Panel.

o   This update allows Change Data Capture to be executed.

Delta Table Properties

Synapse Delta Table

Tables are External.

TBLPROPERTIES

key                              value
filescontainallcolumns           TRUE
option.escape                    "
option.ignoreLeadingWhiteSpace   TRUE
option.ignoreTrailingWhiteSpace  TRUE
option.multiLine                 TRUE
option.quote                     "
option.timestampFormat           M/dd/yyyy hh:mm:ss a
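The option.timestampFormat value uses the Spark/Java date pattern. If you read the same files outside Spark, the pattern needs translating; in Python the rough equivalent of M/dd/yyyy hh:mm:ss a is %m/%d/%Y %I:%M:%S %p (the sample value below is made up):

```python
from datetime import datetime

# Spark pattern "M/dd/yyyy hh:mm:ss a" roughly maps to Python's
# "%m/%d/%Y %I:%M:%S %p" (12-hour clock with an AM/PM marker).
ts = datetime.strptime("1/05/2023 03:15:00 PM", "%m/%d/%Y %I:%M:%S %p")
print(ts.isoformat())  # 2023-01-05T15:15:00
```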

 

Fabric Delta Table

Tables are Managed.

TBLPROPERTIES

key                         value
delta.enableChangeDataFeed  TRUE
delta.minReaderVersion      1
delta.minWriterVersion      4

 

Query Data

The data and optionset data can be queried easily to provide useful denormalized datasets.
SQL Query and Results in a Jupyter Notebook in Synapse
Figure 4 Query Data in Synapse Notebooks
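The denormalizing join behind a query like the one in Figure 4 can be sketched with stdlib SQLite. The table contents here are made-up sample data, and in practice you would run the equivalent Spark SQL in the notebook:

```python
import sqlite3

# In-memory stand-ins for an exported table and OptionsetMetadata.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE account (accountid TEXT, accountcategorycode INT)")
cur.execute(
    'CREATE TABLE OptionsetMetadata ('
    'EntityName TEXT, OptionSetName TEXT, "Option" INT, LocalizedLabel TEXT)'
)
cur.execute("INSERT INTO account VALUES ('a-1', 1)")
cur.execute(
    "INSERT INTO OptionsetMetadata VALUES "
    "('account', 'accountcategorycode', 1, 'Preferred Customer')"
)

# Denormalize: attach the Choice option label to each account row.
result = cur.execute(
    'SELECT a.accountid, o.LocalizedLabel '
    'FROM account a '
    'LEFT JOIN OptionsetMetadata o '
    "ON o.EntityName = 'account' "
    "AND o.OptionSetName = 'accountcategorycode' "
    'AND o."Option" = a.accountcategorycode'
).fetchall()
print(result)  # [('a-1', 'Preferred Customer')]
```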

Fabric queries the data more easily than Synapse, as the lakehouse and notebook are more tightly coupled with each other.

SQL Query and Results in a Fabric Notebook
Figure 5 Query Results in a Fabric Notebook

Conclusion

Transitioning from the Data Export Service will probably take a bit of adjustment, but it should be relatively simple to migrate. The big challenge with the Data Export Service was that the headless CSV files relied on a custom mechanism to strongly type the data and assign column names to the data. The inclusion of lake databases does the heavy lifting for you, so extra transformations can be targeted at shaping the data for further analytical purposes.

Going into this with a fresh deployment, it's hard to see why you would not use the Azure Synapse Link for Dataverse. Whether you use Fabric or Azure Synapse Analytics depends on your choice of environment, but both offer similar capabilities.



Comments

  1. Noticed that the database does not transfer as code in CI/CD. Whichever mechanism is used to create the database does not include it in the same way that the Database builder tool would.
