Semantic Model Building Using Direct Lake
I for one was, and am, very excited about Direct Lake as a means of supplying data to Power BI in a meaningfully live or near-live fashion. This has long been an issue for reporting, and different platforms have dealt with it in different ways. Power BI has typically queried the data at source to avoid copying it into the Power BI model, and if you have tried this you have likely experienced how it can slow reporting down. Direct Lake promises to resolve this by removing the database engine entirely and reading the Parquet files in OneLake directly.
For an overview of Direct Lake, see here.
Within this document there is a set of limitations, the first of which gave me pause as I tried to understand what it means and how it constrains Direct Lake.
Known issues and limitations.
At first glance, this seems to say that views can be included in Direct Lake models, and then to suggest that views will not utilise Direct Lake, which felt contradictory. I would still suggest this needs to be reworded for clarity, but let’s pick it apart:
1. The tables and views in a Direct Lake model must come from a single source, a Lakehouse or a Data Warehouse.
2. Any model table that has a view in the Lakehouse or Data Warehouse as its source will not be queried in Direct Lake and will instead use DirectQuery.
Direct Lake will read tables directly from OneLake, and DirectQuery will read from the SQL endpoint. Both will be live, but the views will be slower.
Thanks to Kay Unkroth at Microsoft for this clarification.
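If you want to check which mode a given model table has actually ended up in, one option is to query the partition DMV over the semantic model's XMLA endpoint (for example from SSMS or DAX Studio). This is only a sketch; the Mode column is an enumeration whose specific values I won't list here, but partitions backed by views should report a DirectQuery-style mode rather than Direct Lake.

-- List each partition in the model and its storage mode.
-- Run against the model's XMLA endpoint; table-backed and view-backed
-- partitions show different Mode values depending on whether they use Direct Lake.
SELECT [TableID], [Name], [Mode]
FROM $SYSTEM.TMSCHEMA_PARTITIONS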
While researching and learning Fabric, I, like many others, looked to the learning path collections. Within the module “Work with semantic models in Microsoft Fabric” there is an exercise that provides steps for creating a star schema model for Reseller Sales using Adventure Works.
The included steps are to import some tables:
· DimProduct
· DimProductCategory
· DimProductSubcategory
· DimReseller
· DimSalesTerritory
· FactResellerSales
Then perform some transformations:
2. Remove columns.
3. Rename columns.
4. Create a calculated column.
The only step of these four that is possible in a semantic model in Fabric is the column rename. Which begs the question, “Is it possible to satisfy this exercise as a Direct Lake model?” It feels like the answer is no, or at best yes, but with the tables falling back to DirectQuery.
Consider these semantic models.
This first model was created in Fabric from tables to allow Direct Lake mode. It has all the base tables but does not allow any of the required transformations. We can hide fields in the model, but they will still be loaded. This is a shift from the advice we would typically follow, where we would not include columns and data that we are not going to use in the model.
Semantic Model in Direct Lake Mode
So what options do we have to replicate the model described in the exercise? My first thought was this model, created in Fabric.
A Semantic Model Using Views in Direct Lake
This, though, relies on views, so it will fall back to DirectQuery mode.
CREATE VIEW [dbo].[Product] AS
SELECT ProductKey
      ,EnglishProductName AS Product
      ,dPS.ProductSubcategoryKey
      ,EnglishProductCategoryName AS Category
      ,Color
      ,EnglishProductSubcategoryName AS SubCategory
FROM DimProduct dP
JOIN DimProductSubcategory dPS ON dP.ProductSubcategoryKey = dPS.ProductSubcategoryKey
JOIN DimProductCategory dPC ON dPC.ProductCategoryKey = dPS.ProductCategoryKey
WHERE [FinishedGoodsFlag] = 1
GO
My next thought was to try a more traditional approach, such as this model created in Power BI. It has no option for Direct Lake mode; it can be edited in Fabric but not converted to Direct Lake.
A Power BI Model in DirectQuery Mode
So, what options do we have in reality?
Dimension versus Fact.
I think that in most instances we can probably stomach the loss and accept DirectQuery fallback for our dimensions. The possible exception would be the Date dimension. I suspect in most instances the Date dimension will not be as wide as this example in Adventure Works, and that it will typically be created to satisfy the reporting models required; as such, I’d probably just load it in its table form. Something like the Product dimension with its hierarchy in the exercise example would often be small enough to stomach any loss of speed, and a view would be the way to go. If that is not the case, then the mantra of “Push Left” holds, and the transformation of data would result in a denormalised dimension in the Lakehouse or Data Warehouse, as sketched below.
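As a rough sketch of what pushing that work left might look like, the joined-up Product dimension could be materialised as a table in a Fabric Warehouse using CREATE TABLE AS SELECT (a Lakehouse equivalent would write it out as a Delta table from a notebook or Dataflow). The table name DimProductDenorm is just an illustration, not something from the exercise.

-- Materialise the denormalised Product dimension so the semantic model
-- can reference a table (Direct Lake) rather than a view (DirectQuery fallback).
CREATE TABLE [dbo].[DimProductDenorm]
AS
SELECT dP.ProductKey
      ,dP.EnglishProductName             AS Product
      ,dPS.ProductSubcategoryKey
      ,dPS.EnglishProductSubcategoryName AS SubCategory
      ,dPC.EnglishProductCategoryName    AS Category
      ,dP.Color
FROM DimProduct dP
JOIN DimProductSubcategory dPS ON dP.ProductSubcategoryKey = dPS.ProductSubcategoryKey
JOIN DimProductCategory dPC ON dPC.ProductCategoryKey = dPS.ProductCategoryKey
WHERE dP.FinishedGoodsFlag = 1;

The trade-off is that this table is another copy of the dimension data and needs to be refreshed whenever the source tables change.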
For facts, this is less cut and dried. We would expect to see multiple versions of the same data in multiple facts, but that is normally driven by the grain: dates or aggregation at different levels. This feels more like audience-driven field selection. The fact in the exercise drops from 27 to 6 columns, and we lose a lot of detail on discounts, tax, employees, promotion, and other intelligence. Maybe this exercise fact is unrealistic and a wider model might be produced, but again, we want a slim and fleet model, right? Given how much thought goes into preventing duplication in other areas, it does feel as though the last thing we would want to do is create ten different facts at the same grain but with different fields.
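For completeness, here is what a trimmed fact might look like at source; the column list is illustrative rather than the exercise’s exact six. As a view it would fall back to DirectQuery; materialised with CREATE TABLE AS SELECT it would stay in Direct Lake but duplicate the fact data, which is exactly the trade-off above.

-- Illustrative trimmed reseller sales fact; swap in whichever columns the model actually needs.
CREATE VIEW [dbo].[ResellerSalesSlim] AS
SELECT ProductKey
      ,ResellerKey
      ,SalesTerritoryKey
      ,OrderDateKey
      ,OrderQuantity
      ,SalesAmount
FROM FactResellerSales
GO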
I’m sure this will be a long-running discussion, and I will be giving it a good deal of thought. Direct Lake will no doubt provide a lot of value, but it will require careful consideration and planning of our data schema.