Semantic Model Building Using Direct Lake
I for one was, and am, very excited about Direct Lake as a means of supplying data to Power BI in a meaningfully live or near-live fashion. This has long been an issue for reporting, and different platforms have dealt with it in different ways. Power BI has typically queried the data at source to avoid copying it into the Power BI model, and if you have tried this you have likely experienced how it can slow reporting down. Direct Lake promises to resolve this by removing the database engine entirely and reading the Parquet files in OneLake directly.
For an overview of Direct Lake, see here.
Within this document there is a set of limitations, the first of which gave me pause as I tried to understand what it means and how it constrains Direct Lake.
Known issues and limitations.
At first glance, this seems to say that views can be included in Direct Lake models, and then to suggest that views will not utilise Direct Lake, which felt contradictory. I would still suggest this needs to be reworded for clarity, but let’s pick it apart:
1. The tables and views in a Direct Lake model must come from a single source, a Lakehouse or a Data Warehouse.
2. Any model table that has a view in the Lakehouse or Data Warehouse as its source will not be queried in Direct Lake and will instead use DirectQuery.
Direct Lake will read tables directly from OneLake, and DirectQuery will read from the SQL endpoint. Both will be live, but the views will be slower.
Thanks to Kay Unkroth at Microsoft for this clarification.
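If you want to check which mode a given model table has actually ended up in, one option is to query the partition DMV over the semantic model's XMLA endpoint (for example from SSMS or DAX Studio). This is only a sketch; the Mode column is an enumeration whose specific values I won't list here, but partitions backed by views should report a DirectQuery-style mode rather than Direct Lake.

-- List each partition in the model and its storage mode.
-- Run against the model's XMLA endpoint; table-backed and view-backed
-- partitions show different Mode values depending on whether they use Direct Lake.
SELECT [TableID], [Name], [Mode]
FROM $SYSTEM.TMSCHEMA_PARTITIONS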
While researching and learning Fabric, I, like many others, looked to the learning path collections. Within the module “Work with semantic models in Microsoft Fabric” there is an exercise that provides steps for creating a star schema model for Reseller Sales using Adventure Works.
The included steps are to import some tables:
· DimProduct
· DimProductCategory
· DimProductSubcategory
· DimReseller
· DimSalesTerritory
· FactResellerSales
Then perform some transformations:
2. Remove columns.
3. Rename columns.
4. Create a calculated column.
The only step of these four that is possible in a semantic model in Fabric is the column rename. Which begs the question, “Is it possible to satisfy this exercise as a Direct Lake model?” It feels like the answer is no, or at best yes, but with the tables falling back to DirectQuery.
Consider these semantic models.
This first model was created in Fabric from tables to allow Direct Lake mode. It has all the base tables but does not allow any of the required transformations. We can hide fields in the model, but they will still be loaded. This is a shift from the advice we would typically follow, where we would not include columns and data that we are not going to use in the model.
Semantic Model in Direct Lake Mode
So what options do we have to replicate the model described in the exercise? My first thought was this model, created in Fabric.
A Semantic Model Using Views in Direct Lake
This, though, relies on views, so it will fall back to DirectQuery mode.
CREATE VIEW [dbo].[Product] AS
SELECT ProductKey
      ,EnglishProductName AS Product
      ,dPS.ProductSubcategoryKey
      ,EnglishProductCategoryName AS Category
      ,Color
      ,EnglishProductSubcategoryName AS SubCategory
FROM DimProduct dP
JOIN DimProductSubcategory dPS ON dP.ProductSubcategoryKey = dPS.ProductSubcategoryKey
JOIN DimProductCategory dPC ON dPC.ProductCategoryKey = dPS.ProductCategoryKey
WHERE [FinishedGoodsFlag] = 1
GO
My next thought was to try a more traditional approach, such as this model created in Power BI. It has no option for Direct Lake mode; it can be edited in Fabric but not converted to Direct Lake.
A Power BI Model in DirectQuery Mode
So, what options do we have in reality?
Dimension versus Fact.
I think that in most instances we can probably stomach the loss and accept DirectQuery fallback for our dimensions. The possible exception would be the Date dimension. I suspect in most instances the Date dimension will not be as wide as this example in Adventure Works, and that it will typically be created to satisfy the reporting models required; as such, I’d probably just load it in its table form. Something like the Product dimension with its hierarchy in the exercise example would often be small enough to stomach any loss of speed, and a view would be the way to go. If that is not the case, then the mantra of “Push Left” holds, and the transformation of data would result in a denormalised dimension in the Lakehouse or Data Warehouse, as sketched below.
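As a rough sketch of what pushing that work left might look like, the joined-up Product dimension could be materialised as a table in a Fabric Warehouse using CREATE TABLE AS SELECT (a Lakehouse equivalent would write it out as a Delta table from a notebook or Dataflow). The table name DimProductDenorm is just an illustration, not something from the exercise.

-- Materialise the denormalised Product dimension so the semantic model
-- can reference a table (Direct Lake) rather than a view (DirectQuery fallback).
CREATE TABLE [dbo].[DimProductDenorm]
AS
SELECT dP.ProductKey
      ,dP.EnglishProductName             AS Product
      ,dPS.ProductSubcategoryKey
      ,dPS.EnglishProductSubcategoryName AS SubCategory
      ,dPC.EnglishProductCategoryName    AS Category
      ,dP.Color
FROM DimProduct dP
JOIN DimProductSubcategory dPS ON dP.ProductSubcategoryKey = dPS.ProductSubcategoryKey
JOIN DimProductCategory dPC ON dPC.ProductCategoryKey = dPS.ProductCategoryKey
WHERE dP.FinishedGoodsFlag = 1;

The trade-off is that this table is another copy of the dimension data and needs to be refreshed whenever the source tables change.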
For facts, this is less cut and dried. We would expect to see multiple versions of the same data in multiple facts, but that is normally driven by the grain: dates or aggregation at different levels. This feels more like audience-driven field selection. The fact in the exercise drops from 27 to 6 columns, and we lose a lot of detail on discounts, tax, employees, promotion, and other intelligence. Maybe this exercise fact is unrealistic and a wider model might be produced, but again, we want a slim and fleet model, right? Given how much thought goes into preventing duplication in other areas, it does feel as though the last thing we would want to do is create ten different facts at the same grain but with different fields.
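For completeness, here is what a trimmed fact might look like at source; the column list is illustrative rather than the exercise’s exact six. As a view it would fall back to DirectQuery; materialised with CREATE TABLE AS SELECT it would stay in Direct Lake but duplicate the fact data, which is exactly the trade-off above.

-- Illustrative trimmed reseller sales fact; swap in whichever columns the model actually needs.
CREATE VIEW [dbo].[ResellerSalesSlim] AS
SELECT ProductKey
      ,ResellerKey
      ,SalesTerritoryKey
      ,OrderDateKey
      ,OrderQuantity
      ,SalesAmount
FROM FactResellerSales
GO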
I’m sure this will be a long-running discussion, and I will be giving it a good deal of thought. Direct Lake will no doubt provide a lot of value, but it will require careful consideration and planning of our data schema.