For general information on types of data relationships, potential complications and how to resolve them, see the following articles:
- Data Relationships: Introduction to Many-to-Many
- Data Relationships: Testing for a Many-to-Many relationship
- Data Relationships: One-to-Many Relationship
- Data Relationships: The Invisible Many to Many (You are here)
- Data Relationships: Many-to-Many Relationship Resolutions
The Invisible Many to Many
You may find instances where you have common symptoms of a many to many (M2M) relationship, but cannot find the root cause of the relationship. Such symptoms of a many to many relationship include:
- High RAM usage during query,
- ElastiCube size on disk increasing during query,
- Build failing with 100% RAM
The common check for a M2M is the below query.
Select count(a.id), distinct_count(a.id) From dim as a
However, the numbers may be identical which may not initially indicate a M2M so the below query must be run for a stronger check:
Select count(a.ID), distinct_count(a.ID),
sum(case when a.ID is null then 1 else 0 end) as [null count]
From dim as a
If the null count field shows more than 1 in both tables then you have a many-to-many relationship on null values.
There could be many reasons why null values get to the dimension tables, such as cases when the dimension table is a custom table that's taken from the fact table or if the data is imported with null values from the data source itself.
- Dimension tables should always be unique, even if they have null values, there shouldn't be more than one null value.
- If the dimension table is a custom table that's taken from the fact table, make sure you use the "distinct" expression.
- If the nulls are imported from the data source try to remove them either in the import query or by creating a custom table that's taking only unique values from the table.