Implement performance improvements
1. Implement performance improvements
In this video, we'll talk about strategies to improve performance of individual Fabric items or the entire capacity.
2. Optimization strategies
There are various strategies you can pursue to optimize performance. You can optimize the design of individual Fabric items; for example, avoiding unnecessary transformations in a dataflow uses fewer resources. At a capacity level, you can also scale up to increase computing power, or scale out to distribute workloads across multiple capacities.
3. Optimizing SQL
When dealing with a Warehouse, it is important to write the most efficient SQL possible. In general, you should limit the number of columns and operations in your queries.
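As a sketch of this idea (the table and column names here are hypothetical), compare a query that reads every column with one that projects only what it needs:

```sql
-- Avoid: scans every column in the table, even ones the analysis never uses.
SELECT *
FROM dbo.FactSales;

-- Prefer: read only the columns and rows the analysis needs.
SELECT SaleDate, ProductKey, SalesAmount
FROM dbo.FactSales
WHERE SaleDate >= '2024-01-01';
```

Because the Warehouse stores data in a columnar format, projecting fewer columns generally reduces the amount of data scanned.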
4. Optimizing SQL
You should also use the smallest data types possible to reduce the resources required by queries.
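For example, a dimension table might be declared with right-sized types rather than generous defaults; this is a minimal sketch with hypothetical names:

```sql
CREATE TABLE dbo.DimProduct
(
    ProductKey  INT            NOT NULL,  -- INT rather than BIGINT when keys fit
    ProductName VARCHAR(100)   NOT NULL,  -- bounded length rather than a maximum-size string
    ListPrice   DECIMAL(10, 2) NOT NULL,  -- fixed precision rather than FLOAT for currency
    LaunchDate  DATE           NOT NULL   -- DATE rather than DATETIME2 when time of day is not needed
);
```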
5. Optimizing SQL
In addition, minimize joins by using star schemas.
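In a star schema, a typical query joins the fact table directly to one denormalized dimension per concept instead of chaining through several normalized tables. A hypothetical example:

```sql
-- One join per dimension: ProductCategory lives on DimProduct itself,
-- so no extra hop through a separate category table is needed.
SELECT d.ProductCategory,
       SUM(f.SalesAmount) AS TotalSales
FROM dbo.FactSales AS f
JOIN dbo.DimProduct AS d
    ON f.ProductKey = d.ProductKey
GROUP BY d.ProductCategory;
```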
6. Optimizing SQL
Finally, use system views like sys.dm_exec_requests to get information about running queries.
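For instance, a query along these lines lists the requests currently running, longest first (the exact columns available can vary by environment, but these are standard in sys.dm_exec_requests):

```sql
SELECT session_id,
       status,
       command,
       start_time,
       total_elapsed_time  -- milliseconds since the request started
FROM sys.dm_exec_requests
WHERE status = 'running'
ORDER BY total_elapsed_time DESC;
```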
7. Optimizing Notebooks
The Monitoring Hub has a link to the Spark History Server, a UI that provides information about the execution of Spark jobs. Use it to understand the resources being used by a notebook and the query plan.
8. Optimizing Notebooks
The default Spark session expiration time is 20 minutes. When many users run interactive jobs, all active sessions consume Capacity Units. To reduce the impact, it is recommended to stop sessions that are not in use.
9. Optimizing Notebooks
Join operations in Spark are resource intensive. Whenever possible, reduce the amount of data in memory by selecting only the columns relevant to the join.
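As a sketch in Spark SQL, which you can run in a notebook cell (the table and column names are hypothetical), project only the relevant columns before joining so less data is shuffled and held in memory:

```sql
SELECT o.order_id,
       o.amount,
       c.customer_name
FROM (SELECT order_id, customer_id, amount FROM orders) AS o
JOIN (SELECT customer_id, customer_name FROM customers) AS c
    ON o.customer_id = c.customer_id;
```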
10. Optimizing dataflows
When optimizing dataflows, aim to reduce expensive or unnecessary tasks. For example, sorting data is an expensive operation, so you should use it only if necessary.
11. Optimizing dataflows
You should also try to achieve query folding whenever possible. Query folding occurs when transformations are offloaded to the data source. If these transformations occur at the data source, there will be less data traveling to the Power Query engine.
12. Optimizing dataflows
You can also consider whether to enable or disable staging. Power Query uses an internal lakehouse to stage data before transforming it. Staging can be beneficial for large data loads; however, for smaller loads, the overhead of moving data to and from staging might hurt performance.
13. Scale Up
Scaling up involves changing to a higher SKU to provide more computing resources to all the items in the capacity.
14. Scale Up
In this example, the capacity hosts three workspaces and uses the F64 SKU, which has an equivalent computing power of 8 virtual cores. We scale up to the F128 SKU, which has an equivalent computing power of 16 virtual cores.
15. Scale Out
Scaling out involves moving some of the workspaces or items to a different Fabric capacity to spread the workload. Isolating workloads can be useful when separating non-production from production workloads, or when a particular part of the business requires higher priority. For example, you might want a dedicated capacity for executive reporting, so the executive users have their own computing power, separate from the Fabric capacity used by data scientists and other teams.
16. Scale Out
In this example, the capacity hosts three workspaces. We scale out by creating a second capacity and moving one of the workspaces to it.
17. Let's practice!
Now, let's do some exercises to practice these optimization techniques.