Databricks DE Associate

Section 01

Databricks Intelligence Platform (3 topics)

Live

Enable features that simplify data layout decisions and optimize query performance

Liquid Clustering, Predictive Optimization, Auto Optimize — and when to use each one.

Live

Explain the value of the Data Intelligence Platform

What makes the Databricks platform different and why it matters for modern data engineering.

Live

Identify the applicable compute to use for a specific use case

Clusters, SQL warehouses, serverless — choosing the right compute for every scenario.

Section 02

Development and Ingestion (5 topics)

Live

Use Databricks Connect in a data engineering workflow

Write and test Spark code in a local IDE while it executes on remote Databricks compute.

Live

Determine the capabilities of the Notebooks functionality

Magic commands, widgets, revision history, and multi-language notebooks.

Live

Classify valid Auto Loader sources and use cases

Cloud object storage sources, file notification vs. directory listing, schema inference, and when to use Auto Loader vs. COPY INTO.

Live

Demonstrate knowledge of Auto Loader syntax

The cloudFiles format, checkpointing, schema hints, and the rescued data column.

Live

Use Databricks' built-in debugging tools to troubleshoot a given issue

Spark UI, event logs, cluster logs, and the built-in debugger.

Section 03

Data Processing & Transformations (6 topics)

Live

Describe the three layers of the Medallion Architecture

Bronze, Silver, Gold — purpose of each layer and how data flows through the pipeline.

Live

Classify cluster type and configuration for optimal performance

All-purpose vs. job clusters, autoscaling, spot instances, and driver/worker sizing.

Live

Advantages of Lakeflow Spark Declarative Pipelines for ETL

Why declarative pipelines beat hand-written Spark jobs for production ETL.

Coming soon

Implement data pipelines using Lakeflow Spark Declarative Pipelines

Streaming tables, materialized views, APPLY CHANGES, and pipeline syntax.

Coming soon

Identify DDL / DML features

CREATE, ALTER, DROP, MERGE, COPY INTO: the DDL and DML statements you must know for working with Delta tables.

Coming soon

Compute complex aggregations and metrics with PySpark DataFrames

Window functions, groupBy, pivot, rollup, cube, and UDFs in PySpark.

Section 04

Productionizing Data Pipelines (5 topics)

Coming soon

Identify the difference between DAB and traditional deployment methods

Databricks Asset Bundles vs. manual deployments — CI/CD and version control.

Coming soon

Identify the structure of Asset Bundles

Bundle YAML, targets, resources, and how to organise a DAB project.

Coming soon

Deploy a workflow, repair, and rerun a task in case of failure

Job orchestration, task dependencies, repair runs, and parameter overrides.

Coming soon

Use serverless for hands-off, auto-optimized compute

When to use serverless compute, its benefits, and current limitations.

Coming soon

Analyse the Spark UI to optimize a query

Reading DAGs, identifying shuffles, spill, skew, and slow stages in the Spark UI.

Section 05

Data Governance & Quality (10 topics)

Coming soon

Explain the difference between managed and external tables

Data ownership, DROP behaviour, and when to choose each table type.

Coming soon

Identify the grant of permissions to users and groups within Unity Catalog

GRANT, REVOKE, privilege types, and how privileges are inherited down the Unity Catalog hierarchy.

Coming soon

Identify key roles in Unity Catalog

Account admin, metastore admin, workspace admin, and object owner, and what each role can do.

Coming soon

Identify how audit logs are stored

System tables, audit log destinations, and querying audit events.

Coming soon

Use lineage features in Unity Catalog

Column-level and table-level lineage tracking across notebooks, jobs, and queries.

Coming soon

Use the Delta Sharing feature to share data

Shares, recipients, schemas — sharing live data securely without copying it.

Coming soon

Advantages and limitations of Delta Sharing

What Delta Sharing does well, and where it falls short compared to other methods.

Coming soon

Types of Delta Sharing: Databricks vs. external systems

D2D sharing vs. open protocol sharing — differences, setup, and use cases.

Coming soon

Analyse cost considerations of data sharing across clouds

Egress costs, storage costs, and how to minimise expenses when sharing cross-cloud.

Coming soon

Identify use cases of Lakehouse Federation

Querying external databases through Unity Catalog without moving the data.
