Databricks Certified · Data Engineer Associate

Master the Databricks DE Associate Exam

Deep-dive articles covering every exam topic — written to help you truly understand the platform, not just memorise answers.

20+ topics covered
5 exam sections
1/20 published so far
01 Databricks Intelligence Platform 3 topics
Live
Enable features that simplify data layout decisions and optimize query performance
Liquid Clustering, Predictive Optimization, Auto Optimize — and when to use each one.
Coming soon
Explain the value of the Data Intelligence Platform
What makes the Databricks platform different and why it matters for modern data engineering.
Coming soon
Identify the applicable compute to use for a specific use case
Clusters, SQL warehouses, serverless — choosing the right compute for every scenario.
02 Development and Ingestion 5 topics
Coming soon
Use Databricks Connect in a data engineering workflow
Develop and test Spark code locally while running on remote Databricks clusters.
Coming soon
Determine the capabilities of the Notebooks functionality
Magic commands, widgets, revision history, and multi-language notebooks.
Coming soon
Classify valid Auto Loader sources and use cases
Cloud storage triggers, schema inference, and when to use Auto Loader vs. COPY INTO.
Coming soon
Demonstrate knowledge of Auto Loader syntax
cloudFiles format, checkpointing, schema hints, and the rescued data column.
Coming soon
Use Databricks' built-in debugging tools to troubleshoot a given issue
Spark UI, event logs, cluster logs, and the built-in debugger.
03 Data Processing & Transformations 6 topics
Coming soon
Describe the three layers of the Medallion Architecture
Bronze, Silver, Gold — purpose of each layer and how data flows through the pipeline.
Coming soon
Classify cluster type and configuration for optimal performance
All-purpose vs. job clusters, autoscaling, spot instances, and driver/worker sizing.
Coming soon
Advantages of Lakeflow Spark Declarative Pipelines for ETL
Why declarative pipelines beat hand-written Spark jobs for production ETL.
Coming soon
Implement data pipelines using Lakeflow Spark Declarative Pipelines
Streaming tables, materialized views, APPLY CHANGES, and pipeline syntax.
Coming soon
Identify DDL / DML features
CREATE, ALTER, DROP, MERGE, COPY INTO — Delta-specific DDL and DML you must know.
Coming soon
Compute complex aggregations and metrics with PySpark DataFrames
Window functions, groupBy, pivot, rollup, cube, and UDFs in PySpark.
04 Productionizing Data Pipelines 5 topics
Coming soon
Identify the difference between DAB and traditional deployment methods
Databricks Asset Bundles vs. manual deployments — CI/CD and version control.
Coming soon
Identify the structure of Asset Bundles
Bundle YAML, targets, resources, and how to organise a DAB project.
Coming soon
Deploy a workflow, repair, and rerun a task in case of failure
Job orchestration, task dependencies, repair runs, and parameter overrides.
Coming soon
Use serverless for hands-off, auto-optimized compute
When to use serverless compute, its benefits, and current limitations.
Coming soon
Analyse the Spark UI to optimize a query
Reading DAGs, identifying shuffles, spill, skew, and slow stages in the Spark UI.
05 Data Governance & Quality 10 topics
Coming soon
Explain the difference between managed and external tables
Data ownership, DROP behaviour, and when to choose each table type.
Coming soon
Identify the grant of permissions to users and groups within Unity Catalog
GRANT, REVOKE, privilege types, and how permissions flow through UC hierarchy.
Coming soon
Identify key roles in Unity Catalog
Metastore admin, catalog owner, data steward — and what each role can do.
Coming soon
Identify how audit logs are stored
System tables, audit log destinations, and querying audit events.
Coming soon
Use lineage features in Unity Catalog
Column-level and table-level lineage tracking across notebooks, jobs, and queries.
Coming soon
Use the Delta Sharing feature to share data
Shares, recipients, schemas — sharing live data securely without copying it.
Coming soon
Advantages and limitations of Delta Sharing
What Delta Sharing does well, and where it falls short compared to other methods.
Coming soon
Types of Delta Sharing: Databricks vs. external systems
D2D sharing vs. open protocol sharing — differences, setup, and use cases.
Coming soon
Analyse cost considerations of data sharing across clouds
Egress costs, storage costs, and how to minimise expenses when sharing cross-cloud.
Coming soon
Identify use cases of Lakehouse Federation
Querying external databases through Unity Catalog without moving the data.
☕ Found this helpful?
All articles are completely free — but a £3 coffee keeps me writing!
☕ Buy me a coffee