DP-900 Practice

DP-900 Practice Questions: 15 Free Questions With Detailed Answers

Scenario-based DP-900 practice questions covering all four exam domains — core data concepts, relational data, non-relational data, and analytics workloads — with full explanations.

By MSCertQuiz Team · Updated March 2026 · 15 min read

About These Questions

These 10 questions (with 5 more below) are written to match the style of the real DP-900 exam. They test your ability to apply concepts to realistic scenarios — not just recall definitions. Attempt each question before revealing the explanation.

Question 1 · Core Data Concepts · Easy

A manufacturing company collects temperature readings from 10,000 IoT sensors every second. The data is analyzed immediately to detect equipment failures in real-time. Which data processing approach describes this scenario?

A. Batch processing — data is collected and processed in scheduled intervals
B. Stream processing — data is processed continuously as it arrives
C. ETL processing — data is extracted, transformed, and loaded into a data warehouse
D. OLAP processing — data is aggregated for analytical queries

Correct Answer: B

Stream processing handles data continuously as it arrives, enabling real-time analysis. IoT sensor data that must be analyzed "immediately" to detect failures is a classic stream processing scenario. Batch processing (A) processes data in scheduled intervals — the opposite of real-time. ETL (C) is a data pipeline pattern, not a processing timing approach. OLAP (D) refers to analytical query patterns, not the timing of processing.
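The distinction can be made concrete with a minimal sketch (pure Python, not an Azure API): stream processing evaluates each event the moment it arrives, rather than waiting for a scheduled batch window. The threshold value and the reading format below are hypothetical.

```python
FAILURE_THRESHOLD_C = 90.0  # hypothetical overheat limit

def process_reading(sensor_id, temp_c):
    """Evaluate one reading immediately; return an alert or None."""
    if temp_c > FAILURE_THRESHOLD_C:
        return f"ALERT: {sensor_id} at {temp_c}C"
    return None

# Simulated event stream: each reading is handled one by one as it "arrives",
# instead of being accumulated for a nightly batch job.
events = [("s-001", 72.5), ("s-002", 95.1), ("s-003", 88.0)]
alerts = [a for a in (process_reading(sid, t) for sid, t in events) if a]
```

In a real Azure solution the per-event logic would run inside a service such as Azure Stream Analytics; the point here is only the timing model: decisions are made per event, not per scheduled interval.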

Question 2 · Core Data Concepts · Medium

A data team is designing a data storage solution. They need to store data for operational transaction processing (OLTP) and separate storage for analytical reporting (OLAP). Which statement correctly describes the difference between OLTP and OLAP workloads?

A. OLTP is optimized for complex analytical queries across large datasets; OLAP is optimized for fast, simple transactions
B. OLTP is optimized for fast insert/update/delete of individual records; OLAP is optimized for complex queries aggregating large volumes of historical data
C. OLTP uses columnar storage; OLAP uses row-based storage
D. OLTP supports only read operations; OLAP supports read and write operations

Correct Answer: B

OLTP (Online Transaction Processing) workloads involve frequent, fast insert/update/delete operations on individual records — like order processing or banking transactions. OLAP (Online Analytical Processing) workloads involve complex queries that aggregate large volumes of historical data for reporting — like monthly sales analysis. Option A reverses these definitions. Option C also reverses the storage formats: OLAP systems typically use columnar storage, while OLTP systems use row-based storage. D is incorrect because OLTP workloads are write-heavy, not read-only.
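A toy illustration of the row-store vs. column-store point (plain Python, not a database engine): the same orders laid out row-wise favor cheap per-record writes (OLTP), while a column-wise layout favors scanning one attribute across many records (OLAP). The sample data is made up.

```python
# Row store: each record is contiguous, so touching one record is cheap.
rows = [
    {"order_id": 1, "customer": "A", "amount": 120.0},
    {"order_id": 2, "customer": "B", "amount": 80.0},
]

# OLTP-style operation: update a single record in place.
rows[1]["amount"] = 85.0

# Column store: each attribute is contiguous, so scanning one attribute
# (without reading the others) is cheap.
columns = {
    "order_id": [r["order_id"] for r in rows],
    "amount": [r["amount"] for r in rows],
}

# OLAP-style operation: aggregate one column across all records.
total = sum(columns["amount"])
```

Real columnar engines add compression and vectorized execution on top of this layout, but the access-pattern tradeoff is the same one the exam is testing.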

Question 3 · Relational Data · Medium

Your organization runs a SQL Server 2019 database on-premises. The operations team wants to migrate to Azure with minimal code changes and maximum compatibility with existing SQL Server features including SQL Agent jobs, cross-database queries, and CLR integration. Which Azure service should you recommend?

A. Azure SQL Database (single database)
B. Azure SQL Managed Instance
C. Azure Database for PostgreSQL
D. Azure Cosmos DB with the NoSQL API

Correct Answer: B

Azure SQL Managed Instance provides near 100% compatibility with SQL Server, including SQL Agent jobs, cross-database queries, CLR integration, and linked servers. These features are NOT available in Azure SQL Database (A), which is a PaaS service with a subset of SQL Server features. Azure Database for PostgreSQL (C) is a different database engine. Cosmos DB (D) is a NoSQL database — incompatible with SQL Server workloads.

Question 4 · Relational Data · Easy

A developer needs to retrieve all customer names and their order totals from an Orders table, sorted by total amount descending, for customers who have spent more than $1,000. Which SQL statement correctly achieves this?

A. SELECT CustomerName, SUM(Amount) FROM Orders WHERE SUM(Amount) > 1000 ORDER BY SUM(Amount) DESC
B. SELECT CustomerName, SUM(Amount) AS Total FROM Orders GROUP BY CustomerName HAVING SUM(Amount) > 1000 ORDER BY Total DESC
C. SELECT CustomerName, Total FROM Orders WHERE Total > 1000 GROUP BY CustomerName
D. SELECT CustomerName, SUM(Amount) FROM Orders HAVING SUM(Amount) > 1000 GROUP BY CustomerName

Correct Answer: B

The correct SQL uses GROUP BY to aggregate orders per customer, HAVING to filter on the aggregated result (WHERE cannot reference aggregate functions), and ORDER BY for sorting. A is incorrect because WHERE cannot filter on an aggregate. C references a "Total" column that doesn't exist in the table, filters before grouping, and omits the required ORDER BY entirely. D is syntactically invalid: HAVING must come after GROUP BY.
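You can verify option B yourself with SQLite (the sample customers and amounts below are invented; the query is the one from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (CustomerName TEXT, Amount REAL)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [("Ada", 700), ("Ada", 600), ("Bob", 400), ("Cy", 1500)],
)

# GROUP BY aggregates per customer, HAVING filters on the aggregate
# (WHERE cannot), and ORDER BY sorts the final result.
result = conn.execute(
    """
    SELECT CustomerName, SUM(Amount) AS Total
    FROM Orders
    GROUP BY CustomerName
    HAVING SUM(Amount) > 1000
    ORDER BY Total DESC
    """
).fetchall()
# Ada totals 1300 and Cy totals 1500; Bob (400) is filtered out by HAVING.
```

Swapping HAVING for WHERE in this query raises an error ("misuse of aggregate function"), which is exactly why options A and C fail.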

Question 5 · Non-Relational Data · Medium

A social network application needs to find all users who are "friends of friends" within 3 degrees of a given user. The recommendation engine must traverse relationships between users efficiently. Which Azure Cosmos DB API should you use?

A. Cosmos DB for NoSQL — store friend relationships as JSON arrays in user documents
B. Cosmos DB for Cassandra — store friend relationships in wide-column format
C. Cosmos DB for Gremlin — store users as vertices and friendships as edges in a graph
D. Cosmos DB for Table — store friend relationships as key-value pairs

Correct Answer: C

Graph databases are purpose-built for traversing relationships. Cosmos DB for Gremlin (Apache TinkerPop) stores entities as vertices and relationships as edges, making "find friends of friends within N degrees" a natural graph traversal query. NoSQL (A) would require complex application-level traversal logic and wouldn't scale for deep relationship queries. Cassandra (B) is a column-family store optimized for time-series and tabular data. Table API (D) is key-value — no relationship traversal capability.
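To see what the graph query does conceptually, here is the traversal as a plain-Python breadth-first search (the friendship data is hypothetical, and this is a sketch of the idea, not Gremlin itself — in Gremlin the query is roughly `g.V('alice').repeat(out('friend')).times(3).dedup()`):

```python
from collections import deque

# Adjacency list: user -> direct friends (hypothetical data).
friends = {
    "alice": ["bob"],
    "bob": ["carol"],
    "carol": ["dave"],
    "dave": ["erin"],
}

def within_degrees(start, max_depth):
    """All users reachable from start in at most max_depth hops."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        user, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the degree limit
        for f in friends.get(user, []):
            if f not in seen:
                seen.add(f)
                queue.append((f, depth + 1))
    seen.discard(start)  # the user is not their own friend-of-friend
    return seen

reachable = within_degrees("alice", 3)  # bob (1 hop), carol (2), dave (3)
```

In a document store you would have to implement this loop yourself in application code, with one query per hop; a graph database executes the whole traversal server-side, which is why Gremlin is the right answer here.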

Question 6 · Non-Relational Data · Medium

A company stores infrequently accessed archive data in Azure Blob Storage. The data must be retained for 7 years for compliance, is accessed less than once per year, and cost reduction is the highest priority. Which access tier should you use?

A. Hot tier — for frequently accessed data
B. Cool tier — for infrequently accessed data stored for at least 30 days
C. Cold tier — for rarely accessed data stored for at least 90 days
D. Archive tier — for rarely accessed data stored for at least 180 days with retrieval latency

Correct Answer: D

Archive tier offers the lowest storage cost and is designed for data accessed less than once per year, stored for 180+ days. The 7-year retention and "less than once per year" access clearly point to Archive. The tradeoff is retrieval latency (hours, not seconds) — acceptable for compliance archives. Hot (A) is for frequent access. Cool (B) requires 30 days minimum — cheaper than Hot but more expensive than Archive. Cold (C) is between Cool and Archive on the cost/access spectrum.
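The decision logic can be written down as a small helper. The 30/90/180-day minimums come from the tier definitions above; the access-frequency cutoffs used here are illustrative assumptions for the sketch, not official Azure limits.

```python
def suggest_tier(days_retained, accesses_per_year):
    """Pick the cheapest Blob access tier whose minimums the data meets.

    Thresholds: Archive needs 180+ days, Cold 90+, Cool 30+. The
    access-frequency cutoffs are illustrative assumptions.
    """
    if days_retained >= 180 and accesses_per_year < 1:
        return "Archive"  # cheapest storage; retrieval takes hours
    if days_retained >= 90 and accesses_per_year < 4:
        return "Cold"
    if days_retained >= 30:
        return "Cool"
    return "Hot"

# 7-year compliance archive, accessed less than once per year
tier = suggest_tier(days_retained=7 * 365, accesses_per_year=0)
```

Working through the question's numbers (7 years, under one access per year) lands on Archive, matching the answer.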

Question 7 · Non-Relational Data · Hard

A developer is choosing a Cosmos DB API for an existing application that uses MongoDB drivers and queries. The company wants to migrate to Azure with minimal code changes and wants global distribution capabilities. Which API should they choose?

A. Cosmos DB for NoSQL — it supports JSON documents similar to MongoDB
B. Cosmos DB for MongoDB — wire-compatible with MongoDB protocols and drivers
C. Cosmos DB for Cassandra — it supports column-family data with high performance
D. Cosmos DB for Table — it provides key-value storage with simple access patterns

Correct Answer: B

Cosmos DB for MongoDB provides wire-level compatibility with MongoDB — existing MongoDB drivers, connection strings, and queries work without code changes. This is the explicit use case Microsoft designed it for: migrating MongoDB applications to Azure with global distribution benefits. Cosmos DB for NoSQL (A) stores JSON but uses a different query syntax — MongoDB drivers won't work. Cassandra (C) and Table (D) are completely different data models.

Question 8 · Analytics Workloads · Medium

A data engineering team needs to copy data from an on-premises Oracle database to Azure Data Lake Storage on a nightly schedule. They also need to transform the data using Spark before loading it into Azure Synapse Analytics. Which Azure service is primarily responsible for the orchestration and scheduling of this pipeline?

A. Azure Synapse Analytics — it can ingest, transform, and store data in a single service
B. Azure Data Factory — it orchestrates data movement and pipeline scheduling between services
C. Microsoft Fabric — it provides a unified analytics platform for the entire workflow
D. Azure Stream Analytics — it processes and routes data between services in real-time

Correct Answer: B

Azure Data Factory (ADF) is Microsoft's cloud data integration service specifically designed to orchestrate data movement between sources and destinations on schedules. It can connect to Oracle, trigger Spark transformations, and load into Synapse. Azure Synapse (A) has pipeline capabilities but is primarily the analytics destination, not the orchestrator for cross-system pipelines from external sources. Microsoft Fabric (C) is a newer unified platform but ADF is the classic answer for pipeline orchestration in DP-900. Azure Stream Analytics (D) handles real-time streaming data, not scheduled batch pipelines.

Question 9 · Analytics Workloads · Medium

A business intelligence team creates dashboards that display real-time KPIs. Visuals from multiple reports, built on different datasets, are pinned to a single view that managers check each morning. Some visuals come from a sales report, others from an HR report. Which Power BI component correctly describes this combined view?

A. Report — a collection of visualizations from a single dataset
B. Dataset — the data model that powers visualizations
C. Dashboard — a single-page canvas with pinned tiles from multiple reports and datasets
D. Dataflow — a cloud ETL process that prepares data for reports

Correct Answer: C

A Power BI Dashboard is a single-page canvas where you pin individual visuals from multiple reports and datasets. This is exactly the scenario described — visuals from a sales report and HR report combined into one view. A Report (A) is based on a single dataset and can have multiple pages, but cannot combine visuals from different datasets. A Dataset (B) is the data model, not a visual artifact. A Dataflow (D) is a data preparation/ETL construct, not a display component.

Question 10 · Analytics Workloads · Hard

A company wants to analyze petabyte-scale data using both SQL queries and Apache Spark in a single workspace. They need the ability to query data directly in Azure Data Lake Storage without loading it into a database first. Which Azure service best fits this requirement?

A. Azure SQL Database — it supports large-scale analytical queries with columnstore indexes
B. Azure Synapse Analytics — it provides SQL pools, serverless SQL, and Spark pools in a unified workspace
C. Azure Databricks — it provides managed Apache Spark for big data processing
D. Power BI Premium — it supports large datasets with incremental refresh

Correct Answer: B

Azure Synapse Analytics uniquely provides: dedicated SQL pools (provisioned DW), serverless SQL (query data lake files directly without loading), and Apache Spark pools — all in one unified workspace. The ability to query data "directly in Azure Data Lake Storage without loading" refers specifically to Synapse serverless SQL. Azure SQL Database (A) is not designed for petabyte-scale analytics. Azure Databricks (C) provides Spark but not the serverless SQL-on-data-lake capability. Power BI Premium (D) is a BI tool, not an analytics compute service.

Want 500 More DP-900 Questions?

Full question bank covering all 4 DP-900 domains with practice mode and timed exam simulation.

40 questions free — no credit card required.

Start Free Practice →

DP-900 Exam Question Patterns

Understanding how the exam frames questions helps you answer faster and more accurately. Here are the most common DP-900 question patterns:

"Which service should you use?" scenarios

The most common question type. A scenario describes a business requirement, and you must select the correct Azure data service. Master the differences between Cosmos DB, Azure SQL, Synapse, ADF, Databricks, and Fabric to nail these.

API selection for Cosmos DB

Given a data type or migration scenario, which Cosmos DB API? NoSQL for new document apps, MongoDB for existing MongoDB apps, Gremlin for graph, Cassandra for column-family, Table for key-value migration.
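The same mapping as a lookup table, handy as a study aid (plain Python, not an Azure SDK call):

```python
# Scenario keyword -> Cosmos DB API, per the selection rule above.
COSMOS_API_FOR = {
    "new document app": "NoSQL",
    "existing MongoDB app": "MongoDB",
    "graph / relationship traversal": "Gremlin",
    "column-family / Cassandra workload": "Cassandra",
    "key-value Table storage migration": "Table",
}

api = COSMOS_API_FOR["graph / relationship traversal"]
```

On the exam, spot the scenario keyword (existing drivers, relationships, wide columns, key-value) and the API choice follows directly.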

Blob Storage access tier selection

Given access frequency and retention requirements, which access tier? Hot (frequent), Cool (30+ days, infrequent), Cold (90+ days, rare), Archive (180+ days, very rare, cheapest).

Power BI component identification

Distinguish between datasets, reports, dashboards, and dataflows. Know which is the data model, which is the visual layer, and which combines multiple reports.

Related Resources