AI-103 (Developing AI Apps and Agents on Azure) is Microsoft's current beta exam for Azure AI Engineer Associate — and the most architecturally demanding Azure AI exam to date. Generative AI and agentic solutions make up 35–40% of the exam, with the remaining weight spread across responsible AI planning, computer vision, text analysis, and information extraction.
Because AI-103 is a beta exam, free practice resources are extremely limited — most "AI-103 free questions" online are recycled AI-102 content that misses the new Foundry, agent, and RAG focus. The 20 questions below mirror the real beta exam's scenario-based format and difficulty, with detailed architectural explanations for every answer.
What you'll get:
- ✓20 scenario-based questions across all 5 AI-103 domains
- ✓Real beta exam difficulty — Foundry agents, RAG, Document Intelligence vs. Content Understanding, RBAC distinctions
- ✓Detailed explanations covering the architectural reasoning behind each correct answer
- ✓Coverage of new 2026 topics — integrated vectorization, Foundry agents with tools, Content Understanding, prompt flow
What These Questions Cover
📝 Practice Test Instructions
- • Each question has ONE best answer — choose the option that satisfies ALL stated requirements
- • AI-103 scenarios often present two architecturally plausible answers that differ on one specific constraint — read every word
- • Note your answers before scrolling to the answer key
- • Aim to complete all 20 questions in 30 minutes (real exam: 120 minutes for 40–60 questions)
Plan and Manage an Azure AI Solution
Questions 1–3
A backend microservice needs to call your Azure OpenAI deployment to generate chat completions. It must not be able to view, modify, or delete deployments, and it must not be able to read the resource keys. You want the least-privilege role.
Which Azure RBAC role should you assign to the microservice's managed identity?
- A.Cognitive Services OpenAI User — grants permission to call inference endpoints but not to manage deployments or read keys
- B.Cognitive Services OpenAI Contributor — required to call the inference endpoint at all
- C.Contributor — necessary to use the managed identity against the resource
- D.Cognitive Services User — sufficient for any Azure AI service including Azure OpenAI
Correct: A. Cognitive Services OpenAI User is the least-privilege role for inference. It permits chat completions and embeddings calls without granting visibility into keys or deployment management. Cognitive Services OpenAI Contributor adds deployment management rights and is over-privileged for inference-only workloads. Contributor and Cognitive Services User are both broader than needed.
Your application generates marketing copy with Azure OpenAI. You must block any response that contains medium or higher severity violent content, while allowing low-severity violence (e.g., metaphors like "killer deal") to pass through. All other content categories should use default Microsoft-recommended settings.
How should you configure the content filter on the deployment?
- A.Disable the violence category entirely and rely on Azure AI Content Safety post-processing
- B.Set the violence output filter to block at medium severity, leaving hate, self-harm, and sexual at default settings
- C.Set the violence input filter to block at medium severity, leaving the output filter at default
- D.Enable prompt shields for direct jailbreaks and rely on those to block violent output
Correct: B. Content filters apply per category (hate, self-harm, sexual, violence) at four severity levels (safe, low, medium, high), and can be configured separately for prompts and completions. Blocking violence output at medium severity blocks both medium and high while allowing low severity through, satisfying the requirement. Input filters apply to user prompts, not generated responses. Prompt shields are for jailbreak detection, not severity-based content classification.
Your Azure AI Foundry hub must be reachable only from a specific virtual network. Outbound traffic from the hub to Azure OpenAI and Azure AI Search must also flow over the private network, not the public internet. Microsoft Entra ID authentication is required throughout.
Which network configuration achieves this?
- A.Enable public network access on the Foundry hub and restrict the firewall to the corporate IP range
- B.Configure the Foundry hub for managed virtual network isolation with the "Allow Internet Outbound" preset
- C.Configure the Foundry hub for managed virtual network isolation with the "Allow Only Approved Outbound" preset, and create private endpoints to Azure OpenAI and Azure AI Search
- D.Use NSGs on the VNet to block outbound traffic except to Microsoft public IPs
Correct: C. Foundry managed virtual network isolation with "Allow Only Approved Outbound" denies all egress by default and requires explicit private endpoint outbound rules to each dependency (Azure OpenAI, Azure AI Search, storage). This satisfies both inbound and outbound private-network requirements. Public network access defeats the requirement. "Allow Internet Outbound" permits egress to the public internet. NSG-only solutions cannot enforce private endpoint routing.
Implement Generative AI and Agentic Solutions
Questions 4–11
You are building a customer-support assistant that must hold multi-turn conversations, search a knowledge base for relevant articles, execute Python to compute order refunds when needed, and persist conversation context across days for the same user.
Which Azure AI Foundry primitive should you use?
- A.A prompt flow with an LLM node and an AI Search lookup node
- B.A direct chat completions call with custom function-calling logic
- C.An Azure AI Foundry agent with file search and code interpreter tools, using threads to persist conversation context
- D.A fine-tuned model trained on past customer-support conversations
Correct: C. Foundry agents are the right primitive when you need tool calling (file search, code interpreter, custom functions), multi-turn threaded conversations with persistent state, and durable message stores. Prompt flow is the right choice for deployable, versioned, evaluated input-output pipelines — not for stateful chat. Direct chat completions force you to hand-roll state and tool orchestration. Fine-tuning addresses style/format problems, not retrieval or tool use.
Your company has 50,000 internal policy documents. You need an assistant that answers employee questions grounded in these documents. The documents update weekly and any answer must cite the source document.
Which approach is correct?
- A.Fine-tune a base model on all 50,000 documents and retrain weekly
- B.Build a RAG application with Azure AI Search over the document corpus, with integrated vectorization and citation of retrieved chunks
- C.Pass all 50,000 documents in the system prompt to ensure the model has full context
- D.Store documents in Cosmos DB and use SQL queries to retrieve them
Correct: B. RAG over Azure AI Search is the correct architecture: integrated vectorization handles embedding, hybrid search retrieves relevant chunks at query time, and citation is straightforward because retrieved chunks come with source metadata. Fine-tuning bakes facts into model weights, which become stale weekly and cannot cite sources. Passing 50,000 documents in the prompt exceeds any model context window. Cosmos DB SQL lacks vector and semantic search needed for natural-language retrieval.
You are designing an Azure AI Search index for a RAG application. The source content is PDFs in Azure Blob Storage. You need to chunk the documents, generate embeddings, and store both the chunks and embeddings — with the lowest operational overhead.
Which indexing approach is correct?
- A.Pre-process the PDFs in Azure Functions, generate embeddings via the Azure OpenAI API, and bulk upload to the index
- B.Use an Azure AI Search indexer with integrated vectorization, configured with a Blob data source, a text-split skill for chunking, and an Azure OpenAI embedding skill
- C.Manually chunk the PDFs in PowerShell and upload via REST
- D.Use Document Intelligence to extract content, then call the Azure OpenAI Assistants API to embed it
Correct: B. Integrated vectorization is the modern approach: a single indexer with a skillset (text-split + Azure OpenAI embedding) handles chunking and embedding generation declaratively, with no custom code. The indexer runs on a schedule and incrementally processes new blobs. Custom Function App pipelines work but require code, CI/CD, and monitoring you don't need. Manual chunking does not scale. The Assistants API is unrelated to indexing.
Your Foundry agent must look up customer order status and check warehouse inventory in parallel when a user asks "When will my order ship?". You want both backend calls to happen concurrently for latency reasons.
Which capability supports this?
- A.Tool choice set to "required" with two function definitions
- B.Parallel tool calls enabled, with two function definitions (get_order_status, check_inventory)
- C.A single function that internally calls both backends
- D.Two separate agents, one for each backend, called sequentially
Correct: B. Parallel tool calls let the model return multiple tool_call entries in a single completion, which your application code can execute concurrently. This is the explicit Azure OpenAI feature for the requirement. Tool_choice=required forces some tool call but does not control parallelism. A single composite function works but couples the backends; parallel tool calls keep them independent. Two agents add complexity and latency.
Your application extracts customer complaint details from emails and inserts them into a database. The model must return a JSON object that exactly matches your database schema — no extra fields, no missing fields, no free-text wrapping.
Which response format option enforces this?
- A.response_format = { "type": "text" } and instruct the model to "return only JSON"
- B.response_format = { "type": "json_object" } — the JSON mode setting
- C.response_format = { "type": "json_schema", "json_schema": { "name": "complaint", "schema": {...}, "strict": true } } — structured outputs
- D.Fine-tune the model on a labeled dataset of complaint JSONs
Correct: C. Structured outputs (response_format with json_schema and strict=true) is the correct option when you need guaranteed schema conformance. JSON mode only guarantees that output is valid JSON, not that it matches a specific schema. Plain text mode with instructions is unreliable — the model can omit fields or add commentary. Fine-tuning is expensive overkill for a problem solved by a single API parameter.
You have built a prompt flow for medical-information Q&A. Before deploying, you must verify that responses are grounded in the retrieved source documents and never invent facts. You have a ground-truth evaluation dataset of 200 question/answer pairs with cited sources.
Which evaluation metric should you prioritize?
- A.Fluency — measures grammatical quality of the answer
- B.Relevance — measures how relevant the answer is to the question
- C.Groundedness — measures whether the answer is supported by the retrieved sources
- D.Similarity — measures lexical overlap with the ground-truth answer
Correct: C. Groundedness is the built-in evaluator that scores whether each response is supported by the provided source context — exactly the hallucination-detection metric needed for medical Q&A. Fluency and relevance miss the hallucination problem (a fluent, relevant answer can still be invented). Similarity compares to the ground-truth answer text but does not check source attribution.
Your real-time conversational agent must respond within 800ms end-to-end. The use case is a customer-service chatbot answering FAQ-style questions. Cost is a major constraint — you handle 5 million requests per month.
Which Azure OpenAI model is most appropriate?
- A.GPT-4o — best general quality regardless of cost or latency
- B.GPT-4o-mini — strong quality at significantly lower latency and cost than GPT-4o
- C.o1 — chain-of-thought reasoning for complex problems
- D.GPT-3.5-Turbo — lowest cost option
Correct: B. GPT-4o-mini hits the right balance for high-volume, latency-sensitive FAQ chat: substantially lower latency and cost than GPT-4o, with quality strong enough for FAQ-style questions. o1 is the reasoning model — it deliberately thinks longer and is unsuitable for sub-second latency targets. GPT-3.5-Turbo is being deprecated in favor of GPT-4o-mini, which now offers better quality at comparable cost. GPT-4o is over-provisioned for FAQ workloads.
A user uploads a photo of a receipt. The application must extract the merchant name, date, line items, and total in structured JSON. The receipt format varies across thousands of merchants worldwide.
Which approach is most appropriate?
- A.Azure AI Vision Image Analysis 4.0 with OCR (Read)
- B.Azure AI Document Intelligence prebuilt receipt model
- C.Custom Vision object detection trained on receipt images
- D.GPT-4o with vision input and a structured-outputs schema
Correct: B. Document Intelligence prebuilt receipt model is purpose-built for this scenario: it returns merchant, date, items, and total as structured fields with confidence scores, trained on millions of global receipts. Image Analysis OCR returns raw text without structured fields. Custom Vision is for image classification or generic object detection, not field extraction. GPT-4o with vision works but is slower, more expensive, and less reliable for structured field extraction than the purpose-built prebuilt model.
Implement Computer Vision Solutions
Questions 12–14
You need to detect whether warehouse shelves contain a specific custom product packaging in real-time camera feeds. The packaging is unique to your company and not represented in any general object-detection model.
Which Azure AI service is correct?
- A.Azure AI Vision Image Analysis 4.0 with object detection
- B.Azure AI Custom Vision with an object detection project, trained on labeled images of your packaging
- C.Azure AI Document Intelligence custom extraction model
- D.GPT-4o vision with a few-shot prompt showing the packaging
Correct: B. Custom Vision is the correct service when you need to detect domain-specific objects not in any general model. Train an object detection project on labeled images of your packaging and deploy as a published iteration. Image Analysis 4.0 covers generic object detection (people, vehicles, common items) but cannot detect custom packaging without training. Document Intelligence is for documents, not warehouse imagery. GPT-4o few-shot can work but is slower, costlier per inference, and less accurate than a purpose-trained Custom Vision model.
You want to verify a person's identity by comparing a live selfie to a passport photo for a banking onboarding flow. The user must prove they are physically present and not using a photo or video replay.
Which capability do you need, and what additional step is required?
- A.Face Detect API — no additional steps
- B.Face Identification with person groups — no additional steps
- C.Face Liveness Detection — requires Microsoft Limited Access approval before use
- D.GPT-4o with vision and a chain-of-thought prompt asking it to detect a live person
Correct: C. Face Liveness Detection is the specific capability that proves physical presence by detecting subtle facial movements, eye reflections, and 3D depth cues that defeat photo and video spoofing. It is part of Microsoft's Limited Access program for Face APIs — you must apply via a Microsoft form, agree to responsible-use terms, and receive approval before the API is enabled on your subscription. Face Detect alone does not address spoofing. Face Identification matches against a known person group but not for liveness. GPT-4o cannot reliably detect liveness from a still image.
You need to extract structured data (insured party name, policy number, claim amount, incident date) from PDFs of insurance claim forms that mix typed and handwritten content. The schema is best expressed as natural-language field descriptions.
Which service is most appropriate?
- A.Azure AI Vision Image Analysis 4.0 OCR
- B.Azure AI Content Understanding with a custom analyzer defining each field in natural language
- C.Custom Vision object detection
- D.Translator document translation
Correct: B. Content Understanding is purpose-built for natural-language-described extraction across multimodal sources (documents, images, audio, video). You define an analyzer with field names and natural-language descriptions, and the service returns structured JSON with confidence scores. Image Analysis OCR returns raw text only — you would have to parse fields yourself. Custom Vision is image classification/detection, not field extraction. Translator is unrelated.
Implement Text Analysis Solutions
Questions 15–17
A support ticket system receives free-text customer messages that may contain credit card numbers, passport numbers, and email addresses. You must redact PII before storing messages in the analytics warehouse.
Which Azure AI Language capability should you use?
- A.Custom Named Entity Recognition trained on labeled PII examples
- B.Prebuilt PII detection and redaction with the appropriate domain (Conversation) and PII categories enabled
- C.Custom Text Classification to classify messages as PII-containing or not
- D.Key Phrase Extraction filtered for credit-card-shaped strings
Correct: B. Azure AI Language prebuilt PII detection identifies a wide range of PII entities (credit cards, passports, SSNs, emails, phone numbers, etc.) out of the box and can return either the spans or a pre-redacted version of the input. Choose the appropriate domain (e.g., Conversation) and enable the PII categories your scenario requires. Custom NER is overkill when prebuilt PII covers the categories. Custom Text Classification does not extract spans for redaction. Key Phrase Extraction does not target PII.
You are building a real-time, two-way voice agent where a user speaks to an AI in natural language and hears the AI's spoken response with sub-second turn latency. The agent must handle interruptions naturally.
Which API combination is correct?
- A.Azure AI Speech SDK speech-to-text + Azure OpenAI chat completions + Azure AI Speech SDK text-to-speech
- B.Azure OpenAI Realtime API with audio input and audio output
- C.Azure AI Translator with speech translation
- D.Azure AI Speech batch transcription + GPT-4o chat completions + neural TTS
Correct: B. The Azure OpenAI Realtime API is specifically designed for low-latency, two-way audio agents with native interruption handling — it streams audio in both directions over a single WebSocket connection. The classic Speech SDK + chat completions + TTS pipeline introduces serial latency at each hop and does not handle interruptions natively. Translator speech translation is for translation, not conversational agents. Batch transcription is for offline workloads, not real time.
You want to analyze product reviews not just for overall sentiment, but for sentiment on specific product aspects ("battery life: negative", "screen quality: positive"). Reviews are in English and contain multiple aspects per review.
Which capability provides this aspect-level breakdown?
- A.Sentiment Analysis with confidence scores at document and sentence level
- B.Sentiment Analysis with Opinion Mining (aspect-based sentiment) enabled
- C.Custom Text Classification trained on labeled aspect-sentiment pairs
- D.Key Phrase Extraction with sentiment post-processing
Correct: B. Opinion Mining (aspect-based sentiment analysis) is the explicit Azure AI Language feature that returns aspects (target terms) paired with assessment terms and sentiment polarity per aspect. Standard sentiment analysis returns document/sentence-level sentiment only — no aspect breakdown. Custom Text Classification would require labeling at significant effort. Key Phrase Extraction returns phrases without sentiment.
Implement Information Extraction Solutions
Questions 18–20
A RAG application searches over technical documentation. Users phrase queries in natural language (e.g., "how do I rotate certificates without downtime?"). The current pure-vector search returns relevant but sometimes poorly-ranked results.
Which Azure AI Search configuration improves ranking quality?
- A.Switch to pure keyword search to avoid embedding drift
- B.Use hybrid search (keyword + vector) with semantic ranker enabled to re-rank the top 50 results
- C.Disable vector search and rely on BM25 only
- D.Increase the embedding dimensions to 3072 by switching to text-embedding-3-large
Correct: B. Hybrid search combines keyword (BM25) and vector results, and the semantic ranker re-ranks the top 50 using a Microsoft-hosted Transformer-based re-ranker. This combination consistently outperforms either keyword or vector search alone for natural-language RAG queries. Pure keyword search loses semantic matches. Pure vector search ignores exact-term importance. Switching embedding dimensions without a re-ranker rarely fixes ranking quality.
Your company has a proprietary contract template used across thousands of vendor agreements. You need to extract 25 specific clauses and signature blocks from new contracts as they are uploaded. The general prebuilt Contract model is missing several clauses.
Which Document Intelligence approach is correct?
- A.Use the prebuilt Contract model and post-process the output
- B.Train a custom extraction model on 5–15 labeled samples of your contract template
- C.Train a custom classification model only
- D.Use Image Analysis OCR and parse with regex
Correct: B. Document Intelligence custom extraction models are purpose-built for proprietary form templates. With 5–15 labeled samples you can train a custom model that extracts your 25 specific fields with high accuracy. Custom classification routes documents to the right model but does not extract fields. Prebuilt Contract may help, but missing 25 clauses defeats the purpose. OCR + regex breaks on layout variation and is the worst-of-both approach.
You need to build a knowledge mining pipeline over scanned-PDF invoices in Azure Blob Storage. The pipeline must extract text via OCR, detect the language of each invoice, extract named entities (organizations, monetary values), and generate vector embeddings for semantic search.
Which Azure AI Search skillset design is correct?
- A.A single Azure OpenAI skill that does everything
- B.A skillset chain: OCR skill → Language Detection skill → Entity Recognition skill → Azure OpenAI Embedding skill, configured in the indexer
- C.Separate Logic Apps for each stage
- D.A single Document Intelligence custom model trained on all behaviors
Correct: B. Skillset chaining is the canonical pattern: each cognitive skill performs one task and passes its output as input to the next, declared in the indexer. OCR extracts text from images, Language Detection routes to per-language analyzers, Entity Recognition tags the structured entities, and the Azure OpenAI Embedding skill generates vectors. This runs declaratively without custom code. A single Azure OpenAI skill cannot perform OCR. Logic Apps add operational overhead. A single Document Intelligence model cannot generate embeddings.
Want 480 More AI-103 Questions?
Get full coverage of all 5 domains — Azure AI Foundry agents, RAG, vision, text analysis, and information extraction — calibrated harder than the real beta exam. Start with 40 questions free.
Start Free AI-103 Practice →