How I passed the AWS Certified Data Analytics – Specialty Exam (DAS-C01)
Adit Modi
Posted on November 4, 2022
Introduction
I recently passed the AWS Certified Data Analytics — Specialty Exam. I'd like to share my thoughts on what I did to pass and some notes I kept along the way. Before beginning, it's worth considering what this exam is about.
AWS Certified Data Analytics – Specialty is intended for individuals with experience and expertise working with AWS services to design, build, secure, and maintain analytics solutions.
Earning AWS Certified Data Analytics – Specialty validates expertise in using AWS data lakes and analytics services to get insights from data. This certification helps organizations identify and develop talent with critical skills for implementing cloud initiatives.
This article gives you an overview of the content and training material I used to prepare for the AWS DAS-C01 exam.
Exam Prerequisites
Before you take this exam, AWS recommends you have:
- Five years of experience with common data analytics technologies
- Two years of hands-on experience and expertise working with AWS services to design, build, secure, and maintain analytics solutions
- Ability to define AWS data analytics services and understand how they integrate with each other
- Ability to explain how AWS data analytics services fit in the data lifecycle of collection, storage, processing, and visualization
Exam overview
Level: Specialty
Length: 180 minutes to complete the exam
Cost: 300 USD
Visit Exam pricing for additional cost information.
Format: 65 questions, either multiple choice or multiple response
Delivery method: Pearson VUE and PSI; testing center or online proctored exam
Exam Outline
This exam guide includes weightings, test domains, and objectives for the exam. It is not a comprehensive listing of the content on the exam. However, additional context for each of the objectives is available to help guide your preparation for the exam.
The following table lists the main content domains and their
weightings. The table precedes the complete exam content outline, which includes the additional context. The percentage in each domain represents only scored content.
| Domain | % of Exam |
| --- | --- |
| Domain 1: Collection | 18% |
| Domain 2: Storage and Data Management | 22% |
| Domain 3: Processing | 24% |
| Domain 4: Analysis and Visualization | 18% |
| Domain 5: Security | 18% |
| TOTAL | 100% |
Domain 1: Collection
1.1 Determine the operational characteristics of the collection system
Evaluate that the data loss is within tolerance limits in the event of failures
Evaluate costs associated with data acquisition, transfer, and provisioning from various sources into the collection system (e.g., networking, bandwidth, ETL/data migration costs)
Assess the failure scenarios that the collection system may undergo, and take remediation actions based on impact
Determine data persistence at various points of data capture
Identify the latency characteristics of the collection system
1.2 Select a collection system that handles the frequency, volume, and the source of data
Describe and characterize the volume and flow characteristics of incoming data (streaming, transactional, batch)
Match flow characteristics of data to potential solutions
Assess the tradeoffs between various ingestion services taking into account scalability, cost, fault tolerance, latency, etc.
Explain the throughput capability of a variety of different types of data collection and identify bottlenecks
Choose a collection solution that satisfies connectivity constraints of the source data system
1.3 Select a collection system that addresses the key properties of data, such as order, format, and compression
Describe how to capture data changes at the source
Discuss data structure and format, compression applied, and encryption requirements
Distinguish the impact of out-of-order delivery of data, duplicate delivery of data, and the tradeoffs between at-most-once, exactly-once, and at-least-once processing (see the Kinesis sketch after this list)
Describe how to transform and filter data during the collection process
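To ground the streaming-collection objectives above, here is a minimal sketch of writing a record to Amazon Kinesis Data Streams with boto3. The stream name, event shape, and partition-key choice are hypothetical, and a real producer would add batching (PutRecords), retries, and error handling.

```python
import json

import boto3

# Assumption: a Kinesis Data Stream named "clickstream" already exists in this region.
kinesis = boto3.client("kinesis")


def send_event(event: dict) -> None:
    """Send one event to the stream after light filtering during collection.

    Records that share a partition key land on the same shard in order, which
    is what the ordering / duplicate-delivery objectives in 1.3 are about.
    """
    if event.get("action") == "heartbeat":  # example of filtering at collection time
        return
    kinesis.put_record(
        StreamName="clickstream",                # hypothetical stream name
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(event["user_id"]),
    )


send_event({"user_id": 42, "action": "page_view", "page": "/pricing"})
```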
Domain 2: Storage and Data Management
2.1 Determine the operational characteristics of the storage solution for analytics
Determine the appropriate storage service(s) on the basis of cost vs. performance
Understand the durability, reliability, and latency characteristics of the storage solution based on requirements
Determine the requirements of a system for strong vs. eventual consistency of the storage system
Determine the appropriate storage solution to address data freshness requirements
2.2 Determine data access and retrieval patterns
Determine the appropriate storage solution based on update patterns (e.g., bulk, transactional, micro batching)
Determine the appropriate storage solution based on access patterns (e.g., sequential vs. random access, continuous usage vs. ad hoc)
Determine the appropriate storage solution to address change characteristics of data (append-only changes vs. updates)
Determine the appropriate storage solution for long-term storage vs. transient storage
Determine the appropriate storage solution for structured vs. semi-structured data
Determine the appropriate storage solution to address query latency requirements
2.3 Select appropriate data layout, schema, structure, and format
Determine appropriate mechanisms to address schema evolution requirements
Select the storage format for the task
Select the compression/encoding strategies for the chosen storage format
Select the data sorting and distribution strategies and the storage layout for efficient data access
Explain the cost and performance implications of different data distributions, layouts, and formats (e.g., size and number of files)
Implement data formatting and partitioning schemes for data-optimized analysis (see the partitioning sketch after this list)
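As one illustration of the layout and partitioning objectives above, here is a minimal sketch of writing objects to S3 under a Hive-style `year=/month=/day=` prefix, which Athena and Glue can use for partition pruning. The bucket, prefix, and file name are hypothetical.

```python
from datetime import date

import boto3

s3 = boto3.client("s3")


def upload_partitioned(local_path: str, event_date: date) -> None:
    """Upload a file (e.g., Parquet) under a Hive-style partitioned prefix.

    Keys such as sales/year=2022/month=11/day=04/part-0000.snappy.parquet let
    query engines prune partitions instead of scanning the whole dataset.
    """
    key = (
        f"sales/year={event_date.year}/month={event_date.month:02d}/"
        f"day={event_date.day:02d}/part-0000.snappy.parquet"
    )
    # Assumption: the bucket "my-analytics-data-lake" is a hypothetical name.
    s3.upload_file(local_path, "my-analytics-data-lake", key)


upload_partitioned("part-0000.snappy.parquet", date(2022, 11, 4))
```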
2.4 Define data lifecycle based on usage patterns and business requirements
Determine the strategy to address data lifecycle requirements
Apply the lifecycle and data retention policies to different storage solutions (see the lifecycle sketch after this list)
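For the lifecycle objective above, here is a minimal sketch of an S3 lifecycle rule applied with boto3. The bucket name, prefix, and transition/expiration periods are hypothetical values, not a recommendation.

```python
import boto3

s3 = boto3.client("s3")

# Assumption: bucket and prefix are hypothetical; choose transitions that match
# your own access patterns and retention requirements.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-analytics-data-lake",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-raw-data",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                # Move infrequently accessed raw data to cheaper storage after 90 days...
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                # ...and delete it after a year.
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```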
2.5 Determine the appropriate system for cataloging data and managing metadata
Evaluate mechanisms for discovery of new and updated data sources
Evaluate mechanisms for creating and updating data catalogs and metadata (see the Glue Data Catalog sketch after this list)
Explain mechanisms for searching and retrieving data catalogs and metadata
Explain mechanisms for tagging and classifying data
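To illustrate the cataloging objectives above, here is a minimal sketch using the AWS Glue Data Catalog with boto3: run a crawler to discover new or updated data, then look up the resulting table definition. The crawler, database, and table names are hypothetical.

```python
import boto3

glue = boto3.client("glue")

# Assumption: a crawler named "sales-crawler" is already configured to scan the
# S3 location and write table definitions into the "analytics" database.
glue.start_crawler(Name="sales-crawler")

# Later, retrieve the table's metadata (schema, partitions, location) from the catalog.
table = glue.get_table(DatabaseName="analytics", Name="sales")
for column in table["Table"]["StorageDescriptor"]["Columns"]:
    print(column["Name"], column["Type"])
```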
Domain 3: Processing
3.1 Determine appropriate data processing solution requirements
Understand data preparation and usage requirements
Understand different types of data sources and targets
Evaluate performance and orchestration needs
Evaluate appropriate services for cost, scalability, and availability
3.2 Design a solution for transforming and preparing data for analysis
Apply appropriate ETL/ELT techniques for batch and real-time workloads
Implement failover, scaling, and replication mechanisms
Implement techniques to address concurrency needs
Implement techniques to improve cost-optimization efficiencies
Apply orchestration workflows
Aggregate and enrich data for downstream consumption
3.3 Automate and operationalize data processing solutions
Implement automated techniques for repeatable workflows
Apply methods to identify and recover from processing failures (see the Glue job-run sketch after this list)
Deploy logging and monitoring solutions to enable auditing and traceability
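For the automation and failure-recovery objectives above, here is a minimal sketch that starts an AWS Glue ETL job with boto3 and polls its status so a failure can be detected and acted on. The job name is hypothetical; in practice you would usually let Step Functions, EventBridge, or Glue workflows handle orchestration and retries.

```python
import time

import boto3

glue = boto3.client("glue")

# Assumption: a Glue job named "daily-sales-etl" already exists.
run_id = glue.start_job_run(JobName="daily-sales-etl")["JobRunId"]

# Poll until the run reaches a terminal state, then decide how to react.
while True:
    state = glue.get_job_run(JobName="daily-sales-etl", RunId=run_id)["JobRun"]["JobRunState"]
    if state in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"):
        break
    time.sleep(30)

if state != "SUCCEEDED":
    # Hook for remediation: alert, retry with backoff, or replay the input.
    raise RuntimeError(f"Glue job run {run_id} ended in state {state}")
```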
Domain 4: Analysis and Visualization
4.1 Determine the operational characteristics of the analysis and visualization solution
Determine costs associated with analysis and visualization
Determine scalability associated with analysis
Determine failover recovery and fault tolerance within the RPO/RTO
Determine the availability characteristics of an analysis tool
Evaluate dynamic, interactive, and static presentations of data
Translate performance requirements to an appropriate visualization approach (pre-compute and consume static data vs. consume dynamic data)
4.2 Select the appropriate data analysis solution for a given scenario
Evaluate and compare analysis solutions
Select the right type of analysis based on the customer use case (streaming, interactive, collaborative, operational) (see the Athena sketch after this list)
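As a small example of an interactive analysis flow, here is a sketch that runs an Amazon Athena query with boto3 and waits for the result. The database, table, results bucket, and query itself are hypothetical.

```python
import time

import boto3

athena = boto3.client("athena")

# Assumptions: the "analytics" database, "sales" table, and results bucket are hypothetical.
execution_id = athena.start_query_execution(
    QueryString="SELECT region, SUM(amount) AS revenue FROM sales GROUP BY region",
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)["QueryExecutionId"]

# Poll until the query finishes, then fetch the rows.
while True:
    status = athena.get_query_execution(QueryExecutionId=execution_id)["QueryExecution"]["Status"]["State"]
    if status in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if status == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=execution_id)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
```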
4.3 Select the appropriate data visualization solution for a given scenario
Evaluate output capabilities for a given analysis solution (metrics, KPIs, tabular, API)
Choose the appropriate method for data delivery (e.g., web, mobile, email, collaborative notebooks)
Choose and define the appropriate data refresh schedule
Choose appropriate tools for different data freshness requirements (e.g., Amazon Elasticsearch Service vs. Amazon QuickSight vs. Amazon EMR notebooks)
Understand the capabilities of visualization tools for interactive use cases (e.g., drill down, drill through, and pivot)
Implement the appropriate data access mechanism (e.g., in memory vs. direct access)
Implement an integrated solution from multiple heterogeneous data sources
Domain 5: Security
5.1 Select appropriate authentication and authorization mechanisms
Implement appropriate authentication methods (e.g., federated access, SSO, IAM)
Implement appropriate authorization methods (e.g., policies, ACL, table/column level permissions) (see the IAM policy sketch after this list)
Implement appropriate access control mechanisms (e.g., security groups, role-based control)
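To make the authorization objective above more tangible, here is a minimal sketch that creates a customer-managed IAM policy with boto3 granting read-only access to a single S3 prefix. The policy name, bucket, and prefix are hypothetical, and in production you would typically manage this through IAM Identity Center, Lake Formation permissions, or infrastructure as code.

```python
import json

import boto3

iam = boto3.client("iam")

# Assumption: bucket and prefix are hypothetical; scope policies as narrowly as possible.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::my-analytics-data-lake/curated/*",
        }
    ],
}

iam.create_policy(
    PolicyName="AnalyticsCuratedReadOnly",  # hypothetical policy name
    PolicyDocument=json.dumps(policy_document),
)
```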
5.2 Apply data protection and encryption techniques
Determine data encryption and masking needs
Apply different encryption approaches (server-side encryption, client-side encryption, AWS KMS, AWS CloudHSM) (see the SSE-KMS sketch after this list)
Implement at-rest and in-transit encryption mechanisms
Implement data obfuscation and masking techniques
Apply basic principles of key rotation and secrets management
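For the encryption objectives above, here is a minimal sketch of writing an object to S3 with server-side encryption under a customer-managed AWS KMS key. The bucket, object key, and KMS key alias are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

# Assumptions: the bucket and KMS key alias are hypothetical; the caller needs
# both s3:PutObject on the bucket and kms:GenerateDataKey on the key.
with open("part-0000.snappy.parquet", "rb") as body:
    s3.put_object(
        Bucket="my-analytics-data-lake",
        Key="curated/customers/part-0000.snappy.parquet",
        Body=body,
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId="alias/analytics-data-key",  # hypothetical customer-managed key alias
    )
```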
5.3 Apply data governance and compliance controls
Determine data governance and compliance requirements
Understand and configure access and audit logging across data analytics services
Implement appropriate controls to meet compliance requirements
👉 More Information on the exam guide can be found here.
How did I prepare?
I spent a considerable amount of time learning AWS before attempting the certification exam. It is important to spend hands-on time with AWS to become good at it. The resources I used while preparing are linked below, one by one.
1) 📚 Courses I took: Initially, I enrolled in a Udemy course called “AWS Certified Data Analytics Specialty - Hands On!” by Stephane Maarek & Frank Kane. It is a very good course that covers the most important AWS analytics services and their fundamentals, so it is a very good start.
👉 More information on the Udemy course can be found here.
2) 🛠️ Hands-on projects: Learning only theory won't help; you must work on some hands-on AWS projects. I recommend practicing some of the AWS projects from here, or you can practice them in the AWS Skill Builder learning center.
👉 Some of the projects I practiced are mentioned in my GitHub repository.
3) 📋 AWS Ramp-Up Guides: your guides to learning the AWS Cloud.
- AWS Ramp-Up Guides offer a variety of resources to help you build your skills and knowledge of the AWS Cloud. Each guide features carefully selected digital training, classroom courses, videos, whitepapers, certifications, and more. Explore the guides below by role, solution, or industry area.
👉 more details can be found here
4) 🤝 Being part of study groups: I also recommend being part of a study group. It helps you stay focused, and studying with people preparing for the same exam is an added benefit.
Study Groups I was part of:
- Cloud and DevOps Babies:
Cloud and DevOps Babies is a global group of curious minds learning and decoding Cloud, DevOps, and microservices tech stacks.
👉 more details can be found here
- Tech Study Slack: TechStudySlack is a Slack workspace for people studying tech.
👉 more details can be found here
5) ✍️ Practice tests: Lastly, I recommend taking practice exams before sitting for the real exam. They provide simulated questions that are very similar to the actual ones.
One of the selling points of these practice exams is that each question comes with a detailed explanation that helps you gain a deeper understanding of AWS services. It not only explains what the correct answer is, but also why the other answers are wrong, which is extremely helpful for recognizing the differences between similar services.
- Tutorials Dojo Practice Exams:
👉 more details can be found here
6) 📝 Notes: I outlined the resources I would use and a rough guideline for how I would approach studying. Find something that works for you, but have structure and commit to it.
Useful Study tips and tricks
As usual, more study tips and tricks to help you with the exam:
- This exam is available through online proctoring, so you don't have to travel to a testing center to take it.
- An additional 30 minutes for non-native English speakers is still available, so make sure you have requested that accommodation before scheduling your exam.
- Make sure you use the flagging mechanism and revisit flagged questions if you have time.
- There is no penalty for guessing.
- You can and should apply the same rules as in the other exams; look here for more details on how to read a question and its answers.
Additional Resources
Stephane Maarek and Frank Kane's AWS Certified Data Analytics Specialty - Hands On!
A Cloud Guru's AWS Certified Data Analytics - Specialty course
I hope this will help you to prepare and evaluate your knowledge.
Let me know your thoughts in the comment section 👇
And if you haven't yet, make sure to follow me on the handles below:
👋 connect with me on LinkedIn
🤓 connect with me on Twitter
🐱💻 follow me on github
✍️ Do Checkout my blogs
Like, share and follow me 🚀 for more content.
Good Luck with your exam! Have Fun. 💪