Free Amazon Web Services Data-Engineer-Associate Practice Exam with Questions & Answers | Set: 7

Questions 61

A company uses Amazon Redshift as its data warehouse service. A data engineer needs to design a physical data model.

The data engineer encounters a de-normalized table that is growing in size. The table does not have a suitable column to use as the distribution key.

Which distribution style should the data engineer use to meet these requirements with the LEAST maintenance overhead?

Options:
A.

ALL distribution

B.

EVEN distribution

C.

AUTO distribution

D.

KEY distribution
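
For reference, the four distribution styles named in the options correspond to the DISTSTYLE clause of a Redshift CREATE TABLE statement. The sketch below only builds the DDL strings so the syntax of each style can be compared side by side. The table name usage_facts, its columns, and the helper create_table_ddl are hypothetical and are not part of the question.

```python
# Hypothetical helper: builds Redshift CREATE TABLE statements for each
# distribution style. Table and column names are illustrative assumptions.
COLUMNS = "event_id BIGINT, customer_id BIGINT, payload VARCHAR(256)"


def create_table_ddl(table, diststyle, distkey=None):
    """Return a CREATE TABLE statement that uses the given DISTSTYLE."""
    style = diststyle.upper()
    if style not in {"AUTO", "EVEN", "KEY", "ALL"}:
        raise ValueError(f"unknown distribution style: {diststyle}")
    if style == "KEY":
        # KEY is the only style that needs a column to distribute on.
        if not distkey:
            raise ValueError("KEY distribution requires a distribution key column")
        clause = f"DISTSTYLE KEY DISTKEY ({distkey})"
    else:
        clause = f"DISTSTYLE {style}"
    return f"CREATE TABLE {table} ({COLUMNS}) {clause};"


if __name__ == "__main__":
    for style in ("ALL", "EVEN", "AUTO"):
        print(create_table_ddl("usage_facts", style))
    print(create_table_ddl("usage_facts", "KEY", distkey="customer_id"))
```

Only the KEY variant requires a column choice, which is why the helper rejects KEY without one.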

Questions 62

A telecommunications company collects network usage data throughout each day at a rate of several thousand data points each second. The company runs an application to process the usage data in real time. The company aggregates and stores the data in an Amazon Aurora DB instance.

Sudden drops in network usage usually indicate a network outage. The company must be able to identify sudden drops in network usage so the company can take immediate remedial actions.

Which solution will meet this requirement with the LEAST latency?

Options:
A.

Create an AWS Lambda function to query Aurora for drops in network usage. Use Amazon EventBridge to automatically invoke the Lambda function every minute.

B.

Modify the processing application to publish the data to an Amazon Kinesis data stream. Create an Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) application to detect drops in network usage.

C.

Replace the Aurora database with an Amazon DynamoDB table. Create an AWS Lambda function to query the DynamoDB table for drops in network usage every minute. Use DynamoDB Accelerator (DAX) between the processing application and DynamoDB table.

D.

Create an AWS Lambda function within the Database Activity Streams feature of Aurora to detect drops in network usage.
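
To make "sudden drop" concrete, the following is a minimal sketch of one possible detection rule: flag a reading that falls below a fraction of the moving average of the most recent readings. The window size, the 0.5 ratio, and the names DropDetector and find_drops are assumptions chosen for illustration. The question does not specify a detection rule, and the sketch is independent of which AWS service would run it.

```python
from collections import deque


class DropDetector:
    """Flags a reading that falls far below the recent moving average."""

    def __init__(self, window=5, drop_ratio=0.5):
        self.recent = deque(maxlen=window)  # most recent readings only
        self.drop_ratio = drop_ratio        # assumed threshold, not from the question

    def observe(self, value):
        """Return True if value is a sudden drop versus the window average."""
        is_drop = False
        # Only judge once the window is full, so early readings set a baseline.
        if len(self.recent) == self.recent.maxlen:
            baseline = sum(self.recent) / len(self.recent)
            is_drop = value < baseline * self.drop_ratio
        self.recent.append(value)
        return is_drop


def find_drops(readings, window=5, drop_ratio=0.5):
    """Return the indexes of readings that the detector flags as drops."""
    detector = DropDetector(window, drop_ratio)
    return [i for i, value in enumerate(readings) if detector.observe(value)]


if __name__ == "__main__":
    usage = [100, 102, 98, 101, 99, 100, 20, 15, 100]
    print(find_drops(usage))
```

Because each reading is evaluated as it arrives, the delay before a drop is noticed depends on how quickly readings reach the detector rather than on a polling interval.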

Questions 63

A company has an Amazon S3–based data lake. The data lake contains datasets that belong to multiple departments. The data lake ingests millions of customer records each day.

A data engineer needs to design an access and storage solution that allows departments to access only the subset of the company’s dataset that each department requires. The solution must follow the principle of least privilege.

Which solution will meet these requirements with the LEAST operational effort?

Options:
A.

Define IAM policies and IAM roles for each department. Specify the S3 access paths from the data lake that each team can access.

B.

Set up Amazon Redshift and Amazon Redshift Spectrum as the primary entry points for the data lake. Define an IAM role that Amazon Redshift can assume. Configure the IAM role to grant access to the data that is in Amazon S3.

C.

Set up AWS Lake Formation. Assign LF-Tags to AWS Glue Data Catalog resources. Enable Lake Formation tag-based access control (LF-TBAC).

D.

Deploy an Amazon RDS for PostgreSQL database that has the aws_s3 extension installed. Configure AWS Step Functions events to invoke an AWS Lambda function to sync the data lake with the database.

Questions 64

A company stores its processed data in an S3 bucket. The company has a strict data access policy. The company uses IAM roles to grant teams within the company different levels of access to the S3 bucket.

The company wants to receive notifications when a user violates the data access policy. Each notification must include the username of the user who violated the policy.

Which solution will meet these requirements?

Options:
A.

Use AWS Config rules to detect violations of the data access policy. Set up compliance alarms.

B.

Use Amazon CloudWatch metrics to gather object-level metrics. Set up CloudWatch alarms.

C.

Use AWS CloudTrail to track object-level events for the S3 bucket. Forward events to Amazon CloudWatch to set up CloudWatch alarms.

D.

Use Amazon S3 server access logs to monitor access to the bucket. Forward the access logs to an Amazon CloudWatch log group. Use metric filters on the log group to set up CloudWatch alarms.
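
The requirement that each notification include a username depends on the log record carrying the caller's identity. As a hedged illustration, the sketch below pulls the caller out of a CloudTrail-style record for a denied S3 request. The sample record is fabricated for the example (the bucket, key, account ID, and user name are placeholders), and treating an assumed-role session name as the "username" is an assumption that depends on how the company names its role sessions.

```python
def is_denied_s3_access(record):
    """True if the record is an S3 request that was rejected as AccessDenied."""
    return (
        record.get("eventSource") == "s3.amazonaws.com"
        and record.get("errorCode") == "AccessDenied"
    )


def caller_name(record):
    """Return the caller's name from the userIdentity element, or None."""
    identity = record.get("userIdentity", {})
    if identity.get("type") == "AssumedRole":
        # Assumed-role ARNs end in .../role-name/session-name.
        arn = identity.get("arn", "")
        return arn.rsplit("/", 1)[-1] if "/" in arn else None
    return identity.get("userName")


def build_notification(record):
    """Return a notification message for a denied request, else None."""
    if not is_denied_s3_access(record):
        return None
    params = record.get("requestParameters") or {}
    return (
        f"{caller_name(record)} was denied {record.get('eventName')} on "
        f"s3://{params.get('bucketName')}/{params.get('key')}"
    )


# Placeholder record shaped like a CloudTrail S3 data event.
SAMPLE = {
    "eventSource": "s3.amazonaws.com",
    "eventName": "GetObject",
    "errorCode": "AccessDenied",
    "userIdentity": {
        "type": "IAMUser",
        "userName": "example-analyst",
        "arn": "arn:aws:iam::111122223333:user/example-analyst",
    },
    "requestParameters": {"bucketName": "example-processed-data", "key": "finance/report.csv"},
}

if __name__ == "__main__":
    print(build_notification(SAMPLE))
```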

Questions 65

A company wants to migrate an application and an on-premises Apache Kafka server to AWS. The application processes incremental updates that an on-premises Oracle database sends to the Kafka server. The company wants to use the replatform migration strategy instead of the refactor strategy.

Which solution will meet these requirements with the LEAST management overhead?

Options:
A.

Amazon Kinesis Data Streams

B.

Amazon Managed Streaming for Apache Kafka (Amazon MSK) provisioned cluster

C.

Amazon Data Firehose

D.

Amazon Managed Streaming for Apache Kafka (Amazon MSK) Serverless

Questions 66

A data engineer is building a data pipeline on AWS by using AWS Glue extract, transform, and load (ETL) jobs. The data engineer needs to process data from Amazon RDS and MongoDB, perform transformations, and load the transformed data into Amazon Redshift for analytics. The data updates must occur every hour.

Which combination of tasks will meet these requirements with the LEAST operational overhead? (Choose two.)

Options:
A.

Configure AWS Glue triggers to run the ETL jobs every hour.

B.

Use AWS Glue DataBrew to clean and prepare the data for analytics.

C.

Use AWS Lambda functions to schedule and run the ETL jobs every hour.

D.

Use AWS Glue connections to establish connectivity between the data sources and Amazon Redshift.

E.

Use the Redshift Data API to load transformed data into Amazon Redshift.
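
As an illustration of what an hourly schedule looks like for an AWS Glue trigger, the sketch below builds the request parameters for a scheduled trigger. The expression cron(0 * * * ? *) fires at minute 0 of every hour. The trigger name, the job names, and the helper hourly_trigger_request are hypothetical. The code only constructs a dictionary and does not call AWS.

```python
def hourly_trigger_request(trigger_name, job_names):
    """Build request parameters for a Glue trigger that runs jobs every hour."""
    return {
        "Name": trigger_name,
        "Type": "SCHEDULED",
        # Fields: minutes hours day-of-month month day-of-week year.
        "Schedule": "cron(0 * * * ? *)",
        # One action per job; all listed jobs start when the trigger fires.
        "Actions": [{"JobName": name} for name in job_names],
        "StartOnCreation": True,
    }


if __name__ == "__main__":
    # Job names are placeholders for the RDS and MongoDB ETL jobs.
    request = hourly_trigger_request(
        "hourly-etl", ["rds-to-redshift", "mongodb-to-redshift"]
    )
    print(request)
```

With boto3 installed and credentials configured, a dictionary like this could be passed as boto3.client("glue").create_trigger(**request).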

Questions 67

A company is planning to use a provisioned Amazon EMR cluster that runs Apache Spark jobs to perform big data analysis. The company requires high reliability. A big data team must follow best practices for running cost-optimized and long-running workloads on Amazon EMR. The team must find a solution that will maintain the company's current level of performance.

Which combination of resources will meet these requirements MOST cost-effectively? (Choose two.)

Options:
A.

Use Hadoop Distributed File System (HDFS) as a persistent data store.

B.

Use Amazon S3 as a persistent data store.

C.

Use x86-based instances for core nodes and task nodes.

D.

Use Graviton instances for core nodes and task nodes.

E.

Use Spot Instances for all primary nodes.

Questions 68

A data engineer configured an AWS Glue Data Catalog for data that is stored in Amazon S3 buckets. The data engineer needs to configure the Data Catalog to receive incremental updates.

The data engineer sets up event notifications for the S3 bucket and creates an Amazon Simple Queue Service (Amazon SQS) queue to receive the S3 events.

Which combination of steps should the data engineer take to meet these requirements with the LEAST operational overhead? (Select TWO.)

Options:
A.

Create an S3 event-based AWS Glue crawler to consume events from the SQS queue.

B.

Define a time-based schedule to run the AWS Glue crawler, and perform incremental updates to the Data Catalog.

C.

Use an AWS Lambda function to directly update the Data Catalog based on S3 events that the SQS queue receives.

D.

Manually initiate the AWS Glue crawler to perform updates to the Data Catalog when there is a change in the S3 bucket.

E.

Use AWS Step Functions to orchestrate the process of updating the Data Catalog based on S3 events that the SQS queue receives.

Questions 69

An ecommerce company processes millions of orders each day. The company uses AWS Glue ETL to collect data from multiple sources, clean the data, and store the data in an Amazon S3 bucket in CSV format by using the S3 Standard storage class. The company uses the stored data to conduct daily analysis.

The company wants to optimize costs for data storage and retrieval.

Which solution will meet this requirement?

Options:
A.

Transition the data to Amazon S3 Glacier Flexible Retrieval.

B.

Transition the data from Amazon S3 to an Amazon Aurora cluster.

C.

Configure AWS Glue ETL to transform the incoming data to Apache Parquet format.

D.

Configure AWS Glue ETL to use Amazon EMR to process incoming data in parallel.

Questions 70

A company has a data warehouse in Amazon Redshift. To comply with security regulations, the company needs to log and store all user activities and connection activities for the data warehouse.

Which solution will meet these requirements?

Options:
A.

Create an Amazon S3 bucket. Enable logging for the Amazon Redshift cluster. Specify the S3 bucket in the logging configuration to store the logs.

B.

Create an Amazon Elastic File System (Amazon EFS) file system. Enable logging for the Amazon Redshift cluster. Write logs to the EFS file system.

C.

Create an Amazon Aurora MySQL database. Enable logging for the Amazon Redshift cluster. Write the logs to a table in the Aurora MySQL database.

D.

Create an Amazon Elastic Block Store (Amazon EBS) volume. Enable logging for the Amazon Redshift cluster. Write the logs to the EBS volume.