Spring Sale Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: 70track

Free Amazon Web Services Data-Engineer-Associate Practice Exam with Questions & Answers | Set: 4

Questions 31

A company wants to implement real-time analytics capabilities. The company wants to use Amazon Kinesis Data Streams and Amazon Redshift to ingest and process streaming data at the rate of several gigabytes per second. The company wants to derive near real-time insights by using existing business intelligence (BI) and analytics tools.

Which solution will meet these requirements with the LEAST operational overhead?

Options:
A.

Use Kinesis Data Streams to stage data in Amazon S3. Use the COPY command to load data from Amazon S3 directly into Amazon Redshift to make the data immediately available for real-time analysis.

B.

Access the data from Kinesis Data Streams by using SQL queries. Create materialized views directly on top of the stream. Refresh the materialized views regularly to query the most recent stream data.

C.

Create an external schema in Amazon Redshift to map the data from Kinesis Data Streams to an Amazon Redshift object. Create a materialized view to read data from the stream. Set the materialized view to auto refresh.

D.

Connect Kinesis Data Streams to Amazon Kinesis Data Firehose. Use Kinesis Data Firehose to stage the data in Amazon S3. Use the COPY command to load the data from Amazon S3 to a table in Amazon Redshift.

Amazon Web Services Data-Engineer-Associate Premium Access
Questions 32

A data engineer is designing a log table for an application that requires continuous ingestion. The application must provide dependable API-based access to specific records from other applications. The application must handle more than 4,000 concurrent write operations and 6,500 read operations every second.

Options:
A.

Create an Amazon Redshift table with the KEY distribution style. Use the Amazon Redshift Data API to perform all read and write operations.

B.

Store the log files in an Amazon S3 Standard bucket. Register the schema in AWS Glue Data Catalog. Create an external Redshift table that points to the AWS Glue schema. Use the table to perform Amazon Redshift Spectrum read operations.

C.

Create an Amazon Redshift table with the EVEN distribution style. Use the Amazon Redshift JDBC connector to establish a database connection. Use the database connection to perform all read and write operations.

D.

Create an Amazon DynamoDB table that has provisioned capacity to meet the application ' s capacity needs. Use the DynamoDB table to perform all read and write operations by using DynamoDB APIs.

Questions 33

A data engineer has a one-time task to read data from objects that are in Apache Parquet format in an Amazon S3 bucket. The data engineer needs to query only one column of the data.

Which solution will meet these requirements with the LEAST operational overhead?

Options:
A.

Confiqure an AWS Lambda function to load data from the S3 bucket into a pandas dataframe- Write a SQL SELECT statement on the dataframe to query the required column.

B.

Use S3 Select to write a SQL SELECT statement to retrieve the required column from the S3 objects.

C.

Prepare an AWS Glue DataBrew project to consume the S3 objects and to query the required column.

D.

Run an AWS Glue crawler on the S3 objects. Use a SQL SELECT statement in Amazon Athena to query the required column.

Questions 34

A company uses Amazon Athena for one-time queries against data that is in Amazon S3. The company has several use cases. The company must implement permission controls to separate query processes and access to query history among users, teams, and applications that are in the same AWS account.

Which solution will meet these requirements?

Options:
A.

Create an S3 bucket for each use case. Create an S3 bucket policy that grants permissions to appropriate individual IAM users. Apply the S3 bucket policy to the S3 bucket.

B.

Create an Athena workgroup for each use case. Apply tags to the workgroup. Create an 1AM policy that uses the tags to apply appropriate permissions to the workgroup.

C.

Create an JAM role for each use case. Assign appropriate permissions to the role for each use case. Associate the role with Athena.

D.

Create an AWS Glue Data Catalog resource policy that grants permissions to appropriate individual IAM users for each use case. Apply the resource policy to the specific tables that Athena uses.

Questions 35

A data engineer maintains a materialized view that is based on an Amazon Redshift database. The view has a column named load_date that stores the date when each row was loaded.

The data engineer needs to reclaim database storage space by deleting all the rows from the materialized view.

Which command will reclaim the MOST database storage space?

Data-Engineer-Associate Question 35

Options:
A.

Option A

B.

Option B

C.

Option C

D.

Option D

Questions 36

A company is designing a serverless data processing workflow in AWS Step Functions that involves multiple steps. The processing workflow ingests data from an external API, transforms the data by using multiple AWS Lambda functions, and loads the transformed data into Amazon DynamoDB.

The company needs the workflow to perform specific steps based on the content of the incoming data.

Which Step Functions state type should the company use to meet this requirement?

Options:
A.

Parallel

B.

Choice

C.

Task

D.

Map

Questions 37

A company needs to generate a one-time performance report by joining data that is stored in Amazon DynamoDB, Amazon RDS, Amazon Redshift, and Amazon S3. The company wants to avoid unnecessary data movement and to minimize query execution time.

Which solution will meet these requirements?

Options:
A.

Capture data from DynamoDB by using DynamoDB Streams. Migrate data from Amazon RDS by using AWS DMS. Export Amazon Redshift data. Store all data in Amazon S3. Use Redshift Spectrum to run queries.

B.

Set up an AWS Glue ETL pipeline to extract, transform, and centralize data in Amazon S3. Use Amazon Athena to run analytical queries.

C.

Deploy an Amazon EMR cluster powered by Apache Spark to ingest, process, and merge datasets from multiple sources. Run analytical workloads on the merged data.

D.

Use Amazon Athena Federated Query to perform one-time joins and analysis across DynamoDB, Amazon RDS, Amazon Redshift, and Amazon S3.

Questions 38

A data engineer has two datasets that contain sales information for multiple cities and states. One dataset is named reference, and the other dataset is named primary.

The data engineer needs a solution to determine whether a specific set of values in the city and state columns of the primary dataset exactly match the same specific values in the reference dataset. The data engineer wants to use Data Quality Definition Language (DQDL) rules in an AWS Glue Data Quality job.

Which rule will meet these requirements?

Options:
A.

DatasetMatch " reference " " city- > ref_city, state- > ref_state " = 1.0

B.

ReferentialIntegrity " city,state " " reference.{ref_city,ref_state} " = 1.0

C.

DatasetMatch " reference " " city- > ref_city, state- > ref_state " = 100

D.

ReferentialIntegrity " city,state " " reference.{ref_city,ref_state} " = 100

Questions 39

A company is migrating on-premises workloads to AWS. The company wants to reduce overall operational overhead. The company also wants to explore serverless options.

The company ' s current workloads use Apache Pig, Apache Oozie, Apache Spark, Apache Hbase, and Apache Flink. The on-premises workloads process petabytes of data in seconds. The company must maintain similar or better performance after the migration to AWS.

Which extract, transform, and load (ETL) service will meet these requirements?

Options:
A.

AWS Glue

B.

Amazon EMR

C.

AWS Lambda

D.

Amazon Redshift

Questions 40

An ecommerce company processes millions of orders each day. The company uses AWS Glue ETL to collect data from multiple sources, clean the data, and store the data in an Amazon S3 bucket in CSV format by using the S3 Standard storage class. The company uses the stored data to conduct daily analysis.

The company wants to optimize costs for data storage and retrieval.

Which solution will meet this requirement?

Options:
A.

Transition the data to Amazon S3 Glacier Flexible Retrieval.

B.

Transition the data from Amazon S3 to an Amazon Aurora cluster.

C.

Configure AWS Glue ETL to transform the incoming data to Apache Parquet format.

D.

Configure AWS Glue ETL to use Amazon EMR to process incoming data in parallel.