
Free Amazon Web Services Data-Engineer-Associate Practice Exam with Questions & Answers | Set: 3

Question 21

A company stores time-series data that is collected from streaming services in an Amazon S3 bucket. The company must ensure that only workloads that are deployed within the company's VPC can access the data.

Which solution will meet this requirement?

Options:
A.

Create an S3 bucket policy that uses a condition to allow access only to traffic that originates from the company's VPC.

B.

Apply a security group to the S3 bucket that allows connections only from the company's VPC CIDR block.

C.

Define an IAM policy that denies access to all users unless the request originates from within the company's VPC.

D.

Use a network ACL on the VPC subnets to allow only specific resources to access the S3 bucket.
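
For context, restricting S3 access to traffic that originates from a specific VPC is normally expressed as a bucket policy condition on the aws:SourceVpc key, which is present when requests arrive through a VPC endpoint for Amazon S3. Below is a minimal sketch using boto3; the bucket name and VPC ID are placeholders.

```python
import json
import boto3

s3 = boto3.client("s3")

# Placeholder identifiers -- substitute your own bucket and VPC ID.
BUCKET = "example-timeseries-bucket"
VPC_ID = "vpc-0123456789abcdef0"

# Deny every S3 action unless the request arrives from the company's VPC.
# The aws:SourceVpc key is set only for requests that come through a
# VPC endpoint for Amazon S3.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowVpcTrafficOnly",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            "Condition": {"StringNotEquals": {"aws:SourceVpc": VPC_ID}},
        }
    ],
}

s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```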

Question 22

A company is using an AWS Transfer Family server to migrate data from an on-premises environment to AWS. Company policy mandates the use of TLS 1.2 or above to encrypt the data in transit.

Which solution will meet these requirements?

Options:
A.

Generate new SSH keys for the Transfer Family server. Make the old keys and the new keys available for use.

B.

Update the security group rules for the on-premises network to allow only connections that use TLS 1.2 or above.

C.

Update the security policy of the Transfer Family server to specify a minimum protocol version of TLS 1.2.

D.

Install an SSL certificate on the Transfer Family server to encrypt data transfers by using TLS 1.2.
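
For context, AWS Transfer Family controls the protocol versions and algorithms a server accepts through a named security policy attached to that server. A hedged sketch of changing it with boto3 follows; the server ID is a placeholder, and the policy name shown is only an example, so list the policies available in your Region before applying one.

```python
import boto3

transfer = boto3.client("transfer")

# See which named security policies Transfer Family currently offers.
policies = transfer.list_security_policies()["SecurityPolicyNames"]
print(policies)

# Attach a policy that enforces the required minimum TLS version.
# Both values below are placeholders -- use your own server ID and a
# policy name returned by the call above.
transfer.update_server(
    ServerId="s-0123456789abcdef0",
    SecurityPolicyName="TransferSecurityPolicy-2024-01",
)
```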

Question 23

A company needs to build an extract, transform, and load (ETL) pipeline that has separate stages for batch data ingestion, transformation, and storage. The pipeline must store the transformed data in an Amazon S3 bucket. Each stage must automatically retry failures. The pipeline must provide visibility into the success or failure of individual stages.

Which solution will meet these requirements with the LEAST operational overhead?

Options:
A.

Chain AWS Glue jobs that perform each stage together by using job triggers. Set the MaxRetries field to 0.

B.

Deploy AWS Step Functions workflows to orchestrate AWS Lambda functions that ingest data. Use AWS Glue jobs to transform the data and store the data in the S3 bucket.

C.

Build an Amazon EventBridge–based pipeline that invokes AWS Lambda functions to perform each stage.

D.

Schedule Apache Airflow directed acyclic graphs (DAGs) on Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate pipeline steps. Use Amazon Simple Queue Service (Amazon SQS) to ingest data. Use AWS Glue jobs to transform data and store the data in the S3 bucket.
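
For reference, per-stage retries and per-stage visibility in AWS Step Functions come from the Retry block that each state can carry in its Amazon States Language definition, and from the execution history in the console. Below is a minimal sketch that registers a two-stage workflow (an ingest Lambda function, then a Glue transform that writes to S3); the resource ARNs, job name, and role are hypothetical placeholders.

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

# Shared retry policy: up to 3 attempts with exponential backoff.
RETRY = [{"ErrorEquals": ["States.ALL"], "IntervalSeconds": 10,
          "MaxAttempts": 3, "BackoffRate": 2.0}]

# Hypothetical ARNs and job name -- substitute your own resources.
definition = {
    "StartAt": "Ingest",
    "States": {
        "Ingest": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:111122223333:function:ingest",
            "Retry": RETRY,
            "Next": "Transform",
        },
        "Transform": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "transform-job"},
            "Retry": RETRY,
            "End": True,
        },
    },
}

sfn.create_state_machine(
    name="etl-pipeline",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::111122223333:role/StepFunctionsEtlRole",  # placeholder
)
```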

Question 24

A company wants to implement real-time analytics capabilities. The company wants to use Amazon Kinesis Data Streams and Amazon Redshift to ingest and process streaming data at the rate of several gigabytes per second. The company wants to derive near real-time insights by using existing business intelligence (BI) and analytics tools.

Which solution will meet these requirements with the LEAST operational overhead?

Options:
A.

Use Kinesis Data Streams to stage data in Amazon S3. Use the COPY command to load data from Amazon S3 directly into Amazon Redshift to make the data immediately available for real-time analysis.

B.

Access the data from Kinesis Data Streams by using SQL queries. Create materialized views directly on top of the stream. Refresh the materialized views regularly to query the most recent stream data.

C.

Create an external schema in Amazon Redshift to map the data from Kinesis Data Streams to an Amazon Redshift object. Create a materialized view to read data from the stream. Set the materialized view to auto refresh.

D.

Connect Kinesis Data Streams to Amazon Kinesis Data Firehose. Use Kinesis Data Firehose to stage the data in Amazon S3. Use the COPY command to load the data from Amazon S3 to a table in Amazon Redshift.
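
For reference, Amazon Redshift streaming ingestion maps a Kinesis data stream through an external schema and exposes it as a materialized view that can auto refresh. A minimal sketch of the SQL, submitted here through the Redshift Data API; the IAM role, workgroup, database, and stream names are placeholders.

```python
import boto3

rsd = boto3.client("redshift-data")

# Placeholder names -- replace with your own workgroup, database, role,
# and Kinesis stream.
SQL_STATEMENTS = [
    """
    CREATE EXTERNAL SCHEMA kinesis_schema
    FROM KINESIS
    IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftStreamingRole';
    """,
    """
    CREATE MATERIALIZED VIEW sensor_stream_mv
    AUTO REFRESH YES
    AS SELECT approximate_arrival_timestamp,
              JSON_PARSE(kinesis_data) AS payload
       FROM kinesis_schema."sensor-stream";
    """,
]

rsd.batch_execute_statement(
    WorkgroupName="analytics-wg",
    Database="dev",
    Sqls=SQL_STATEMENTS,
)
```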

Question 25

A data engineer is building a solution to detect sensitive information that is stored in a data lake across multiple Amazon S3 buckets. The solution must detect personally identifiable information (PII) that is in a proprietary data format.

Which solution will meet these requirements with the LEAST operational overhead?

Options:
A.

Use the AWS Glue Detect PII transform with specific patterns.

B.

Use Amazon Macie with managed data identifiers.

C.

Use an AWS Lambda function with custom regular expressions.

D.

Use Amazon Athena with a SQL query to match the custom formats.
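
As a point of comparison for the custom-pattern approaches above, detection of a proprietary identifier format ultimately comes down to a regular expression. The sketch below shows a minimal Lambda-style handler with a hypothetical pattern; it only illustrates what hand-rolled detection involves, not which option is preferable.

```python
import re

# Hypothetical proprietary ID format: three letters, a hyphen, nine digits.
PROPRIETARY_ID = re.compile(r"\b[A-Z]{3}-\d{9}\b")


def handler(event, context):
    """Scan text records passed in the event for proprietary-format IDs."""
    findings = []
    for record in event.get("records", []):
        for match in PROPRIETARY_ID.finditer(record.get("body", "")):
            findings.append({"record_id": record.get("id"),
                             "match": match.group()})
    return {"pii_findings": findings}
```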

Question 26

A data engineer is using AWS Glue to build an extract, transform, and load (ETL) pipeline that processes streaming data from sensors. The pipeline sends the data to an Amazon S3 bucket in near real-time. The data engineer also needs to perform transformations and join the incoming data with metadata that is stored in an Amazon RDS for PostgreSQL database. The data engineer must write the results back to a second S3 bucket in Apache Parquet format.

Which solution will meet these requirements?

Options:
A.

Use an AWS Glue streaming job and AWS Glue Studio to perform the transformations and to write the data in Parquet format.

B.

Use AWS Glue jobs and AWS Glue Data Catalog to catalog the data from Amazon S3 and Amazon RDS. Configure the jobs to perform the transformations and joins and to write the output in Parquet format.

C.

Use an AWS Glue interactive session to process the streaming data and to join the data with the RDS database.

D.

Use an AWS Glue Python shell job to run a Python script that processes the data in batches. Keep track of processed files by using AWS Glue bookmarks.
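
For context, streaming ETL in AWS Glue is a distinct job type: the job command is gluestreaming rather than glueetl, and the script can join the stream against cataloged JDBC sources such as Amazon RDS before writing Parquet to S3. A hedged sketch of registering such a job with boto3 follows; the role, script location, and connection name are placeholders.

```python
import boto3

glue = boto3.client("glue")

# Placeholder role, script path, and RDS connection name.
glue.create_job(
    Name="sensor-stream-enrichment",
    Role="arn:aws:iam::111122223333:role/GlueStreamingRole",
    GlueVersion="4.0",
    Command={
        "Name": "gluestreaming",          # streaming job type
        "ScriptLocation": "s3://example-scripts/stream_enrich.py",
        "PythonVersion": "3",
    },
    Connections={"Connections": ["rds-postgresql-metadata"]},
    DefaultArguments={"--enable-glue-datacatalog": "true"},
    NumberOfWorkers=2,
    WorkerType="G.1X",
)
```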

Question 27

A company has used an Amazon Redshift table that is named Orders for 6 months. The company performs weekly updates and deletes on the table. The table has an interleaved sort key on a column that contains AWS Regions.

The company wants to reclaim disk space so that the company will not run out of storage space. The company also wants to analyze the sort key column.

Which Amazon Redshift command will meet these requirements?

Options:
A.

VACUUM FULL Orders

B.

VACUUM DELETE ONLY Orders

C.

VACUUM REINDEX Orders

D.

VACUUM SORT ONLY Orders
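
For reference, each VACUUM variant above is a single SQL statement run against the cluster or workgroup, for example through the Redshift Data API. The sketch below shows the general submission pattern only; the workgroup, database, and variant are placeholders, not a hint at which option fits this scenario.

```python
import boto3

rsd = boto3.client("redshift-data")

# Any VACUUM variant is just a SQL statement; choose the one that matches
# your maintenance goal and the table's sort key design.
variant = "FULL"  # placeholder -- e.g. FULL, DELETE ONLY, SORT ONLY, REINDEX

rsd.execute_statement(
    WorkgroupName="analytics-wg",   # placeholder
    Database="dev",                 # placeholder
    Sql=f"VACUUM {variant} orders;",
)
```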

Question 28

A gaming company uses AWS Glue to perform read and write operations on Apache Iceberg tables for real-time streaming data. The data in the Iceberg tables is stored in Apache Parquet format. The company is experiencing slow query performance.

Which solutions will improve query performance? (Select TWO)

Options:
A.

Use AWS Glue Data Catalog to generate column-level statistics for the Iceberg tables on a schedule.

B.

Use AWS Glue Data Catalog to automatically compact the Iceberg tables.

C.

Use AWS Glue Data Catalog to automatically optimize indexes for the Iceberg tables.

D.

Use AWS Glue Data Catalog to enable copy-on-write for the Iceberg tables.

E.

Use AWS Glue Data Catalog to generate views for the Iceberg tables.
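
For context, the AWS Glue Data Catalog can run managed table optimizers against Iceberg tables, including automatic compaction of small files; column-level statistics can likewise be generated on a schedule through the catalog. Below is a hedged sketch of enabling compaction with boto3; the account, database, table, and role values are placeholders, and the API shape should be checked against the current boto3 documentation.

```python
import boto3

glue = boto3.client("glue")

# Placeholder identifiers -- substitute your account ID, database, table,
# and an IAM role that the optimizer can assume.
glue.create_table_optimizer(
    CatalogId="111122223333",
    DatabaseName="iceberg_db",
    TableName="sensor_events",
    Type="compaction",
    TableOptimizerConfiguration={
        "roleArn": "arn:aws:iam::111122223333:role/GlueTableOptimizerRole",
        "enabled": True,
    },
)
```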

Question 29

A marketing company uses Amazon S3 to store marketing data. The company uses versioning in some buckets. The company runs several jobs to read and load data into the buckets.

To help cost-optimize its storage, the company wants to gather information about incomplete multipart uploads and outdated versions that are present in the S3 buckets.

Which solution will meet these requirements with the LEAST operational effort?

Options:
A.

Use AWS CLI to gather the information.

B.

Use Amazon S3 Inventory configuration reports to gather the information.

C.

Use the Amazon S3 Storage Lens dashboard to gather the information.

D.

Use AWS usage reports for Amazon S3 to gather the information.
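
For context, the raw information in question (in-progress multipart uploads and noncurrent object versions) can be pulled per bucket with two S3 API calls, which is what a script- or CLI-based approach amounts to before any managed report is considered. A minimal sketch with a placeholder bucket name; pagination is omitted for brevity.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "example-marketing-bucket"  # placeholder

# Incomplete multipart uploads still holding storage in the bucket.
uploads = s3.list_multipart_uploads(Bucket=BUCKET).get("Uploads", [])
print(f"{len(uploads)} incomplete multipart uploads")

# Noncurrent (outdated) object versions in a versioned bucket.
versions = s3.list_object_versions(Bucket=BUCKET).get("Versions", [])
noncurrent = [v for v in versions if not v["IsLatest"]]
print(f"{len(noncurrent)} noncurrent object versions")
```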

Question 30

A data engineer has two datasets that contain sales information for multiple cities and states. One dataset is named reference, and the other dataset is named primary.

The data engineer needs a solution to determine whether a specific set of values in the city and state columns of the primary dataset exactly match the same specific values in the reference dataset. The data engineer wants to use Data Quality Definition Language (DQDL) rules in an AWS Glue Data Quality job.

Which rule will meet these requirements?

Options:
A.

DatasetMatch "reference" "city->ref_city, state->ref_state" = 1.0

B.

ReferentialIntegrity "city,state" "reference.{ref_city,ref_state}" = 1.0

C.

DatasetMatch "reference" "city->ref_city, state->ref_state" = 100

D.

ReferentialIntegrity "city,state" "reference.{ref_city,ref_state}" = 100
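
For reference on the syntax, DQDL rules are packaged into a ruleset string of the form Rules = [ ... ] and attached to a table, either inline in an AWS Glue Data Quality evaluation or as a named ruleset. The sketch below uses generic rules and placeholder database and table names; it illustrates only how a ruleset is registered, not which option above is correct.

```python
import boto3

glue = boto3.client("glue")

# A DQDL ruleset is a plain string: Rules = [ rule, rule, ... ].
# The rules below are generic examples, unrelated to the options above.
ruleset = """
Rules = [
    IsComplete "city",
    IsComplete "state",
    ColumnLength "state" = 2
]
"""

# Placeholder database and table names.
glue.create_data_quality_ruleset(
    Name="sales-city-state-checks",
    Ruleset=ruleset,
    TargetTable={"DatabaseName": "sales_db", "TableName": "primary_sales"},
)
```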