Spring Sale Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: 70track

Free Amazon Web Services Data-Engineer-Associate Practice Exam with Questions & Answers | Set: 7

Questions 61

A data engineer is using an Apache Iceberg framework to build a data lake that contains 100 TB of data. The data engineer wants to run AWS Glue Apache Spark Jobs that use the Iceberg framework.

What combination of steps will meet these requirements? (Select TWO.)

Options:
A.

Create a key named -conf for an AWS Glue job. Set Iceberg as a value for the --datalake-formats job parameter.

B.

Specify the path to a specific version of Iceberg by using the --extra-Jars job parameter. Set Iceberg as a value for the ~ datalake-formats job parameter.

C.

Set Iceberg as a value for the -datalake-formats job parameter.

D.

Set the -enable-auto-scaling parameter to true.

E.

Add the -job-bookmark-option: job-bookmark-enable parameter to an AWS Glue job.

Questions 62

The company stores a large volume of customer records in Amazon S3. To comply with regulations, the company must be able to access new customer records immediately for the first 30 days after the records are created. The company accesses records that are older than 30 days infrequently.

The company needs to cost-optimize its Amazon S3 storage.

Which solution will meet these requirements MOST cost-effectively?

Options:
A.

Apply a lifecycle policy to transition records to S3 Standard Infrequent-Access (S3 Standard-IA) storage after 30 days.

B.

Use S3 Intelligent-Tiering storage.

C.

Transition records to S3 Glacier Deep Archive storage after 30 days.

D.

Use S3 Standard-Infrequent Access (S3 Standard-IA) storage for all customer records.

Questions 63

A company has an Amazon Redshift data warehouse that users access by using a variety of IAM roles. More than 100 users access the data warehouse every day.

The company wants to control user access to the objects based on each user ' s job role, permissions, and how sensitive the data is.

Which solution will meet these requirements?

Options:
A.

Use the role-based access control (RBAC) feature of Amazon Redshift.

B.

Use the row-level security (RLS) feature of Amazon Redshift.

C.

Use the column-level security (CLS) feature of Amazon Redshift.

D.

Use dynamic data masking policies in Amazon Redshift.

Questions 64

A company is using Amazon Redshift to build a data warehouse solution. The company is loading hundreds of tiles into a tact table that is in a Redshift cluster.

The company wants the data warehouse solution to achieve the greatest possible throughput. The solution must use cluster resources optimally when the company loads data into the tact table.

Which solution will meet these requirements?

Options:
A.

Use multiple COPY commands to load the data into the Redshift cluster.

B.

Use S3DistCp to load multiple files into Hadoop Distributed File System (HDFS). Use an HDFS connector to ingest the data into the Redshift cluster.

C.

Use a number of INSERT statements equal to the number of Redshift cluster nodes. Load the data in parallel into each node.

D.

Use a single COPY command to load the data into the Redshift cluster.

Questions 65

A company builds a new data pipeline to process data for business intelligence reports. Users have noticed that data is missing from the reports.

A data engineer needs to add a data quality check for columns that contain null values and for referential integrity at a stage before the data is added to storage.

Which solution will meet these requirements with the LEAST operational overhead?

Options:
A.

Use Amazon SageMaker Data Wrangler to create a Data Quality and Insights report.

B.

Use AWS Glue ETL jobs to perform a data quality evaluation transform on the data. Use an IsComplete rule on the requested columns. Use a ReferentialIntegrity rule for each join.

C.

Use AWS Glue ETL jobs to perform a SQL transform on the data to determine whether requested columns contain null values. Use a second SQL transform to check referential integrity.

D.

Use Amazon SageMaker Data Wrangler and a custom Python transform to create custom rules to check for null values and referential integrity.

Questions 66

A marketing company uses Amazon S3 to store marketing data. The company uses versioning in some buckets. The company runs several jobs to read and load data into the buckets.

To help cost-optimize its storage, the company wants to gather information about incomplete multipart uploads and outdated versions that are present in the S3 buckets.

Which solution will meet these requirements with the LEAST operational effort?

Options:
A.

Use AWS CLI to gather the information.

B.

Use Amazon S3 Inventory configurations reports to gather the information.

C.

Use the Amazon S3 Storage Lens dashboard to gather the information.

D.

Use AWS usage reports for Amazon S3 to gather the information.

Questions 67

A financial company recently added more features to its mobile app. The new features required the company to create a new topic in an existing Amazon Managed Streaming for Apache Kafka (Amazon MSK) cluster.

A few days after the company added the new topic, Amazon CloudWatch raised an alarm on the RootDiskUsed metric for the MSK cluster.

How should the company address the CloudWatch alarm?

Options:
A.

Expand the storage of the MSK broker. Configure the MSK cluster storage to expand automatically.

B.

Expand the storage of the Apache ZooKeeper nodes.

C.

Update the MSK broker instance to a larger instance type. Restart the MSK cluster.

D.

Specify the Target-Volume-in-GiB parameter for the existing topic.

Questions 68

A company’s data processing pipeline uses AWS Glue jobs and AWS Glue Data Catalog. All AWS Glue jobs must run in a custom VPC inside a private subnet. The company uses a NAT gateway to support outbound connections.

A data engineer needs to use AWS Glue to migrate data from an on-premises PostgreSQL database to Amazon S3. There is no current network connection between AWS and the on-premises environment. However, the data engineer has updated the on-premises database to allow traffic from the custom VPC.

Which solution will meet these requirements?

Options:
A.

Create a JDBC connection in AWS Glue with the database JDBC URL, username, and password.

B.

Create a Simple Authentication and Security Layer (SASL) connection in AWS Glue to the on-premises database.

C.

Create a JDBC connection in AWS Glue with a security group that allows TCP traffic to and from itself.

D.

Create a JDBC connection in AWS Glue that uses a JDBC driver stored in Amazon S3. Retrieve the database URL, username, and password from AWS Secrets Manager.

Questions 69

A company is planning to use a provisioned Amazon EMR cluster that runs Apache Spark jobs to perform big data analysis. The company requires high reliability. A big data team must follow best practices for running cost-optimized and long-running workloads on Amazon EMR. The team must find a solution that will maintain the company ' s current level of performance.

Which combination of resources will meet these requirements MOST cost-effectively? (Choose two.)

Options:
A.

Use Hadoop Distributed File System (HDFS) as a persistent data store.

B.

Use Amazon S3 as a persistent data store.

C.

Use x86-based instances for core nodes and task nodes.

D.

Use Graviton instances for core nodes and task nodes.

E.

Use Spot Instances for all primary nodes.

Questions 70

A data engineer needs to use AWS Step Functions to design an orchestration workflow. The workflow must parallel process a large collection of data files and apply a specific transformation to each file.

Which Step Functions state should the data engineer use to meet these requirements?

Options:
A.

Parallel state

B.

Choice state

C.

Map state

D.

Wait state