
Free Amazon Web Services Data-Engineer-Associate Practice Exam with Questions & Answers | Set: 9

Question 81

A company wants to build a dimension table in an Amazon S3 bucket. The bucket contains historical data that includes 10 million records. The historical data is 1 TB in size.

A data engineer needs a solution to update changes for up to 10,000 records in the base table every day.

Which solution will meet this requirement with the LOWEST runtime?

Options:
A.

Develop an Apache Spark job in Amazon EMR to read the historical data and the new changes into two Spark DataFrames. Use the Spark update method to update the base table.

B.

Develop an AWS Glue Python job to read the historical data and new changes into two Pandas DataFrames. Use the Pandas update method to update the base table.

C.

Develop an AWS Glue Apache Spark job to read the historical data and new changes into two Spark DataFrames. Use the Spark update method to update the base table.

D.

Develop an Amazon EMR job to read new changes into Apache Spark DataFrames. Use the Apache Hudi framework to create the base table in Amazon S3. Use the Spark update method to update the base table.
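For illustration, the daily change set described in the question amounts to an upsert keyed on a record identifier. The following is a minimal pure-Python sketch of that semantics only. It uses no Spark or Hudi APIs, and the field names (`id`, `name`) and sample rows are hypothetical.

```python
def upsert(base, changes, key="id"):
    """Return a new list in which changed records replace or extend base records by key."""
    merged = {row[key]: row for row in base}
    for row in changes:
        # An existing key is overwritten; an unseen key is appended.
        merged[row[key]] = row
    return list(merged.values())


# Hypothetical sample data: a tiny base table and one day's changes.
base = [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]
changes = [{"id": 2, "name": "beta-v2"}, {"id": 3, "name": "gamma"}]
result = upsert(base, changes)
```

How much of the 1 TB base data a given approach must read and rewrite to apply 10,000 such changes is the runtime concern the question raises.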

Question 82

A data engineer needs to analyze time-sensitive sales data. The company stores the data in an Amazon S3 bucket. The data engineer uses AWS Glue Data Catalog to access the data.

When performing the analysis, the data engineer notices that some records are missing or out of date.

What is the likely cause of these issues?

Options:
A.

AWS Glue Data Catalog is not up to date with the latest S3 partition changes.

B.

Incorrect IAM roles are assigned to the AWS Glue jobs.

C.

Versioning is not enabled on the S3 bucket.

D.

The AWS Glue job schedules overlap with one another.
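To make the partition scenario in option A concrete, here is a small sketch that compares the partition prefixes present in storage with the partitions a catalog knows about. It is plain Python with no AWS calls. The `dt=...` prefixes are hypothetical sample values, and in practice both lists would come from S3 and the Data Catalog.

```python
def missing_partitions(s3_prefixes, catalog_partitions):
    """Return partition prefixes that exist in storage but are not registered in the catalog."""
    return sorted(set(s3_prefixes) - set(catalog_partitions))


# Hypothetical sample values: three partitions in storage, two registered.
s3_prefixes = ["dt=2024-01-01/", "dt=2024-01-02/", "dt=2024-01-03/"]
catalog_partitions = ["dt=2024-01-01/", "dt=2024-01-02/"]
stale = missing_partitions(s3_prefixes, catalog_partitions)
```

A query that resolves data through the catalog would not see rows stored under any prefix returned by this comparison.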

Question 83

A company stores its processed data in an S3 bucket. The company has a strict data access policy. The company uses IAM roles to grant teams within the company different levels of access to the S3 bucket.

The company wants to receive notifications when a user violates the data access policy. Each notification must include the username of the user who violated the policy.

Which solution will meet these requirements?

Options:
A.

Use AWS Config rules to detect violations of the data access policy. Set up compliance alarms.

B.

Use Amazon CloudWatch metrics to gather object-level metrics. Set up CloudWatch alarms.

C.

Use AWS CloudTrail to track object-level events for the S3 bucket. Forward events to Amazon CloudWatch to set up CloudWatch alarms.

D.

Use Amazon S3 server access logs to monitor access to the bucket. Forward the access logs to an Amazon CloudWatch log group. Use metric filters on the log group to set up CloudWatch alarms.
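As background for the username requirement, CloudTrail event records carry a `userIdentity` element. The sketch below pulls a display name out of such a record. The sample event is hypothetical and heavily trimmed, and the handling of identity types is a simplified assumption rather than a complete treatment.

```python
def username_from_event(event):
    """Extract a user name from a CloudTrail-style event record (simplified)."""
    identity = event.get("userIdentity", {})
    if identity.get("type") == "IAMUser":
        return identity.get("userName")
    if identity.get("type") == "AssumedRole":
        # For assumed roles, the session name is the last segment of the ARN.
        return identity.get("arn", "").rsplit("/", 1)[-1]
    return identity.get("principalId")


# Hypothetical, trimmed event for an object-level read.
sample_event = {
    "eventName": "GetObject",
    "userIdentity": {
        "type": "AssumedRole",
        "arn": "arn:aws:sts::123456789012:assumed-role/AnalystRole/jdoe",
    },
}
user = username_from_event(sample_event)
```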

Question 84

A company creates a new non-production application that runs on Amazon EC2 instances. The application needs to communicate with an Amazon RDS database instance by using Java Database Connectivity (JDBC). The EC2 instances and the RDS database instance are in the same subnet.

Which solution will meet this requirement?

Options:
A.

Modify the IAM role that is assigned to the database instance to allow connections from the EC2 instances.

B.

Modify the ec2_authorized_hosts parameter in the RDS parameter group to include the EC2 instances. Restart the database instance.

C.

Update the database security group to allow connections from the EC2 instances.

D.

Enable the Amazon RDS Data API and specify the Amazon Resource Name (ARN) of the database instance in the JDBC connection string.
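For reference, an inbound security group rule that admits traffic from another security group can be described as a small data structure. The sketch below builds one in the shape accepted by the `IpPermissions` parameter of boto3's `authorize_security_group_ingress`, without calling AWS. The group ID is a placeholder, and port 3306 assumes a MySQL-compatible engine, which the question does not specify.

```python
def db_ingress_rule(source_group_id, port=3306):
    """Build an ingress permission allowing TCP on `port` from a source security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        # Referencing a group admits any instance that belongs to it.
        "UserIdGroupPairs": [{"GroupId": source_group_id}],
    }


# Placeholder ID for the security group attached to the EC2 instances.
rule = db_ingress_rule("sg-0123456789abcdef0")
```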

Question 85

A company stores customer data that contains personally identifiable information (PII) in an Amazon Redshift cluster. The company's marketing, claims, and analytics teams need to be able to access the customer data.

The marketing team should have access to obfuscated claim information but should have full access to customer contact information.

The claims team should have access to customer information for each claim that the team processes.

The analytics team should have access only to obfuscated PII data.

Which solution will enforce these data access requirements with the LEAST administrative overhead?

Options:
A.

Create a separate Redshift cluster for each team. Load only the required data for each team. Restrict access to clusters based on the teams.

B.

Create views that include required fields for each of the data requirements. Grant the teams access only to the view that each team requires.

C.

Create a separate Amazon Redshift database role for each team. Define masking policies that apply for each team separately. Attach appropriate masking policies to each team role.

D.

Move the customer data to an Amazon S3 bucket. Use AWS Lake Formation to create a data lake. Use fine-grained security capabilities to grant each team appropriate permissions to access the data.
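The per-team requirements above can be read as a mapping from team to the set of columns that team sees obfuscated. The sketch below models that mapping in plain Python. It is not Redshift or Lake Formation syntax, the column names are hypothetical, and the claims team's restriction to the claims it processes (a row-level concern) is not modeled.

```python
def mask(value):
    """Replace every character with an asterisk."""
    return "*" * len(value)


# Hypothetical column names; each team maps to the columns it sees masked.
MASKED_COLUMNS = {
    "marketing": {"claim_detail"},
    "claims": set(),
    "analytics": {"name", "email"},
}


def view_for(team, row):
    """Return the row as the given team would see it."""
    masked = MASKED_COLUMNS[team]
    return {col: mask(val) if col in masked else val for col, val in row.items()}


row = {"name": "Ann", "email": "a@x.io", "claim_detail": "fire"}
marketing_view = view_for("marketing", row)
analytics_view = view_for("analytics", row)
```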

Question 86

A data engineer needs to deploy a complex pipeline. The stages of the pipeline must run scripts, but only fully managed and serverless services can be used.

Which solution will meet these requirements?

Options:
A.

Deploy AWS Glue jobs and workflows. Use AWS Glue to run the jobs and workflows on a schedule.

B.

Use Amazon MWAA to build and schedule the pipeline.

C.

Deploy the scripts to an Amazon EC2 instance. Use Amazon EventBridge to schedule the scripts.

D.

Use AWS Glue DataBrew and Amazon EventBridge to run the pipeline on a schedule.