
Google Professional-Data-Engineer Exam Success: Complete Study and Preparation Tips

Question 21

Flowlogistic’s management has determined that the current Apache Kafka servers cannot handle the data volume for their real-time inventory tracking system. You need to build a new system on Google Cloud Platform (GCP) that will feed the proprietary tracking software. The system must be able to ingest data from a variety of global sources, process and query it in real time, and store the data reliably. Which combination of GCP products should you choose?

Options:

A.

Cloud Pub/Sub, Cloud Dataflow, and Cloud Storage

B.

Cloud Pub/Sub, Cloud Dataflow, and Local SSD

C.

Cloud Pub/Sub, Cloud SQL, and Cloud Storage

D.

Cloud Load Balancing, Cloud Dataflow, and Cloud Storage
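
For illustration, the Pub/Sub, Dataflow, and Cloud Storage combination in option A maps onto a short Apache Beam streaming pipeline. This is a minimal sketch only; the project, topic, and bucket names are hypothetical placeholders.

    import apache_beam as beam
    from apache_beam.io import fileio
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.transforms import window

    # Streaming mode; all resource names below are placeholders.
    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            # Ingest tracking events from globally distributed publishers.
            | "Ingest" >> beam.io.ReadFromPubSub(
                topic="projects/example-project/topics/inventory-events")
            | "Decode" >> beam.Map(lambda msg: msg.decode("utf-8"))
            # Fixed 60-second windows let the unbounded stream be flushed
            # to Cloud Storage as bounded file chunks.
            | "Window" >> beam.WindowInto(window.FixedWindows(60))
            | "Store" >> fileio.WriteToFiles(
                path="gs://example-bucket/inventory/",
                sink=lambda dest: fileio.TextSink())
        )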

Question 22

You maintain ETL pipelines. You notice that a streaming pipeline running on Dataflow is taking a long time to process incoming data, which causes output delays. You also notice that the pipeline graph was automatically optimized by Dataflow and merged into one step. You want to identify where the potential bottleneck is occurring. What should you do?

Options:

A.

Insert a Reshuffle operation after each processing step, and monitor the execution details in the Dataflow console.

B.

Log debug information in each ParDo function, and analyze the logs at execution time.

C.

Insert output sinks after each key processing step, and observe the writing throughput of each block.

D.

Verify that the Dataflow service accounts have appropriate permissions to write the processed data to the output sinks.
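
To make option A concrete: Reshuffle acts as a fusion break, so Dataflow cannot merge the steps on either side of it into one stage, and each step then reports its own metrics in the job's execution details. A minimal sketch, with placeholder parse and enrich steps:

    import apache_beam as beam

    def parse(element):   # placeholder parsing step
        return element

    def enrich(element):  # placeholder enrichment step
        return element

    with beam.Pipeline() as p:
        (
            p
            | beam.Create(["a", "b"])
            | "Parse" >> beam.Map(parse)
            # Reshuffle prevents Dataflow from fusing Parse and Enrich into
            # one stage, so each reports its own wall time and throughput.
            | "BreakFusion1" >> beam.Reshuffle()
            | "Enrich" >> beam.Map(enrich)
            | "BreakFusion2" >> beam.Reshuffle()
            | "Output" >> beam.Map(print)
        )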

Question 23

You need ads data to serve AI models, and historical data for analytics. Longtail and outlier data points need to be identified. You want to cleanse the data in near-real time before running it through AI models. What should you do?

Options:

A.

Use BigQuery to ingest, prepare, and then analyze the data, and then run queries to create views.

B.

Use Cloud Storage as a data warehouse, shell scripts for processing, and BigQuery to create views for desired datasets.

C.

Use Dataflow to identify longtail and outlier data points programmatically, with BigQuery as a sink.

D.

Use Cloud Composer to identify longtail and outlier data points, and then output a usable dataset to BigQuery.
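
As a sketch of the Dataflow-with-BigQuery-sink approach in option C, a pipeline can tag longtail and outlier points in flight and write the cleansed rows to BigQuery. The thresholds, field name, and table name here are hypothetical:

    import apache_beam as beam

    def tag_outliers(row, lo, hi):
        # Flag rows outside [lo, hi]; field name and bounds are placeholders.
        row["is_outlier"] = not (lo <= row["value"] <= hi)
        return row

    with beam.Pipeline() as p:
        (
            p
            | beam.Create([{"value": 3.0}, {"value": 9000.0}])
            | "TagOutliers" >> beam.Map(tag_outliers, lo=0.0, hi=100.0)
            | "ToBigQuery" >> beam.io.WriteToBigQuery(
                "example-project:ads.cleansed_events",  # placeholder table
                schema="value:FLOAT,is_outlier:BOOLEAN",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
        )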

Question 24

MJTelco’s Google Cloud Dataflow pipeline is now ready to start receiving data from the 50,000 installations. You want to allow Cloud Dataflow to scale its compute power up as required. Which Cloud Dataflow pipeline configuration setting should you update?

Options:

A.

The zone

B.

The number of workers

C.

The disk size per worker

D.

The maximum number of workers
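
For reference, the ceiling that Dataflow's autoscaler may scale up to is the max_num_workers pipeline option (maxNumWorkers in the Java SDK). A sketch of setting it from Python, with placeholder project, region, and bucket values:

    from apache_beam.options.pipeline_options import PipelineOptions

    # max_num_workers is the autoscaling ceiling; Dataflow adds workers up
    # to this bound as backlog grows. All other values are placeholders.
    options = PipelineOptions(
        runner="DataflowRunner",
        project="example-project",
        region="us-central1",
        temp_location="gs://example-bucket/tmp",
        max_num_workers=100,
    )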

Question 25

You are on the data governance team and are implementing security requirements to deploy resources. You need to ensure that resources are limited to only the europe-west3 region. You want to follow Google-recommended practices. What should you do?

Options:

A.

Deploy resources with Terraform and implement a variable validation rule to ensure that the region is set to the europe-west3 region for all resources.

B.

Set the constraints/gcp.resourceLocations organization policy constraint to in:eu-locations.

C.

Create a Cloud Function to monitor all resources created and automatically destroy the ones created outside the europe-west3 region.

D.

Set the constraints/gcp.resourceLocations organization policy constraint to in:europe-west3-locations.
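
As a sketch of setting this constraint programmatically, assuming the google-cloud-org-policy client library (the organization ID is a placeholder; gcloud org-policies set-policy achieves the same from the CLI):

    from google.cloud import orgpolicy_v2

    # In the v2 API the constraint appears in the policy resource name
    # (without the "constraints/" prefix); the org ID is a placeholder.
    client = orgpolicy_v2.OrgPolicyClient()
    policy = orgpolicy_v2.Policy(
        name="organizations/123456789012/policies/gcp.resourceLocations",
        spec=orgpolicy_v2.PolicySpec(
            rules=[
                orgpolicy_v2.PolicySpec.PolicyRule(
                    values=orgpolicy_v2.PolicySpec.PolicyRule.StringValues(
                        allowed_values=["in:europe-west3-locations"]
                    )
                )
            ]
        ),
    )
    client.create_policy(parent="organizations/123456789012", policy=policy)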

Question 26

You want to schedule a number of sequential load and transformation jobs. Data files will be added to a Cloud Storage bucket by an upstream process; there is no fixed schedule for when the new data arrives. Next, a Dataproc job is triggered to perform some transformations and write the data to BigQuery. You then need to run additional transformation jobs in BigQuery. The transformation jobs are different for every table, and they might take hours to complete. You need to determine the most efficient and maintainable workflow to process hundreds of tables and provide the freshest data to your end users. What should you do?

Options:

A.

1. Create an Apache Airflow directed acyclic graph (DAG) in Cloud Composer with sequential tasks by using the Cloud Storage, Dataproc, and BigQuery operators.
2. Use a single shared DAG for all tables that need to go through the pipeline.
3. Schedule the DAG to run hourly.

B.

1. Create an Apache Airflow directed acyclic graph (DAG) in Cloud Composer with sequential tasks by using the Dataproc and BigQuery operators.
2. Create a separate DAG for each table that needs to go through the pipeline.
3. Use a Cloud Storage object trigger to launch a Cloud Function that triggers the DAG.

C.

1. Create an Apache Airflow directed acyclic graph (DAG) in Cloud Composer with sequential tasks by using the Cloud Storage, Dataproc, and BigQuery operators.
2. Create a separate DAG for each table that needs to go through the pipeline.
3. Schedule the DAGs to run hourly.

D.

1. Create an Apache Airflow directed acyclic graph (DAG) in Cloud Composer with sequential tasks by using the Dataproc and BigQuery operators.
2. Use a single shared DAG for all tables that need to go through the pipeline.
3. Use a Cloud Storage object trigger to launch a Cloud Function that triggers the DAG.

Question 27

You are building a real-time prediction engine that streams files, which may contain PII (personally identifiable information) data, into Cloud Storage and eventually into BigQuery. You want to ensure that the sensitive data is masked but still maintains referential integrity, because names and emails are often used as join keys. How should you use the Cloud Data Loss Prevention API (DLP API) to ensure that the PII data is not accessible by unauthorized individuals?

Options:

A.

Create a pseudonym by replacing the PII data with cryptographic tokens, and store the non-tokenized data in a locked-down bucket.

B.

Redact all PII data, and store a version of the unredacted data in a locked-down bucket.

C.

Scan every table in BigQuery, and mask the data it finds that has PII.

D.

Create a pseudonym by replacing PII data with a cryptographic format-preserving token.
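
As a sketch of the format-preserving tokenization in option D: the DLP API's CryptoReplaceFfxFpeConfig produces deterministic tokens in the same alphabet and length, so the same name or email always maps to the same token and join keys keep working. The project, KMS key, and wrapped-key values below are placeholders:

    import google.cloud.dlp_v2 as dlp

    client = dlp.DlpServiceClient()
    info_types = [{"name": "PERSON_NAME"}, {"name": "EMAIL_ADDRESS"}]

    deidentify_config = {
        "info_type_transformations": {
            "transformations": [{
                "info_types": info_types,
                "primitive_transformation": {
                    # Deterministic FFX-FPE: the same input always yields the
                    # same token, so tokenized join keys still match.
                    "crypto_replace_ffx_fpe_config": {
                        "crypto_key": {
                            "kms_wrapped": {
                                "wrapped_key": b"<wrapped-key-bytes>",  # placeholder
                                "crypto_key_name": (
                                    "projects/example-project/locations/global/"
                                    "keyRings/example-kr/cryptoKeys/dlp-key"
                                ),
                            }
                        },
                        "common_alphabet": "ALPHA_NUMERIC",
                    }
                },
            }]
        }
    }

    response = client.deidentify_content(
        request={
            "parent": "projects/example-project/locations/global",
            "deidentify_config": deidentify_config,
            "inspect_config": {"info_types": info_types},
            "item": {"value": "Contact Jane Doe at jane@example.com"},
        }
    )
    print(response.item.value)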

Question 28

You are updating the code for a subscriber to a Pub/Sub feed. You are concerned that upon deployment the subscriber may erroneously acknowledge messages, leading to message loss. Your subscriber is not set up to retain acknowledged messages. What should you do to ensure that you can recover from errors after deployment?

Options:

A.

Use Cloud Build for your deployment. If an error occurs after deployment, use a Seek operation to locate a timestamp logged by Cloud Build at the start of the deployment.

B.

Create a Pub/Sub snapshot before deploying new subscriber code. Use a Seek operation to re-deliver messages that became available after the snapshot was created.

C.

Set up the Pub/Sub emulator on your local machine. Validate the behavior of your new subscriber logic before deploying it to production.

D.

Enable dead-lettering on the Pub/Sub topic to capture messages that aren't successfully acknowledged. If an error occurs after deployment, re-deliver any messages captured by the dead-letter queue.
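
To illustrate option B, the snapshot-and-seek flow takes two calls with the Pub/Sub client library; the project, subscription, and snapshot names here are placeholders:

    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    subscription = subscriber.subscription_path("example-project", "tracking-sub")
    snapshot = subscriber.snapshot_path("example-project", "pre-deploy")

    # Before rolling out the new subscriber code: capture the backlog state.
    subscriber.create_snapshot(
        request={"name": snapshot, "subscription": subscription}
    )

    # If the new code mis-acknowledges messages: rewind the subscription,
    # which redelivers everything unacked at snapshot time plus all
    # messages published since.
    subscriber.seek(request={"subscription": subscription, "snapshot": snapshot})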

Question 29

You want to build a managed Hadoop system as your data lake. The data transformation process is composed of a series of Hadoop jobs executed in sequence. To accomplish the design of separating storage from compute, you decided to use the Cloud Storage connector to store all input data, output data, and intermediary data. However, you noticed that one Hadoop job runs very slowly with Cloud Dataproc, when compared with the on-premises bare-metal Hadoop environment (8-core nodes with 100-GB RAM). Analysis shows that this particular Hadoop job is disk I/O intensive. You want to resolve the issue. What should you do?

Options:

A.

Allocate sufficient memory to the Hadoop cluster, so that the intermediary data of that particular Hadoop job can be held in memory

B.

Allocate sufficient persistent disk space to the Hadoop cluster, and store the intermediate data of that particular Hadoop job on native HDFS

C.

Allocate more CPU cores to the virtual machine instances of the Hadoop cluster so that the networking bandwidth of each instance can scale up

D.

Allocate an additional network interface card (NIC), and configure link aggregation in the operating system to use the combined throughput when working with Cloud Storage
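
As a sketch of option B, a Dataproc cluster can be provisioned with large worker disks so the I/O-intensive job keeps its intermediate data on native HDFS while inputs and final outputs stay in Cloud Storage. All names and sizes below are placeholders:

    from google.cloud import dataproc_v1

    client = dataproc_v1.ClusterControllerClient(
        client_options={"api_endpoint": "europe-west3-dataproc.googleapis.com:443"}
    )
    cluster = {
        "project_id": "example-project",
        "cluster_name": "etl-cluster",
        "config": {
            "worker_config": {
                "num_instances": 8,
                "machine_type_uri": "n1-highmem-8",
                # Large worker disks back native HDFS for the disk-I/O-heavy
                # job's intermediate data; inputs and final outputs stay on
                # Cloud Storage via the connector.
                "disk_config": {"boot_disk_size_gb": 1000, "num_local_ssds": 2},
            }
        },
    }
    operation = client.create_cluster(
        request={
            "project_id": "example-project",
            "region": "europe-west3",
            "cluster": cluster,
        }
    )
    operation.result()  # block until the cluster is ready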

Question 30

You are using Workflows to call an API that returns a 1 KB JSON response, apply some complex business logic on this response, wait for the logic to complete, and then perform a load from a Cloud Storage file to BigQuery. The Workflows standard library does not have sufficient capabilities to perform your complex logic, and you want to use Python's standard library instead. You want to optimize your workflow for simplicity and speed of execution. What should you do?

Options:

A.

Invoke a Cloud Function instance that uses Python to apply the logic on your JSON file.

B.

Invoke a subworkflow in Workflows to apply the logic on your JSON file.

C.

Create a Cloud Composer environment and run the logic in Cloud Composer.

D.

Create a Dataproc cluster, and use PySpark to apply the logic on your JSON file.
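
To illustrate option A: the complex logic lives in an HTTP Cloud Function written in Python, which the workflow invokes with an http.post step and whose JSON response flows back into the workflow. The function name and logic below are placeholders:

    import statistics

    import functions_framework

    # The workflow would call this function with an http.post step and read
    # the JSON body of the response; name and logic are placeholders.
    @functions_framework.http
    def apply_business_logic(request):
        payload = request.get_json(silent=True) or {}
        values = payload.get("values", [])
        # Arbitrary stand-in for the "complex business logic" done with the
        # Python standard library.
        return {
            "count": len(values),
            "mean": statistics.fmean(values) if values else None,
        }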

Exam Code: Professional-Data-Engineer
Exam Name: Google Professional Data Engineer Exam
Last Update: Dec 2, 2024
Questions: 372
