AWS: Using Trifacta (known as GCP Cloud Dataprep) in AWS, Part 1: Setting up

I came across Trifacta while helping a customer move to AWS. The customer had several data pipelines feeding GCP BigQuery (a data warehouse) and wanted a low-code approach to data wrangling, which they had implemented in GCP Cloud Dataprep.

From https://www.trifacta.com:

Trifacta is the only open and interactive cloud platform for data engineers and analysts to collaboratively profile, prepare, and pipeline data for analytics and machine learning.

Image courtesy of https://www.trifacta.com

Trifacta lets you profile, prepare, and pipeline your data on any cloud data warehouse or cloud data lake / lakehouse.

Trifacta's success apparently comes from its “no-code” approach. This made the product successful enough that GCP integrated it into its cloud under the name “Cloud Dataprep”.

But wait, it is not only GCP. As shown in the diagram above, Trifacta connects to the major clouds and even to on-premises data.

Note: It is important to understand that Trifacta takes a batch approach. It does not provide live streaming data pipelines.

Getting started

If you have used Cloud Dataprep in GCP, you may have noticed a popup the first time you used it, saying that Trifacta wants to access your account. So we start by navigating to https://www.trifacta.com

Click “Start Free”, fill in the information, and select AWS :)

You will land on exactly the same console view as in GCP, but without the side navigation panel. Make sure you confirm the email that Trifacta sends as a follow-up to your trial sign-up.

Many connectors are available, and we will look at them later. Right out of the box, though, you can see two: S3 and Redshift.

Configuring AWS

To configure access to your AWS account, click “Configure your AWS Account”.

Or, if you closed that prompt, open the Admin Console by clicking your user avatar in the bottom left.

You will see a new side menu:

First, click Workspace settings to enable S3 access.

Now click AWS settings and provide either an IAM role or an access key.

Note: Trifacta recommends using an IAM role.
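Whichever option you pick, the AWS identity that Trifacta uses needs permission to read and write your S3 data. To show roughly what that means on the AWS side, here is a minimal sketch that builds an IAM permissions policy scoped to a single bucket. The bucket name `my-trifacta-bucket` and the exact list of actions are assumptions for illustration only. Trifacta's own setup instructions are the authority on which permissions it actually needs.

```python
import json


def s3_access_policy(bucket: str) -> dict:
    """Build a minimal IAM permissions policy for read/write on one bucket.

    Assumption: the action list below is illustrative. Trifacta's
    configuration wizard documents the permissions it really requires.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions: list object keys, look up the bucket's region
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions: read source data, write job outputs
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name, used only for illustration
    print(json.dumps(s3_access_policy("my-trifacta-bucket"), indent=2))
```

The printed JSON is the kind of document you could paste into the IAM console when creating a policy to attach to the role. Separating bucket-level and object-level statements matters because the two kinds of actions apply to different ARNs.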

The steps for configuring Trifacta with an IAM role are explained in detail if you select “Use a cross-account role (IAM role)” and click “Use our step by step configuration”.

Here is part of the screen you see once you select “Step by step configuration”:
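On the AWS side, a cross-account role carries a trust policy. That policy names the other AWS account allowed to assume the role, and the permission is usually guarded by an external ID. Below is a minimal sketch that builds such a trust policy. The 12-digit account ID and the external ID are hypothetical placeholders, not Trifacta's real values. Use whatever the step-by-step configuration shows you.

```python
import json


def cross_account_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Build an IAM trust policy letting another AWS account assume this role.

    Both arguments are placeholders: take the real values from Trifacta's
    step-by-step configuration screen.
    """
    # AWS account IDs are exactly 12 digits
    if not (trusted_account_id.isdigit() and len(trusted_account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The trusted (vendor) account is the principal allowed to assume the role
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # Requiring an external ID guards against the "confused deputy" problem
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only
    print(json.dumps(cross_account_trust_policy("123456789012", "example-external-id"), indent=2))
```

This JSON would go into the role's trust relationship, while a permissions policy attached to the same role controls what the role can do once assumed. The role's ARN is then presumably what you paste into Trifacta's AWS settings.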

Summary:

  1. Trifacta takes a no-code approach to data engineering.
  2. It works in batch mode.
  3. One use case: migrating between clouds while reusing existing data wrangling recipes.

Stay tuned for Part 2.

John Gakhokidze
