Treehouse Dataflow Toolkit (TDT): Fully Automated, Cloud-native Data Transfer from Mainframe Data Sources to Analytics/ML/AI-friendly Targets
TDT provides an easy and fast approach that enables rapid and comprehensive data transfer from Kafka pipelines to Amazon Redshift, Snowflake, Amazon Athena/S3, Amazon S3 Express One Zone, and Amazon Aurora PostgreSQL–AI-ready, with all target resources automatically created.
Treehouse Dataflow Toolkit (TDT) is a set of Lambda-based microservices that assure highly-available, auto-scalable, and event-driven data transfers to your data science teams’ favorite analytics frameworks.
Customers either already have, or are in the process of acquiring, software tools that replicate their mainframe data into Kafka pipelines (i.e., Amazon MSK, Confluent, etc.). Our new and innovative offering, TDT, provides the turnkey solution for getting this data from Kafka into advanced Analytics/AI/ML-friendly targets, such as Amazon Redshift, Snowflake, Amazon Athena/S3, Amazon S3 Express One Zone Buckets, as well as Amazon Aurora PostgreSQL, all the while adhering to AWS’s and Snowflake’s recommended best practices for massive data loading, thus assuring shortest and surest loads.
When a mainframe data replication tool, such as Rocket Data Replicate and Sync (RDRS) publishes both bulk-load and CDC data to a reliable and scalable framework like Kafka, it sets the stage for TDT to feed legacy data from Kafka to any number of ETL tools, target datastores, and data analytics packages (some of which may not even have been invented yet!).
TDT not only keeps things up to date faster than any conceivable ODBC-based solution, but the “delta tables” into which it loads data also inherently retain the entire history of source data ever since mainframe-to-target synchronization began. So, for example, after TDT has been syncing a target table for 5 years, a data scientist now has 5 years’ worth of historical data to work with for trend analysis, predictive analytics, prescriptive analytics, ML, etc.
While TDT’s unique “delta-tables” approach offers comprehensive “history” for advanced analytics, the traditional need for up-to-the-second, current snapshots of mainframe datastores is also completely provided for. Adhering once again to target vendors’ “best practices”, self-materializing views are provided to work with current data, as well as in fully-structured views which provide the more traditional look and feel of a SQL database.
Treehouse provides highly-detailed CloudFormation Templates which automate and accelerate the process of installing and configuring the complete TDT application (including AWS Lambda functions and a number of other AWS resources) in your AWS account(s). The TDT CloudFormation Templates create stacks consisting of all principal framework components, along with related IAM policies and roles which are carefully engineered to comply with “best practices” (such as a “least privileges” approach to permissions).
The TDT CloudFormation Templates also optionally provide for automatic creation of a VPC, its subnets, and all required standard VPC-oriented resources, as well as optional creation of a source database cluster (consisting of either a sample database provided by Treehouse for a quick trial/POC, or your own database and data).
Simply put, TDT is a self-contained, automated solution that can eliminate months (or even years) of research and development time and costs, and allow customers to be up and running in minutes. TDT provides the turnkey solution for rapidly transferring data to advanced Analytics/ML/AI-friendly targets.