Accelerate Your Data Mesh in the Cloud with Cloudera Data Engineering and Modak Nabu™

Kevin David


Modak, a leading provider of modern data engineering solutions, is now a certified solution partner with Cloudera. With the integration of Cloudera Data Engineering (CDE) and Modak Nabu™, customers can now automate migration to Cloudera's hybrid data platform, Cloudera Data Platform (CDP), and dynamically auto-scale cloud services.

Modak Nabu™ is a born-in-the-cloud, cloud-neutral, integrated data engineering platform designed to accelerate enterprises' journey to the cloud. Modak helps organizations maximize the ROI of their existing analytics infrastructure through interoperability. The platform converges data cataloging, ingestion, profiling, tagging, discovery, and exploration into a single metadata-driven platform. Modak Nabu™ automates repetitive tasks in the data preparation process, accelerating data preparation by 4x. Most importantly, it democratizes data access for end users across the organization, such as data engineering teams, data science teams, and even citizen data scientists, while ensuring compliance with data governance policies.

Cloud Speed and Scale

Portability across cloud providers and support for hybrid deployments are more critical now than ever. With CDP, enterprises can avoid vendor lock-in while taking advantage of key cloud capabilities such as elasticity and the separation of compute and storage. Enterprises can also tap into new technologies like Kubernetes.

With Modak Nabu™ on CDP, enterprises can move to cloud architectures with ease, on their choice of one or more cloud providers. They automatically get the benefits of the CDP Shared Data Experience (SDX), with its enterprise-grade security and governance.

Modak Nabu™ reliably curates datasets for any line of business and any persona, from business analysts to data scientists. Customers using Modak Nabu™ with CDP today have deployed data lakes and profiled their data at unprecedented speed. In one use case, a pharmaceutical customer's data lake and cloud platform were up and running within 12 weeks. Modak Nabu™ ingested and profiled over 170 different data sources, including Oracle, MySQL, Hive, SAS, and many others. In total, these sources comprised over 80K tables at petabyte scale. This is the scale and speed that cloud-native solutions can provide, and Modak Nabu™ with CDP has been delivering it.

Modak Nabu™ and CDE's Spark-on-Kubernetes

Modak Nabu™ relies on a framework of "Botworks", a series of micro-jobs that carry out the data transformation steps from ingestion through profiling and indexing. That is why a flexible and efficient Spark-based service was critical.
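The micro-job idea can be illustrated with a minimal Python sketch. It assumes nothing about how Botworks is actually implemented. `Dataset`, `ingest`, `profile`, `index`, and `run_pipeline` are hypothetical names, and each "job" is a plain function rather than a Spark job. The sketch only shows how small, single-purpose steps compose into a pipeline from ingestion through profiling and indexing.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical record passed between micro-jobs; not a Modak Nabu data structure.
@dataclass
class Dataset:
    name: str
    rows: list[dict]
    metadata: dict = field(default_factory=dict)


MicroJob = Callable[[Dataset], Dataset]


def ingest(ds: Dataset) -> Dataset:
    # Drop rows whose values are all None, standing in for an ingestion/cleanup step.
    ds.rows = [r for r in ds.rows if any(v is not None for v in r.values())]
    ds.metadata["ingested_rows"] = len(ds.rows)
    return ds


def profile(ds: Dataset) -> Dataset:
    # Per-column null and distinct counts: a tiny stand-in for data profiling.
    columns = {c for r in ds.rows for c in r}
    ds.metadata["profile"] = {
        c: {
            "nulls": sum(1 for r in ds.rows if r.get(c) is None),
            "distinct": len({r[c] for r in ds.rows if r.get(c) is not None}),
        }
        for c in sorted(columns)
    }
    return ds


def index(ds: Dataset) -> Dataset:
    # Record searchable column names for a catalog-style lookup.
    ds.metadata["index_terms"] = sorted(ds.metadata["profile"])
    return ds


def run_pipeline(ds: Dataset, jobs: list[MicroJob]) -> Dataset:
    # Each micro-job is small and independent; the pipeline just chains them in order.
    for job in jobs:
        ds = job(ds)
    return ds


raw = Dataset("patients", [{"id": 1, "site": "A"}, {"id": 2, "site": None}, {"id": None, "site": None}])
result = run_pipeline(raw, [ingest, profile, index])
print(result.metadata["profile"])
```

Keeping each step independent means a failed step can be retried on its own, which is one reason small jobs suit an elastic scheduler.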

Cloudera Data Engineering within CDP provides:

  • A fully managed Spark-on-Kubernetes service that hides the complexity of running production data engineering workloads at scale
  • Auto-scaling backed by Apache YuniKorn, a high-performance scheduler designed for the cloud that provides resource quota management and FIFO and FAIR scheduling
  • Cost efficiency by taking advantage of spot instances
  • First-class APIs that support automation and CI/CD use cases for seamless integration
  • An integrated security model
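The Python toy model below makes the FIFO-versus-FAIR distinction concrete. It is not YuniKorn's algorithm. The queue names, the core quotas, and the "lowest usage-to-quota ratio first" rule are all simplifying assumptions. Jobs also never finish in this model, so capacity is never released.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    queue: str      # tenant queue (hypothetical names)
    cores: int      # cores requested
    submitted: int  # submission order


def fifo_order(jobs: list[Job]) -> list[str]:
    # FIFO: strictly by submission order, regardless of which tenant submitted the job.
    return [j.name for j in sorted(jobs, key=lambda j: j.submitted)]


def fair_order(jobs: list[Job], quotas: dict[str, int]) -> list[str]:
    # Simplified FAIR: repeatedly start the pending job whose queue has the lowest
    # usage relative to its quota, never letting a queue exceed its quota.
    usage = {q: 0 for q in quotas}
    pending = sorted(jobs, key=lambda j: j.submitted)
    order = []
    while pending:
        runnable = [j for j in pending if usage[j.queue] + j.cores <= quotas[j.queue]]
        if not runnable:
            break  # remaining jobs wait until capacity frees up
        job = min(runnable, key=lambda j: (usage[j.queue] / quotas[j.queue], j.submitted))
        usage[job.queue] += job.cores
        order.append(job.name)
        pending.remove(job)
    return order


jobs = [
    Job("a1", "analytics", 4, 0),
    Job("a2", "analytics", 4, 1),
    Job("a3", "analytics", 4, 2),
    Job("e1", "etl", 4, 3),
]
quotas = {"analytics": 8, "etl": 8}
print(fifo_order(jobs))
print(fair_order(jobs, quotas))
```

Under FIFO, the analytics jobs run in submission order ahead of the ETL job. Under the simplified FAIR rule, `e1` starts second because the `etl` queue is idle. `a3` waits because starting it would push `analytics` past its quota.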

Figure 1: CDE containerized service for operational management of Spark workloads

As Modak Nabu™ deploys Spark jobs, they are efficiently scheduled and executed on CDE's autoscaling service, which is optimized for Kubernetes. With virtual clusters, CDE can support multiple tenants and lines of business (LoBs) by providing strong isolation and per-tenant compute quotas for cost management and chargeback models.
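At its core, a per-tenant chargeback report is simple bookkeeping. This Python sketch assumes a flat, hypothetical rate in cents per core-hour and uses made-up tenant names. CDE's actual cost reporting may work differently. Integer cents avoid floating-point rounding in the totals.

```python
def chargeback(
    usage_core_hours: dict[str, int],
    cents_per_core_hour: int,
    quota_core_hours: dict[str, int],
) -> dict[str, dict[str, int]]:
    report = {}
    for tenant, used in usage_core_hours.items():
        report[tenant] = {
            # Charge each tenant (virtual cluster) for the core-hours it consumed.
            "charge_cents": used * cents_per_core_hour,
            # Share of the tenant's quota actually used, rounded down to a whole percent.
            "quota_used_pct": 100 * used // quota_core_hours[tenant],
        }
    return report


print(chargeback({"finance": 80, "marketing": 40}, 5, {"finance": 100, "marketing": 200}))
```

A low `quota_used_pct` flags a tenant whose quota could be reduced. This is the kind of signal a chargeback model is meant to surface.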

The first-class APIs provide full life-cycle management of Spark pipelines and allow seamless integration with applications like Modak Nabu™. This makes it easy to track pipeline status, manage logs, and troubleshoot at the level of individual jobs.
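A client that tracks a pipeline typically polls each run until the run reaches a terminal state. In this Python sketch, `get_status` is a hypothetical callable standing in for whatever wraps the CDE API in your environment. The state names in `TERMINAL` are assumptions, not CDE's documented values. Passing `sleep` in as a parameter keeps the function testable without real waiting.

```python
import time
from typing import Callable

# Hypothetical terminal states; check your API's documentation for the real names.
TERMINAL = {"succeeded", "failed", "killed"}


def wait_for_run(
    run_id: str,
    get_status: Callable[[str], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    # Poll until the run reaches a terminal state or the timeout elapses.
    waited = 0.0
    while True:
        status = get_status(run_id)
        if status in TERMINAL:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"run {run_id} still {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

A caller would pass a function that queries the run's status and then fetch logs only when the returned state is `failed`.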

Knowledge Graphs for the Business

Through its profiling and indexing, Modak Nabu™ provides a comprehensive view of curated datasets that are easily accessible to end users, whether they are data scientists building machine learning models or data analysts building operational reports.

Modak Nabu™ goes beyond the profiled catalog and generates a traversable knowledge graph that lets users explore and trace the dependencies between their data down to the attribute level.
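Tracing attribute-level dependencies amounts to traversing a directed graph. This Python sketch assumes a hypothetical adjacency map from each source attribute (a "table.column" string) to the attributes derived from it. It is not Modak Nabu™'s graph model. Breadth-first search answers two questions: "what feeds this column?" and "what is affected if this column changes?"

```python
from collections import deque


def _bfs(adjacency: dict[str, list[str]], start: str) -> list[str]:
    # Breadth-first traversal returning reachable nodes (excluding start) in visit order.
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def downstream(lineage: dict[str, list[str]], attr: str) -> list[str]:
    # Attributes derived (directly or transitively) from attr.
    return _bfs(lineage, attr)


def upstream(lineage: dict[str, list[str]], attr: str) -> list[str]:
    # Reverse the edges, then walk back to every attribute that feeds attr.
    reverse: dict[str, list[str]] = {}
    for src, dsts in lineage.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    return _bfs(reverse, attr)


lineage = {
    "crm.patients.dob": ["lake.patients.age"],
    "lake.patients.age": ["report.cohort.age_band"],
    "crm.patients.site_id": ["report.cohort.site_name"],
    "ref.sites.name": ["report.cohort.site_name"],
}
print(upstream(lineage, "report.cohort.age_band"))
print(downstream(lineage, "crm.patients.dob"))
```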

Data Operations

Modak Nabu™ also gives platform administrators and other business stakeholders a holistic view of ingestion framework operations. A pipeline monitoring dashboard summarizes the status of pipelines and provides supporting details that help troubleshoot errors quickly. An executive dashboard shows the status of crawlers, pipelines, and data profiling. This makes it easy to assess the health of the system and quickly identify potential issues.
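At its simplest, a pipeline dashboard's summary aggregates run records into status counts plus a list of failures to investigate. The record shape used in this Python sketch (`pipeline`, `status`, optional `error`) is a hypothetical assumption.

```python
from collections import Counter


def summarize(runs: list[dict]) -> tuple[dict[str, int], list[tuple[str, str]]]:
    # Count runs per status for the summary tiles.
    counts = dict(Counter(r["status"] for r in runs))
    # Keep the supporting detail needed to start troubleshooting each failure.
    failures = [(r["pipeline"], r.get("error", "")) for r in runs if r["status"] == "failed"]
    return counts, failures


runs = [
    {"pipeline": "oracle_ingest", "status": "succeeded"},
    {"pipeline": "sas_ingest", "status": "failed", "error": "connection timeout"},
    {"pipeline": "hive_profile", "status": "running"},
    {"pipeline": "mysql_ingest", "status": "succeeded"},
]
print(summarize(runs))
```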

Conclusion

With the certification of Modak Nabu™ with CDE, customers can now deploy data operations at scale in a cloud-agnostic way, with control over cost and performance. With the security and governance of Cloudera's enterprise data platform, the operational efficiencies of the CDE service, and the data ingestion and preparation engine of Modak Nabu™, customers can break down data silos and unlock the value of their data to accelerate data-driven business decisions. Start your journey with a test drive and sign up for a 60-day trial to see how CDP can help.