UPDATED 12:00 EDT / SEPTEMBER 10 2019

Google simplifies open-source software with Cloud Dataproc on Kubernetes

Google LLC is aiming to make it easier for its cloud customers to deploy and run open-source software projects such as Apache Spark with a new version of its Cloud Dataproc service, released today, that runs on Kubernetes.

Cloud Dataproc is a four-year-old service that allows users to take advantage of open-source data tools such as Apache Hadoop and Spark for batch processing, querying, streaming and machine learning tasks.

It provides open-source data and analytics processing capabilities for data engineers and data scientists who need to process information and train models faster at scale. It comes with automation tools that allow clusters to be created quickly, along with the ability to save money by turning clusters off when they’re not needed.

Kubernetes is a popular open-source software framework used to manage large clusters of containers. Containers, in turn, package the components of modern applications so they can run on any infrastructure platform.

By combining Cloud Dataproc with Kubernetes, Google is enabling data scientists to unify resource management, isolate jobs and build resilient infrastructures across any environment, the company said in an announcement. Customers' open-source workloads also become much more portable.

“The overall idea Google has with its cloud services is to combine the best of Google Cloud and open source,” James Malone, Google’s product manager for managed services on open source software, told SiliconANGLE in an interview.

Malone explained that many customers struggle to run open-source software because it requires significant expertise, not just with the bewildering array of components it's made of, but with the entire ecosystem.

“The open-source stack is very complicated,” Malone said. “Dataproc is the first managed service to take these open-source components and make them work on Kubernetes.”

Open-source jobs therefore become much simpler on Cloud Dataproc on Kubernetes. For example, the service does away with the need to work with two separate cluster management interfaces to manage open-source components.

“Using Dataproc’s new capabilities, Google will give you one central view that can span both cluster management systems,” Google explained in its pitch. “Supporting both YARN and Kubernetes will give enterprises the flexibility they need to modernize certain hybrid workloads while continuing to monitor YARN-based workloads.”

The other main benefit is that users can containerize and isolate open-source software jobs on Kubernetes. This means their machine learning models and extract, transform and load pipelines can be moved from development to production without compatibility problems. It also means customers can stop worrying about being locked into a single environment.

“Moving to Kubernetes prevents lock-in,” Malone said. “So [customers] take jobs and run them on Amazon Elastic MapReduce for example. It’s easier to do with Kubernetes because containers are highly portable.”

Moreover, Cloud Dataproc on Kubernetes provides what Google calls a "self-healing environment," in which infrastructure management tasks such as sizing and building clusters, manipulating Dockerfiles and configuring networks are all automated.

Cloud Dataproc on Kubernetes is currently available as an early "alpha" preview. At present it works only with Apache Spark, but Google plans to add support for more open-source software projects, including Apache Flink, in the future.

Image: Google