UPDATED 22:50 EDT / JANUARY 21 2021


The new data paradigm: Embrace decentralization

A centralized data source is not always a good idea.

In a paper published in May 2019, Zhamak Dehghani (pictured), director of Next Tech Incubation NA at ThoughtWorks Inc., laid out the trouble with data lakes. Despite hefty investments by major enterprises, these centralized reservoirs have often failed to deliver business value. Dehghani instead called for a paradigm shift that draws on the basic tenets of modern distributed architecture. In her view, the time has come to start treating data as a product.

“Let’s decouple this world of analytical data to mirror the same way we have decoupled our systems and teams and business,” Dehghani said. “Why should data be any different? Let’s bring product thinking and treating data as a product to the data that teams now share.”

Dehghani spoke with Dave Vellante, host of theCUBE, SiliconANGLE Media’s livestreaming studio, during theCUBE on Cloud event. They discussed the key elements of a data mesh, recognizing the value of data to the business, building models around complexity, and the evolving role of governance in the enterprise.

Embracing decentralization

Dehghani is an advocate for rethinking how enterprises create and manage data architectures. Her approach favors decentralized structures over monolithic ones and elevates domain knowledge to the primary criterion for organizing big data teams and platforms.

According to Dehghani, moving beyond monolithic, centralized data lakes and "all-in-one" data warehouses to embrace the distributed nature of information will require a data mesh architecture. This philosophy is grounded in the reality that the industry as a whole has moved dramatically away from a centralized model.

“If you look at the parallel movement of our industry in general, since the birth of internet, we are actually moving towards decentralization,” Dehghani noted. “If we said the only way we can get access to various applications on the web is to centralize them, we would laugh at that idea. But for some reason we don’t question that when it comes to data.”

In addition to becoming bottlenecks, centralized data structures miss the reason for doing data management in the first place: Data is of no use unless people can fully recognize its value. This gap often ends up being a primary reason why machine learning models ultimately fail.

“We end up training machine learning models on data that is not really representative of the reality of the business, and then we put them into production and they don’t work,” Dehghani said. “It’s managed by a team of highly specialized people who are struggling to understand the actual value of the data. It’s not going to get us to where our aspirations or ambitions need to be.”

Self-serve infrastructure

The solution, according to Dehghani, is to treat data domains as first-class citizens and apply platform thinking to create a self-serve data infrastructure. As data volume has exploded, so has the number of sources where information is generated.

This will require accepting the resulting complexity and building a platform to accommodate it.

“It’s time to embrace the complexity that comes with the growth of a number of sources,” Dehghani said. “The architecture, technology and organization structure incentives need to move to embrace that complexity. That requires a paradigm shift in full stack.”

The success of a distributed data platform depends on domain data teams applying product thinking to the datasets they provide within an organization. The unique data assets are the product; data scientists and engineers are the customers.

Just as data has been managed centrally for decades, so has governance. But a single governance model does not necessarily meet every need, and Dehghani envisions a more adaptable framework.

“The governance model in the old world has been very command and control, very centralized,” Dehghani said. “In the world of a data mesh, the job of data governance as a function becomes finding equilibrium between what decisions need to be made and enforced globally and what decisions need to be made locally.”

What about the application of key machine learning tools, such as TensorFlow or PyTorch?

“I truly believe we need to reimagine that world,” Dehghani said. “Go make it happen ‘platform,’ go provision everything I need so as a data product developer all I can focus on is the data itself. We have a lot of work to do.”

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of theCUBE on Cloud event.

Photo: SiliconANGLE
