Using governance to spur, not stall, data access for analytics
Data governance has historically been a serious bottleneck for analytics. While managing data to ensure it complies with policies and regulations is important, these processes can also make it difficult to locate and access data. Businesses that govern data at scale, in real time, and in the cloud often find the situation even more complicated. After all, what good are real-time data streams if governance processes grind their use to a halt?
Effective governance should help employees quickly find and use data, enabling them to collaborate and create business value from the organization’s data assets. So how can data governance spur, rather than stall, this process? Some organizations say they have found a way: by blending aspects of the two main data governance models in use today, centralized and decentralized, companies can build governance into the larger analytics framework.
In this way, governance is planned and executed to create competitive advantage, addressing policy compliance, security, accessibility, and usability in a frictionless and comprehensive manner. This in turn makes data available sooner and more usable for distributed team members, while maintaining centralized control over risks. Common data governance practices present hurdles for businesses, but blending the models can potentially surmount them.
Both data governance models pose challenges
Companies are struggling to manage data at scale and in the cloud. Nearly three-quarters of decision makers in a recent Forrester Research poll say they do not yet manage most of their organization’s data in the cloud. Some 80 percent say they have difficulty governing data at scale. A whopping 82 percent cite forecasting and controlling costs as a challenge in their data ecosystem, and an equal share say confusing data governance policies are a difficulty.
Meanwhile, the volume of data companies must manage is mushrooming, and more users are clamoring for more access. “You now have much more data coming from many more sources being stored in many more places,” says Patrick Barch, senior director of product management at Capital One Software.
Organizations want to make this data accessible to more business teams, enabling new insights and more business value. Many struggle, however, to strike a balance between two models. Central governance of data in the cloud ensures comprehensive governance but can bottleneck data access. A decentralized model gives lines of business more control over, and access to, data and analytics, but it has its own disadvantages. Different teams may not be aligned on governance policies. Specific data or types of data can get stuck in silos, unavailable to others. Machine learning engineers may lack access to the data they need to build advanced analytics tools.
“Your teams want full and instant access to the data and the tools of their choice,” says Barch. “You can’t manage everything centrally without becoming a huge bottleneck or hiring an army of data engineers, and you can’t completely decentralize the management responsibility without incurring significant data risk.”
Best of both worlds
There is, however, a way to combine centralized and decentralized approaches into a new model of data governance: federating data management. Doing so enables businesses to realize the advantages of each while avoiding their respective disadvantages.
Capital One, for example, adopted this model as the company shut down its data centers and moved its operations to the public cloud. The company implemented a cloud data warehouse to make data widely available to business teams, but it realized it also needed to pay close attention to data governance.
“Without good governance controls, you not only have the policy management risk, but you also risk spending much, much more money than you intend, much faster,” says Barch. “We knew that maximizing the value of our data, especially as the quantity and variety of that data scales, was going to require creating integrated experiences with built-in governance that enabled the various stakeholders involved in activities like publishing data, consuming data, governing data and managing the underlying infrastructure, to all seamlessly work together.”
What does this blended approach to data governance look like? For Capital One, it’s what Barch calls “sloped governance.” In a sloped governance approach, governance and controls around access and security increase with each level of data. For example, private user spaces, which don’t contain any shared data, can have minimal governance requirements. As data moves closer to production, the controls get stricter and take more time to implement.
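To make the idea of sloped governance more concrete, here is a minimal Python sketch. The tier names (`PRIVATE`, `SHARED`, `PRODUCTION`) and the control names are hypothetical illustrations, not Capital One’s actual levels or policies; the only idea taken from the text is that controls accumulate and tighten as data moves from private spaces toward production.

```python
from enum import IntEnum


# Hypothetical governance tiers, ordered from least to most governed.
class Tier(IntEnum):
    PRIVATE = 0      # personal sandbox, no shared data
    SHARED = 1       # data shared within or across teams
    PRODUCTION = 2   # data consumed by production systems


# Hypothetical controls introduced at each tier (illustrative names only).
# Higher tiers inherit every control from the tiers below them.
CONTROLS_ADDED_AT = {
    Tier.PRIVATE: {"owner_assigned"},
    Tier.SHARED: {"access_review", "sensitivity_classified"},
    Tier.PRODUCTION: {"data_quality_checks", "change_approval", "audit_logging"},
}


def required_controls(tier: Tier) -> set[str]:
    """Return every control required at `tier`, including those of all lower tiers."""
    required: set[str] = set()
    for t in Tier:  # iterates in definition order: PRIVATE, SHARED, PRODUCTION
        if t <= tier:
            required |= CONTROLS_ADDED_AT[t]
    return required


def missing_controls(tier: Tier, implemented: set[str]) -> set[str]:
    """Controls a dataset still needs before it may live at `tier`."""
    return required_controls(tier) - implemented


if __name__ == "__main__":
    # A dataset with an owner and an access review is not yet ready to be shared.
    print(sorted(missing_controls(Tier.SHARED, {"owner_assigned", "access_review"})))
```

Because each tier inherits the controls of the tiers below it, promoting a dataset only ever adds requirements, which mirrors the “slope” in the name.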
Capital One’s solution features a central shared-services platform where governance is applied to different types of data through machine learning automation, then validated by humans. Built-in centralized governance rules are quickly and consistently applied, but still allow data to flow freely in a decentralized fashion, enabling fast data access for lines of business.
“Not all data is equal; not all of it requires the same amount of attention,” says Barch. “This solution changes governance from an all-or-nothing approach to one that applies the right level of governance to the right scenarios, based on the level of risk.”
Blended governance approaches deliver results
This blended governance approach provides several benefits. First, it makes finding data faster and more efficient. In a centralized governance framework, different data is categorized differently, with certain governance levels requiring, for example, additional metadata fields or a certain level of service. This sort of categorizing and organization “helps analysts find the information faster, which speeds up their time to insight, and thus time to value,” says Barch.
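The categorization Barch describes, in which higher governance levels demand more metadata, might be sketched as follows. The level names, required fields, and the `find` helper are hypothetical, and the choice to return only fully documented datasets from a search is an assumption added for illustration, not a description of Capital One’s platform.

```python
from dataclasses import dataclass, field

# Hypothetical metadata fields required at each governance level.
REQUIRED_FIELDS = {
    "low": {"owner", "description"},
    "medium": {"owner", "description", "domain", "refresh_schedule"},
    "high": {"owner", "description", "domain", "refresh_schedule", "sla_hours", "steward"},
}


@dataclass
class Dataset:
    name: str
    level: str  # one of the keys in REQUIRED_FIELDS
    metadata: dict[str, str] = field(default_factory=dict)


def missing_fields(ds: Dataset) -> set[str]:
    """Metadata fields the dataset must still supply for its governance level."""
    return REQUIRED_FIELDS[ds.level] - set(ds.metadata)


def find(datasets: list[Dataset], **criteria: str) -> list[str]:
    """Names of fully documented datasets whose metadata matches every criterion.

    Assumption: incompletely documented datasets are excluded from search results.
    """
    return [
        ds.name
        for ds in datasets
        if not missing_fields(ds)
        and all(ds.metadata.get(key) == value for key, value in criteria.items())
    ]


if __name__ == "__main__":
    orders = Dataset("orders", "medium", {
        "owner": "ana", "description": "Daily orders",
        "domain": "sales", "refresh_schedule": "daily",
    })
    draft = Dataset("draft", "high", {"owner": "li", "domain": "sales"})
    print(find([orders, draft], domain="sales"))
    print(sorted(missing_fields(draft)))
```

Consistent, level-dependent metadata is what lets an analyst filter by a field such as `domain` and trust that every result carries the same basic documentation.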
A blended approach also enables more collaboration in design and production. Traditional corporate engineering governance standards slow the process of bringing analytics tools into production. A blended governance model can speed things up because it applies just enough governance rather than a full-court press that discourages innovation. “It’s like continuous integration and continuous delivery (CI/CD) for data,” says Barch, referring to a software development practice in which developers frequently merge changes into a shared code repository, where automated checks validate them before release. “You don’t want the red tape of standards to unnecessarily prevent your data analysts and scientists from getting their code operationalized and producing results.”
By adopting such a blended approach, organizations can ensure effective governance without unnecessarily restricting or slowing the use of data. They can also encourage team alignment and collaboration while controlling costs and reducing overhead. “An analytics platform with this kind of built-in governance means your people can trust that the data is well managed, while also enabling teams to operate at the speed of business,” says Barch.
This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.