Optimizing the supply chain with a data lakehouse
When a commercial ship travels from the port of Ras Tanura in Saudi Arabia to Tokyo Bay, it’s not only carrying cargo; it’s also transporting millions of data points across a wide array of partners and complex technology systems.
Consider, for example, Maersk. The global shipping container and logistics company has more than 100,000 employees and offices in 120 countries, and operates about 800 container ships that can each hold 18,000 tractor-trailer containers. From manufacture to delivery, the items within these containers generate hundreds or thousands of data points, highlighting the amount of supply chain data organizations manage on a daily basis.
Until recently, access to the bulk of an organization's supply chain data has been limited to specialists, with the data distributed across myriad systems. Constrained by traditional data warehouse limitations, maintaining the data requires considerable engineering effort, heavy oversight, and substantial financial commitment. Today, a huge amount of data—generated by an increasingly digital supply chain—languishes in data lakes without ever being made available to the business.
A 2023 Boston Consulting Group survey notes that 56% of managers say that although investment in modernizing data architectures continues, managing data operating costs remains a major pain point. The consultancy also expects data deluge issues to worsen as the volume of data generated grows at a rate of 21% from 2021 to 2024, to 149 zettabytes globally.
“Data is everywhere,” says Mark Sear, director of AI, data, and integration at Maersk. “Just consider the life of a product and what goes into transporting a computer mouse from China to the United Kingdom. You have to work out how you get it from the factory to the port, the port to the next port, the port to the warehouse, and the warehouse to the consumer. There are vast amounts of data points throughout that journey.”
Sear says organizations that manage to integrate these rich sets of data are poised to reap valuable business benefits. “Every single data point is an opportunity for improvement—to improve profitability, knowledge, our ability to price correctly, our ability to staff correctly, and to satisfy the customer,” he says.
Organizations like Maersk are increasingly turning to a data lakehouse architecture. By combining the cost-effective scale of a data lake with the capability and performance of a data warehouse, a data lakehouse promises to help companies unify disparate supply chain data and give a larger group of users access to that data—structured, semi-structured, and unstructured alike. Building analytics on top of the lakehouse not only allows this new architectural approach to advance supply chain efficiency through better performance and governance, but it can also support easy, immediate data analysis and help reduce operational costs.
This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.