SAP upgrades Datasphere, transforming enterprise data lakes


SAP made a big bet on AI agents at its annual TechEd conference, infusing the technology into its gen AI copilot, Joule.

But while gen AI was a hot topic at the event (as is the case everywhere), it was not the only one. The German software major also had plenty to share on the data front, including how it plans to give enterprises a full-blown suite of tools to make the most of their datasets – without compromising their original context.

SAP debuted new data lake capabilities, a new knowledge graph engine and a way to accelerate real-time risk analysis. The capabilities are not available immediately but are expected to debut in the coming months, helping enterprises store, process and drive value from their data quickly and efficiently.

The move comes at a time when leading enterprise software vendors are revamping their AI and data offerings to better meet customer needs. Just recently, Salesforce, long a leading CRM player, announced a hard pivot to Agentforce, an ecosystem of AI agents that can make decisions and act on business information. The company also expanded its Data Cloud with new capabilities and connectors to boost the performance of these agents.

New data lake and knowledge graph engine

With its Business Technology Platform (BTP), SAP has been providing enterprises with multiple key capabilities under one umbrella, such as data management, analytics and AI, automation and application development.

The idea is to give teams everything they need to build new applications, extend existing ones or integrate various systems into its cloud computing environment.

For all things data, which is one of the key pillars of the BTP experience, the company relies on SAP Datasphere, which allows enterprises to connect, store and manage data from SAP and non-SAP systems and eventually link it with SAP Analytics Cloud and other tools for downstream applications.

Now, this Datasphere, powered by the processing of SAP HANA Cloud, is getting new data lake capabilities.

SAP already offered a data lake service (HANA Cloud, Data Lake) for hosting structured, semi-structured and unstructured data.

But it was more of a bolt-on solution, with users having to assign a Datasphere space to access and work with the information in their data lake instance. This, on many occasions, meant losing valuable context held in the original data.

With the new embedded data lake, the company is building on its previous work and expanding the data fabric architecture of Datasphere with an integrated object store. This provides a much simpler way to store large amounts of data in its original form and scale according to need.

“As it is embedded in SAP Datasphere, the object store will provide another layer within the data stack that facilitates the onboarding of data products from SAP applications such as SAP S/4HANA, SAP BW, etc. Customers will be able to leverage all the core capabilities of SAP Datasphere such as analytic models, catalog, data integration, and more for the object store and will allow for direct access to the store for faster and better decision making,” a company spokesperson told VentureBeat.

For transforming and processing the data in the object store, the company is providing teams with Spark compute. For querying, teams will have a capability called SQL on files, which provides access to the data without replicating it.

In addition to the embedded data lake capabilities, SAP has also announced a knowledge graph engine – based on the industry standard Resource Description Framework – to help enterprises understand complex relationships in their Datasphere data points (business entities, purchase orders, invoices, context from existing applications) that would otherwise go unnoticed with manual data modeling efforts.

“Each piece of information is stored in the database in three parts: the subject of the data, the object to which it is related and the nature of the relationship between the two. This approach efficiently organizes data into a web of interconnected facts, making it easier to see how different pieces of information relate to one another,” the company wrote in a statement. 
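The three-part structure SAP describes can be illustrated with a minimal Python sketch. This is not SAP's engine or API; the `Triple` type, the sample facts, the predicate names and the helper functions are all hypothetical, chosen only to show how subject, relationship and object facts link into a graph that can be traversed.

```python
from typing import Iterator, List, NamedTuple, Optional


class Triple(NamedTuple):
    """One fact: a subject, the nature of the relationship, and the related object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical business facts; every identifier here is made up for illustration.
TRIPLES = [
    Triple("PurchaseOrder:4500001", "orderedBy", "Customer:ACME"),
    Triple("PurchaseOrder:4500001", "billedVia", "Invoice:9001"),
    Triple("Invoice:9001", "issuedTo", "Customer:ACME"),
    Triple("PurchaseOrder:4500002", "orderedBy", "Customer:Globex"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for t in triples:
        if subject is not None and t.subject != subject:
            continue
        if predicate is not None and t.predicate != predicate:
            continue
        if obj is not None and t.obj != obj:
            continue
        yield t


def invoices_for_customer(triples: List[Triple], customer: str) -> List[str]:
    """Follow two hops: customer -> purchase orders -> invoices."""
    invoices = []
    for order in match(triples, predicate="orderedBy", obj=customer):
        for link in match(triples, subject=order.subject, predicate="billedVia"):
            invoices.append(link.obj)
    return invoices


if __name__ == "__main__":
    print(invoices_for_customer(TRIPLES, "Customer:ACME"))
```

The two-hop lookup is the kind of relationship a graph query language expresses declaratively; here it is written out by hand to keep the sketch dependency-free.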

This would ultimately help enterprises get a better understanding of their data and use it for AI-specific use cases, including grounding AI models and enabling them to deliver context-aware insights. The knowledge graph engine also supports the SPARQL semantic query language, which lets users interact with and extract useful information from a knowledge graph.

Risk analysis in real-time

Finally, SAP announced Compass, a new feature for its Analytics Cloud offering that allows users to model complex risk scenarios and simulate their potential outcomes.

This can help companies prepare ahead of potential challenges, such as supply chain disruptions or rises in commodity prices, and minimize their downstream impact on operational expenses and revenue.

At the heart of Analytics Cloud compass lies Monte Carlo simulation, a computational technique that estimates the probability of different outcomes by running many simulations with random variables.

It saves the time and effort required for manual analysis and presents results, with probability distributions and their corresponding boundaries, through a UI that non-technical users can easily use to make business decisions.
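To make the technique concrete, here is a minimal Monte Carlo sketch in Python. It does not reflect how SAP's feature is implemented; the cost model, the probability figures, the budget threshold and the function names are all assumptions invented for illustration. It draws many random scenarios for commodity price changes and supply disruptions, then summarizes the resulting cost distribution with a mean and lower and upper boundaries.

```python
import random
import statistics
from typing import Dict


def simulate_annual_cost(rng: random.Random) -> float:
    """Draw one random scenario for annual material cost (all figures hypothetical)."""
    base_cost = 1_000_000.0
    # Assumed commodity price change: normally distributed, mean +3%, std dev 8%.
    price_change = rng.gauss(0.03, 0.08)
    # Assumed supply disruption: 10% chance per year, adding a 15% surcharge.
    surcharge = 0.15 if rng.random() < 0.10 else 0.0
    return base_cost * (1.0 + price_change) * (1.0 + surcharge)


def run_simulation(
    trials: int = 10_000, seed: int = 42, budget: float = 1_100_000.0
) -> Dict[str, float]:
    """Run many scenarios and summarize the outcome distribution."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    costs = sorted(simulate_annual_cost(rng) for _ in range(trials))
    return {
        "mean": statistics.fmean(costs),
        "p05": costs[int(0.05 * trials)],  # lower boundary (5th percentile)
        "p95": costs[int(0.95 * trials)],  # upper boundary (95th percentile)
        "prob_over_budget": sum(c > budget for c in costs) / trials,
    }


if __name__ == "__main__":
    for name, value in run_simulation().items():
        print(f"{name}: {value:,.2f}")
```

Reporting percentiles alongside the mean is what turns a single forecast into a range with boundaries, which is the kind of output the article describes.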

When will the new features launch?

The data lake capabilities will become generally available by the end of the fourth quarter of 2024, while the knowledge graph and Analytics Cloud compass will reach users in the first half of 2025. SAP has not shared exact dates, but the plan is clear: the company wants to provide enterprises with a more cohesive ecosystem of capabilities to bring more relevant, context-rich data into the Datasphere and use it to run powerful applications critical to business decisions.

When asked what kind of ROI enterprises can expect from the new data lake capabilities, the spokesperson cited a case study with GigaOm, in which a business data fabric enabled via SAP Datasphere showed a three-year TCO of 42% versus that of a DIY implementation.


