Looking at the flood of news coming from the area of Generative AI, and Language Models in particular, my looming question for this year's Knowledge Graph Conference (KGC 2023) was: What does the future of knowledge graphs look like in a world where machines are able to teach themselves nearly everything? The unanimous answer: Knowledge graphs will become more important than ever, but only if the community embraces the new technology.
My five main takeaways
The amount of input was huge this year: two days of workshops with four streams, two days of main conference with five streams, and one day of tool presentations showed how immensely diverse the topic is. Nevertheless, I want to highlight five themes I personally consider the main insights.
1. Learning is still hard
People who become fans of semantics and graphs never leave, but bringing people into that realm still seems to be hard. We need to collaborate more and build better ways to educate people, with a shared curriculum and top-notch content. If possible, content should be provided for self-learning, with practical examples and projects. We also have to think about how to onboard our colleagues, clients, and users to our solutions: it is not always about the technology and more about the WHY. Not everybody learns the same way or requires the same level of complexity.
2. Tools are evolving
Everything is in motion. We see established tools and platforms extend their capabilities, especially by adding Large Language Models, but also by becoming more open, easier to use, and cloud-ready. Cloud-native solutions like Relational AI find their sweet spot in collaboration with Snowflake, and Amazon Neptune is moving in the direction of collaborative open-source solutions with its graph notebook and explorer. Even tools from other areas like Databricks and dbt are recognizing the potential of graphs. And finally, my feeling is that the discussion of Labeled Property Graph vs. Semantic KG is over, since many new tools simply provide a choice between the two.
3. UX and Visual Representations
Despite the graph being one of the most visually intuitive and universally understandable ways of conveying information in context, the graph community still struggles to develop a consistent, established way to utilize the potential of graphs for a great user experience. Especially in representing semantics, supporting modelers, and guiding users through complexity, it still feels like only very dedicated people will take on the challenge. There have been some glimpses, but we need an evolution in the tools to make them easy to use, intuitive, visual, and exciting, as Adam Keresztes defined in his talk.
4. Use Cases
In recent years there has been a lot of talk about harmonizing data sources to form a unified semantic layer for all applications to work with. This case is still valid, but one of the next natural questions is obvious: How do we effectively use that information to drive optimized decision making? Also connected to the LLM topic, recommendation systems were a focus point this year: selecting the right products in retail, the right content from content management systems, personalized experiences, or the right configuration for transmission devices. Nevertheless, we still seem to struggle to sell the myriad of use cases. We say knowledge graphs can be used for everything, but we tend to focus too much on the technology and too little on the value a case can bring to the customer. To improve this, we need to adjust our way of thinking and engage people who speak the language of the client.
5. Large Language Models
Finally, the focus of the conference was the impact of large language models on the KG world. Other variants of Generative AI did not get much attention. The consensus was: there is a huge opportunity for KG if we embrace the technology and push it in the direction of hybrid AI-semantics solutions. Many of the shortcomings of LLMs, such as explainability, ease of operations and updates, factual correctness (hallucinations), and the use of client-specific non-public data, can be addressed by intelligently integrating the graph into the workflow.
In general, the huge models have implicitly learned the common knowledge that is openly available. As with humans, this implicit knowledge is hidden and not directly shareable. That is why we need language interfaces and prompt engineering to extract that knowledge and make it explicit. Like every knowledge manager who works with humans, the semantic community strongly advocates for shared explicit knowledge that can be checked, validated, and used for reasoning. And since the knowledge graph is a technology that was developed to model human knowledge in a machine-readable way, it seems to be a clear favorite for the format used to interact with the new AI.
The main strategies presented at KGC revolved around knowledge infusion through intelligent prompt modeling. There were other ideas, like infusing knowledge directly into the transformer or building a corpus to train an individual model on, but the optimal way still seems to be an open research question. A common approach was to let a language model (the Knowledge Extractor) generate a query for the graph database to retrieve the facts required for the question, and then let another model (the Knowledge Injector) add the retrieved data to the original question through intelligent prompt engineering. Regardless of the complexity and cost of running such a system, the actual results for treating hallucinations seem quite promising. To make the process more efficient, some suggested using embedding techniques and storing the knowledge snapshots in a vector DB.
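The Extractor/Injector flow can be sketched in a few lines of Python. This is a toy illustration, not any presenter's actual implementation: the triple store is an in-memory list, and the two LLM steps are stubbed with fixed return values so the control flow is runnable.

```python
# Toy sketch of the Knowledge Extractor / Knowledge Injector pattern.
# All names are illustrative; the two model steps are stubs standing in
# for real LLM calls and a real graph-database query.

# A miniature in-memory "graph": (subject, predicate, object) triples.
TRIPLES = [
    ("ex:ACME", "ex:headquarteredIn", "ex:Berlin"),
    ("ex:ACME", "ex:foundedIn", "1999"),
]

def extract_query(question: str) -> str:
    """Knowledge Extractor: a real system would ask an LLM to write a
    SPARQL query for the question; here we just return the predicate
    the (stubbed) model decided is relevant."""
    return "ex:headquarteredIn"

def retrieve(predicate: str) -> list:
    """Run the 'query' against the graph and return matching facts."""
    return [t for t in TRIPLES if t[1] == predicate]

def inject(question: str, facts: list) -> str:
    """Knowledge Injector: prepend the retrieved facts to the original
    question so the answering model is grounded in graph data."""
    context = "\n".join(f"{s} {p} {o} ." for s, p, o in facts)
    return (f"Answer using only these facts:\n{context}\n\n"
            f"Question: {question}")

question = "Where is ACME headquartered?"
grounded_prompt = inject(question, retrieve(extract_query(question)))
print(grounded_prompt)
```

The grounded prompt would then go to the answering model, which is what keeps hallucinations in check: the model is told to answer only from facts that actually exist in the graph.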
Thinking about how to operate and update a model in the future is also intriguing if you basically have to retrain all the implicit knowledge to get to the same outcome. In my opinion, it is more likely that models become smaller, extendable, or more specialized, as we already see now, e.g., with plugins from OpenAI. My projection is that models will become a commodity for specific languages, tasks, and interfaces that you can swap as required. After all, they are called language models, not knowledge models.
Additionally, the AI could have a huge positive impact on my first point, since it can, for example, generate SPARQL or model things in OWL at the click of a button. The models' general capability to learn a defined language, be methodically correct, and reason freely over the given information can be a huge asset for making semantics itself much more accessible and user-friendly. Building a correct knowledge graph will still be the hard work that has to be done to achieve a common knowledge base, but the creation process and the extraction of information, whether from text or databases, should become much easier.
KGC has been a great experience this year, and I am really looking forward to next year. My projection for next year's topics is probably a hugely improved AI with new capabilities. But I am also surprised that the topics of Dataspaces, Linked Data, and Digital Twins have not been as big as I would have expected, considering the buzz they have in Europe right now. So maybe next year some new topics will splash across the pond.
Stay tuned for my follow-up blogs on the results of the panels “How to teach Knowledge Graph” and “What KG is not?”