2022 article

Compact Walks: Taming Knowledge-Graph Embeddings With Domain- and Task-Specific Pathways

PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), pp. 458–469.

author keywords: Biomedical knowledge graphs (KGs); KG embeddings; domain- and task-specific regular expressions for creating node neighborhoods
TL;DR: The findings suggest that the proposed CompactWalks approach has the potential to address the promiscuity and runtime-performance challenges in applying embedding tools to large-scale KGs in real life, in the biomedical domain and possibly beyond. (via Semantic Scholar)
Source: Web of Science
Added: September 26, 2022

Knowledge-graph (KG) embeddings have emerged as a promising way to address challenges faced by modern biomedical research, including the growing gap between therapeutic needs and available treatments. The popularity of KG embeddings in graph analytics is on the rise, due at least partially to the presumed semanticity of the learned embeddings. Unfortunately, the ability of a node neighborhood picked up by an embedding to capture the node's semantics may depend on the characteristics of the data. One reason for this problem is that KG nodes can be promiscuous, that is, associated with a number of different relationships that are neither unique to the nodes nor indicative of their properties. To address the promiscuity challenge and the documented runtime-performance challenge in real-life KG embedding tools, we propose to use domain- and task-specific information to specify regular-expression pathways that define neighborhoods of KG nodes of interest. Our proposed CompactWalks framework uses these semantic subgraphs to enable meaningful compact walks in random-walk-based KG embedding methods. We report the results of case studies for the task of determining which pharmaceutical drugs could treat the same diseases. The findings suggest that our CompactWalks approach has the potential to address the promiscuity and runtime-performance challenges in applying embedding tools to large-scale KGs in real life, in the biomedical domain and possibly beyond.
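
The idea of restricting walks to a regular-expression pathway can be illustrated with a minimal Python sketch. This is not the authors' CompactWalks implementation: the toy graph, the relation names, the space-separated encoding of relation sequences, and the functions `matching_paths` and `compact_walks` are all hypothetical and chosen only for illustration. The sketch enumerates bounded-length paths from a start node, keeps those whose sequence of relation labels fully matches the pattern, and samples walks from the kept paths.

```python
import random
import re

# Hypothetical toy KG: node -> list of (relation, neighbor) edges.
# "mentioned_in" and "cites" stand in for promiscuous, uninformative relations.
TOY_KG = {
    "aspirin": [("targets", "PTGS1"), ("mentioned_in", "paper42")],
    "ibuprofen": [("targets", "PTGS1"), ("mentioned_in", "paper42")],
    "PTGS1": [("associated_with", "pain")],
    "paper42": [("cites", "paper7")],
}


def matching_paths(kg, start, pattern, max_hops=3):
    """Return node paths from `start` whose relation sequence fully matches `pattern`.

    Relation labels along a path are joined with single spaces before matching
    (an assumption of this sketch, not a convention taken from the paper).
    """
    regex = re.compile(pattern)
    results = []
    stack = [(start, [start], [])]  # (current node, nodes so far, relations so far)
    while stack:
        node, nodes, rels = stack.pop()
        if rels and regex.fullmatch(" ".join(rels)):
            results.append(nodes)
        if len(rels) < max_hops:
            for rel, nxt in kg.get(node, []):
                if nxt not in nodes:  # skip nodes already on this path (no cycles)
                    stack.append((nxt, nodes + [nxt], rels + [rel]))
    return results


def compact_walks(kg, start, pattern, num_walks=5, max_hops=3, seed=0):
    """Sample `num_walks` walks uniformly from the pathway-matching paths."""
    rng = random.Random(seed)
    paths = matching_paths(kg, start, pattern, max_hops)
    if not paths:
        return []
    return [rng.choice(paths) for _ in range(num_walks)]


# Hypothetical task-specific pathway: drug -> target -> condition.
PATHWAY = r"targets associated_with"
walks = compact_walks(TOY_KG, "aspirin", PATHWAY)
```

In this toy graph the only path from "aspirin" that matches the pathway is aspirin, PTGS1, pain, so the literature edges never appear in a sampled walk. The sketch enumerates every bounded path before filtering; a practical system would presumably prune non-matching branches during traversal, which this example does not attempt. The sampled walks could then be passed to any random-walk-based embedding trainer.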