new

Get trending papers in your email inbox!

Subscribe

Daily Papers

by AK and the research community

Vision-guided and Mask-enhanced Adaptive Denoising for Prompt-based Image Editing

Text-to-image diffusion models have demonstrated remarkable progress in synthesizing high-quality images from text prompts, which boosts researches on prompt-based image editing that edits a source image according to a target prompt. Despite their advances, existing methods still encounter three key issues: 1) limited capacity of the text prompt in guiding target image generation, 2) insufficient mining of word-to-patch and patch-to-patch relationships for grounding editing areas, and 3) unified editing strength for all regions during each denoising step. To address these issues, we present a Vision-guided and Mask-enhanced Adaptive Editing (ViMAEdit) method with three key novel designs. First, we propose to leverage image embeddings as explicit guidance to enhance the conventional textual prompt-based denoising process, where a CLIP-based target image embedding estimation strategy is introduced. Second, we devise a self-attention-guided iterative editing area grounding strategy, which iteratively exploits patch-to-patch relationships conveyed by self-attention maps to refine those word-to-patch relationships contained in cross-attention maps. Last, we present a spatially adaptive variance-guided sampling, which highlights sampling variances for critical image regions to promote the editing capability. Experimental results demonstrate the superior editing capacity of ViMAEdit over all existing methods.

Towards Generic Image Manipulation Detection with Weakly-Supervised Self-Consistency Learning

As advanced image manipulation techniques emerge, detecting the manipulation becomes increasingly important. Despite the success of recent learning-based approaches for image manipulation detection, they typically require expensive pixel-level annotations to train, while exhibiting degraded performance when testing on images that are differently manipulated compared with training images. To address these limitations, we propose weakly-supervised image manipulation detection, such that only binary image-level labels (authentic or tampered with) are required for training purpose. Such a weakly-supervised setting can leverage more training images and has the potential to adapt quickly to new manipulation techniques. To improve the generalization ability, we propose weakly-supervised self-consistency learning (WSCL) to leverage the weakly annotated images. Specifically, two consistency properties are learned: multi-source consistency (MSC) and inter-patch consistency (IPC). MSC exploits different content-agnostic information and enables cross-source learning via an online pseudo label generation and refinement process. IPC performs global pair-wise patch-patch relationship reasoning to discover a complete region of manipulation. Extensive experiments validate that our WSCL, even though is weakly supervised, exhibits competitive performance compared with fully-supervised counterpart under both in-distribution and out-of-distribution evaluations, as well as reasonable manipulation localization ability.

Heterogeneous Graph Contrastive Learning with Meta-path Contexts and Adaptively Weighted Negative Samples

Heterogeneous graph contrastive learning has received wide attention recently. Some existing methods use meta-paths, which are sequences of object types that capture semantic relationships between objects, to construct contrastive views. However, most of them ignore the rich meta-path context information that describes how two objects are connected by meta-paths. Further, they fail to distinguish negative samples, which could adversely affect the model performance. To address the problems, we propose MEOW, which considers both meta-path contexts and weighted negative samples. Specifically, MEOW constructs a coarse view and a fine-grained view for contrast. The former reflects which objects are connected by meta-paths, while the latter uses meta-path contexts and characterizes details on how the objects are connected. Then, we theoretically analyze the InfoNCE loss and recognize its limitations for computing gradients of negative samples. To better distinguish negative samples, we learn hard-valued weights for them based on node clustering and use prototypical contrastive learning to pull close embeddings of nodes in the same cluster. In addition, we propose a variant model AdaMEOW that adaptively learns soft-valued weights of negative samples to further improve node representation. Finally, we conduct extensive experiments to show the superiority of MEOW and AdaMEOW against other state-of-the-art methods.

ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models

In this paper, we present the findings of our Project ALPINE which stands for ``Autoregressive Learning for Planning In NEtworks." Project ALPINE initiates a theoretical investigation into the development of planning capabilities in Transformer-based language models through their autoregressive learning mechanisms, aiming to identify any potential limitations in their planning abilities. We abstract planning as a network path-finding task where the objective is to generate a valid path from a specified source node to a designated target node. In terms of expressiveness, we show that the Transformer is capable of executing path-finding by embedding the adjacency and reachability matrices within its weights. Our theoretical analysis of the gradient-based learning dynamic of the Transformer reveals that the Transformer is capable of learning both the adjacency matrix and a limited form of the reachability matrix. These theoretical insights are then validated through experiments, which demonstrate that the Transformer indeed learns the adjacency matrix and an incomplete reachability matrix, which aligns with the predictions made in our theoretical analysis. Additionally, when applying our methodology to a real-world planning benchmark, called Blocksworld, our observations remain consistent. Our theoretical and empirical analyses further unveil a potential limitation of Transformer in path-finding: it cannot identify reachability relationships through transitivity, and thus would fail when path concatenation is needed to generate a path. In summary, our findings shed new light on how the internal mechanisms of autoregressive learning enable planning in networks. This study may contribute to our understanding of the general planning capabilities in other related domains.

PA&DA: Jointly Sampling PAth and DAta for Consistent NAS

Based on the weight-sharing mechanism, one-shot NAS methods train a supernet and then inherit the pre-trained weights to evaluate sub-models, largely reducing the search cost. However, several works have pointed out that the shared weights suffer from different gradient descent directions during training. And we further find that large gradient variance occurs during supernet training, which degrades the supernet ranking consistency. To mitigate this issue, we propose to explicitly minimize the gradient variance of the supernet training by jointly optimizing the sampling distributions of PAth and DAta (PA&DA). We theoretically derive the relationship between the gradient variance and the sampling distributions, and reveal that the optimal sampling probability is proportional to the normalized gradient norm of path and training data. Hence, we use the normalized gradient norm as the importance indicator for path and training data, and adopt an importance sampling strategy for the supernet training. Our method only requires negligible computation cost for optimizing the sampling distributions of path and data, but achieves lower gradient variance during supernet training and better generalization performance for the supernet, resulting in a more consistent NAS. We conduct comprehensive comparisons with other improved approaches in various search spaces. Results show that our method surpasses others with more reliable ranking performance and higher accuracy of searched architectures, showing the effectiveness of our method. Code is available at https://github.com/ShunLu91/PA-DA.

Simplicial Closure and higher-order link prediction

Networks provide a powerful formalism for modeling complex systems by using a model of pairwise interactions. But much of the structure within these systems involves interactions that take place among more than two nodes at once; for example, communication within a group rather than person-to person, collaboration among a team rather than a pair of coauthors, or biological interaction between a set of molecules rather than just two. Such higher-order interactions are ubiquitous, but their empirical study has received limited attention, and little is known about possible organizational principles of such structures. Here we study the temporal evolution of 19 datasets with explicit accounting for higher-order interactions. We show that there is a rich variety of structure in our datasets but datasets from the same system types have consistent patterns of higher-order structure. Furthermore, we find that tie strength and edge density are competing positive indicators of higher-order organization, and these trends are consistent across interactions involving differing numbers of nodes. To systematically further the study of theories for such higher-order structures, we propose higher-order link prediction as a benchmark problem to assess models and algorithms that predict higher-order structure. We find a fundamental differences from traditional pairwise link prediction, with a greater role for local rather than long-range information in predicting the appearance of new interactions.

Neural Common Neighbor with Completion for Link Prediction

Despite its outstanding performance in various graph tasks, vanilla Message Passing Neural Network (MPNN) usually fails in link prediction tasks, as it only uses representations of two individual target nodes and ignores the pairwise relation between them. To capture the pairwise relations, some models add manual features to the input graph and use the output of MPNN to produce pairwise representations. In contrast, others directly use manual features as pairwise representations. Though this simplification avoids applying a GNN to each link individually and thus improves scalability, these models still have much room for performance improvement due to the hand-crafted and unlearnable pairwise features. To upgrade performance while maintaining scalability, we propose Neural Common Neighbor (NCN), which uses learnable pairwise representations. To further boost NCN, we study the unobserved link problem. The incompleteness of the graph is ubiquitous and leads to distribution shifts between the training and test set, loss of common neighbor information, and performance degradation of models. Therefore, we propose two intervention methods: common neighbor completion and target link removal. Combining the two methods with NCN, we propose Neural Common Neighbor with Completion (NCNC). NCN and NCNC outperform recent strong baselines by large margins. NCNC achieves state-of-the-art performance in link prediction tasks. Our code is available at https://github.com/GraphPKU/NeuralCommonNeighbor.

Neighborhood-aware Scalable Temporal Network Representation Learning

Temporal networks have been widely used to model real-world complex systems such as financial systems and e-commerce systems. In a temporal network, the joint neighborhood of a set of nodes often provides crucial structural information useful for predicting whether they may interact at a certain time. However, recent representation learning methods for temporal networks often fail to extract such information or depend on online construction of structural features, which is time-consuming. To address the issue, this work proposes Neighborhood-Aware Temporal network model (NAT). For each node in the network, NAT abandons the commonly-used one-single-vector-based representation while adopting a novel dictionary-type neighborhood representation. Such a dictionary representation records a downsampled set of the neighboring nodes as keys, and allows fast construction of structural features for a joint neighborhood of multiple nodes. We also design a dedicated data structure termed N-cache to support parallel access and update of those dictionary representations on GPUs. NAT gets evaluated over seven real-world large-scale temporal networks. NAT not only outperforms all cutting-edge baselines by averaged 1.2% and 4.2% in transductive and inductive link prediction accuracy, respectively, but also keeps scalable by achieving a speed-up of 4.1-76.7x against the baselines that adopt joint structural features and achieves a speed-up of 1.6-4.0x against the baselines that cannot adopt those features. The link to the code: https: //github.com/Graph-COM/Neighborhood-Aware-Temporal-Network.

Pay Attention to Evolution: Time Series Forecasting with Deep Graph-Evolution Learning

Time-series forecasting is one of the most active research topics in artificial intelligence. Applications in real-world time series should consider two factors for achieving reliable predictions: modeling dynamic dependencies among multiple variables and adjusting the model's intrinsic hyperparameters. A still open gap in that literature is that statistical and ensemble learning approaches systematically present lower predictive performance than deep learning methods. They generally disregard the data sequence aspect entangled with multivariate data represented in more than one time series. Conversely, this work presents a novel neural network architecture for time-series forecasting that combines the power of graph evolution with deep recurrent learning on distinct data distributions; we named our method Recurrent Graph Evolution Neural Network (ReGENN). The idea is to infer multiple multivariate relationships between co-occurring time-series by assuming that the temporal data depends not only on inner variables and intra-temporal relationships (i.e., observations from itself) but also on outer variables and inter-temporal relationships (i.e., observations from other-selves). An extensive set of experiments was conducted comparing ReGENN with dozens of ensemble methods and classical statistical ones, showing sound improvement of up to 64.87% over the competing algorithms. Furthermore, we present an analysis of the intermediate weights arising from ReGENN, showing that by looking at inter and intra-temporal relationships simultaneously, time-series forecasting is majorly improved if paying attention to how multiple multivariate data synchronously evolve.

Multi-Label Zero-Shot Product Attribute-Value Extraction

E-commerce platforms should provide detailed product descriptions (attribute values) for effective product search and recommendation. However, attribute value information is typically not available for new products. To predict unseen attribute values, large quantities of labeled training data are needed to train a traditional supervised learning model. Typically, it is difficult, time-consuming, and costly to manually label large quantities of new product profiles. In this paper, we propose a novel method to efficiently and effectively extract unseen attribute values from new products in the absence of labeled data (zero-shot setting). We propose HyperPAVE, a multi-label zero-shot attribute value extraction model that leverages inductive inference in heterogeneous hypergraphs. In particular, our proposed technique constructs heterogeneous hypergraphs to capture complex higher-order relations (i.e. user behavior information) to learn more accurate feature representations for graph nodes. Furthermore, our proposed HyperPAVE model uses an inductive link prediction mechanism to infer future connections between unseen nodes. This enables HyperPAVE to identify new attribute values without the need for labeled training data. We conduct extensive experiments with ablation studies on different categories of the MAVE dataset. The results demonstrate that our proposed HyperPAVE model significantly outperforms existing classification-based, generation-based large language models for attribute value extraction in the zero-shot setting.

Multi-Temporal Relationship Inference in Urban Areas

Finding multiple temporal relationships among locations can benefit a bunch of urban applications, such as dynamic offline advertising and smart public transport planning. While some efforts have been made on finding static relationships among locations, little attention is focused on studying time-aware location relationships. Indeed, abundant location-based human activities are time-varying and the availability of these data enables a new paradigm for understanding the dynamic relationships in a period among connective locations. To this end, we propose to study a new problem, namely multi-Temporal relationship inference among locations (Trial for short), where the major challenge is how to integrate dynamic and geographical influence under the relationship sparsity constraint. Specifically, we propose a solution to Trial with a graph learning scheme, which includes a spatially evolving graph neural network (SEENet) with two collaborative components: spatially evolving graph convolution module (SEConv) and spatially evolving self-supervised learning strategy (SE-SSL). SEConv performs the intra-time aggregation and inter-time propagation to capture the multifaceted spatially evolving contexts from the view of location message passing. In addition, SE-SSL designs time-aware self-supervised learning tasks in a global-local manner with additional evolving constraint to enhance the location representation learning and further handle the relationship sparsity. Finally, experiments on four real-world datasets demonstrate the superiority of our method over several state-of-the-art approaches.

From Graphs to Hypergraphs: Hypergraph Projection and its Remediation

We study the implications of the modeling choice to use a graph, instead of a hypergraph, to represent real-world interconnected systems whose constituent relationships are of higher order by nature. Such a modeling choice typically involves an underlying projection process that maps the original hypergraph onto a graph, and is common in graph-based analysis. While hypergraph projection can potentially lead to loss of higher-order relations, there exists very limited studies on the consequences of doing so, as well as its remediation. This work fills this gap by doing two things: (1) we develop analysis based on graph and set theory, showing two ubiquitous patterns of hyperedges that are root to structural information loss in all hypergraph projections; we also quantify the combinatorial impossibility of recovering the lost higher-order structures if no extra help is provided; (2) we still seek to recover the lost higher-order structures in hypergraph projection, and in light of (1)'s findings we propose to relax the problem into a learning-based setting. Under this setting, we develop a learning-based hypergraph reconstruction method based on an important statistic of hyperedge distributions that we find. Our reconstruction method is evaluated on 8 real-world datasets under different settings, and exhibits consistently good performance. We also demonstrate benefits of the reconstructed hypergraphs via use cases of protein rankings and link predictions.

Linguistic and Structural Basis of Engineering Design Knowledge

Artefact descriptions are the primary carriers of engineering design knowledge that is both an outcome and a driver of the design process. While an artefact could be described in different connotations, the design process requires a description to embody engineering design knowledge, which is expressed in the text through intricate placement of entities and relationships. As large-language models learn from all kinds of text merely as a sequence of characters/tokens, these are yet to generate text that embodies explicit engineering design facts. Existing ontological design theories are less likely to guide the large-language models whose applications are currently limited to ideation and learning purposes. In this article, we explicate engineering design knowledge as knowledge graphs from a large sample of 33,881 patent documents. We examine the constituents of these knowledge graphs to understand the linguistic and structural basis of engineering design knowledge. In terms of linguistic basis, we observe that entities and relationships could be generalised to 64 and 24 linguistic syntaxes. While relationships mainly capture attributes ('of'), structure ('in', 'with'), purpose ('to', 'for'), hierarchy ('include'), exemplification ('such as'), and behaviour ('to', 'from'), the hierarchical relationships could specifically be identified using 75 unique syntaxes. To understand the structural basis, we draw inspiration from various studies on biological/ecological networks and discover motifs from patent knowledge graphs. We identify four 3-node and four 4-node patterns that could further be converged and simplified into sequence [->...->], aggregation [->...<-], and hierarchy [<-...->]. Expected to guide large-language model based design tools, we propose few regulatory precepts for concretising abstract entities and relationships within subgraphs, while explicating hierarchical structures.

RelationBooth: Towards Relation-Aware Customized Object Generation

Customized image generation is crucial for delivering personalized content based on user-provided image prompts, aligning large-scale text-to-image diffusion models with individual needs. However, existing models often overlook the relationships between customized objects in generated images. Instead, this work addresses that gap by focusing on relation-aware customized image generation, which aims to preserve the identities from image prompts while maintaining the predicate relations described in text prompts. Specifically, we introduce RelationBooth, a framework that disentangles identity and relation learning through a well-curated dataset. Our training data consists of relation-specific images, independent object images containing identity information, and text prompts to guide relation generation. Then, we propose two key modules to tackle the two main challenges: generating accurate and natural relations, especially when significant pose adjustments are required, and avoiding object confusion in cases of overlap. First, we introduce a keypoint matching loss that effectively guides the model in adjusting object poses closely tied to their relationships. Second, we incorporate local features from the image prompts to better distinguish between objects, preventing confusion in overlapping cases. Extensive results on three benchmarks demonstrate the superiority of RelationBooth in generating precise relations while preserving object identities across a diverse set of objects and relations. The source code and trained models will be made available to the public.

Heterogeneous Graph Representation Learning with Relation Awareness

Representation learning on heterogeneous graphs aims to obtain meaningful node representations to facilitate various downstream tasks, such as node classification and link prediction. Existing heterogeneous graph learning methods are primarily developed by following the propagation mechanism of node representations. There are few efforts on studying the role of relations for improving the learning of more fine-grained node representations. Indeed, it is important to collaboratively learn the semantic representations of relations and discern node representations with respect to different relation types. To this end, in this paper, we propose a novel Relation-aware Heterogeneous Graph Neural Network, namely R-HGNN, to learn node representations on heterogeneous graphs at a fine-grained level by considering relation-aware characteristics. Specifically, a dedicated graph convolution component is first designed to learn unique node representations from each relation-specific graph separately. Then, a cross-relation message passing module is developed to improve the interactions of node representations across different relations. Also, the relation representations are learned in a layer-wise manner to capture relation semantics, which are used to guide the node representation learning process. Moreover, a semantic fusing module is presented to aggregate relation-aware node representations into a compact representation with the learned relation representations. Finally, we conduct extensive experiments on a variety of graph learning tasks, and experimental results demonstrate that our approach consistently outperforms existing methods among all the tasks.

From Cities to Series: Complex Networks and Deep Learning for Improved Spatial and Temporal Analytics*

Graphs have often been used to answer questions about the interaction between real-world entities by taking advantage of their capacity to represent complex topologies. Complex networks are known to be graphs that capture such non-trivial topologies; they are able to represent human phenomena such as epidemic processes, the dynamics of populations, and the urbanization of cities. The investigation of complex networks has been extrapolated to many fields of science, with particular emphasis on computing techniques, including artificial intelligence. In such a case, the analysis of the interaction between entities of interest is transposed to the internal learning of algorithms, a paradigm whose investigation is able to expand the state of the art in Computer Science. By exploring this paradigm, this thesis puts together complex networks and machine learning techniques to improve the understanding of the human phenomena observed in pandemics, pendular migration, and street networks. Accordingly, we contribute with: (i) a new neural network architecture capable of modeling dynamic processes observed in spatial and temporal data with applications in epidemics propagation, weather forecasting, and patient monitoring in intensive care units; (ii) a machine-learning methodology for analyzing and predicting links in the scope of human mobility between all the cities of Brazil; and, (iii) techniques for identifying inconsistencies in the urban planning of cities while tracking the most influential vertices, with applications over Brazilian and worldwide cities. We obtained results sustained by sound evidence of advances to the state of the art in artificial intelligence, rigorous formalisms, and ample experimentation. Our findings rely upon real-world applications in a range of domains, demonstrating the applicability of our methodologies.

Relation Rectification in Diffusion Model

Despite their exceptional generative abilities, large text-to-image diffusion models, much like skilled but careless artists, often struggle with accurately depicting visual relationships between objects. This issue, as we uncover through careful analysis, arises from a misaligned text encoder that struggles to interpret specific relationships and differentiate the logical order of associated objects. To resolve this, we introduce a novel task termed Relation Rectification, aiming to refine the model to accurately represent a given relationship it initially fails to generate. To address this, we propose an innovative solution utilizing a Heterogeneous Graph Convolutional Network (HGCN). It models the directional relationships between relation terms and corresponding objects within the input prompts. Specifically, we optimize the HGCN on a pair of prompts with identical relational words but reversed object orders, supplemented by a few reference images. The lightweight HGCN adjusts the text embeddings generated by the text encoder, ensuring the accurate reflection of the textual relation in the embedding space. Crucially, our method retains the parameters of the text encoder and diffusion model, preserving the model's robust performance on unrelated descriptions. We validated our approach on a newly curated dataset of diverse relational data, demonstrating both quantitative and qualitative enhancements in generating images with precise visual relations. Project page: https://wuyinwei-hah.github.io/rrnet.github.io/.

Relational Deep Learning: Graph Representation Learning on Relational Databases

Much of the world's most valued data is stored in relational databases and data warehouses, where the data is organized into many tables connected by primary-foreign key relations. However, building machine learning models using this data is both challenging and time consuming. The core problem is that no machine learning method is capable of learning on multiple tables interconnected by primary-foreign key relations. Current methods can only learn from a single table, so the data must first be manually joined and aggregated into a single training table, the process known as feature engineering. Feature engineering is slow, error prone and leads to suboptimal models. Here we introduce an end-to-end deep representation learning approach to directly learn on data laid out across multiple tables. We name our approach Relational Deep Learning (RDL). The core idea is to view relational databases as a temporal, heterogeneous graph, with a node for each row in each table, and edges specified by primary-foreign key links. Message Passing Graph Neural Networks can then automatically learn across the graph to extract representations that leverage all input data, without any manual feature engineering. Relational Deep Learning leads to more accurate models that can be built much faster. To facilitate research in this area, we develop RelBench, a set of benchmark datasets and an implementation of Relational Deep Learning. The data covers a wide spectrum, from discussions on Stack Exchange to book reviews on the Amazon Product Catalog. Overall, we define a new research area that generalizes graph machine learning and broadens its applicability to a wide set of AI use cases.

Augmenting Textual Generation via Topology Aware Retrieval

Despite the impressive advancements of Large Language Models (LLMs) in generating text, they are often limited by the knowledge contained in the input and prone to producing inaccurate or hallucinated content. To tackle these issues, Retrieval-augmented Generation (RAG) is employed as an effective strategy to enhance the available knowledge base and anchor the responses in reality by pulling additional texts from external databases. In real-world applications, texts are often linked through entities within a graph, such as citations in academic papers or comments in social networks. This paper exploits these topological relationships to guide the retrieval process in RAG. Specifically, we explore two kinds of topological connections: proximity-based, focusing on closely connected nodes, and role-based, which looks at nodes sharing similar subgraph structures. Our empirical research confirms their relevance to text relationships, leading us to develop a Topology-aware Retrieval-augmented Generation framework. This framework includes a retrieval module that selects texts based on their topological relationships and an aggregation module that integrates these texts into prompts to stimulate LLMs for text generation. We have curated established text-attributed networks and conducted comprehensive experiments to validate the effectiveness of this framework, demonstrating its potential to enhance RAG with topological awareness.

Tackling the Challenges in Scene Graph Generation with Local-to-Global Interactions

In this work, we seek new insights into the underlying challenges of the Scene Graph Generation (SGG) task. Quantitative and qualitative analysis of the Visual Genome dataset implies -- 1) Ambiguity: even if inter-object relationship contains the same object (or predicate), they may not be visually or semantically similar, 2) Asymmetry: despite the nature of the relationship that embodied the direction, it was not well addressed in previous studies, and 3) Higher-order contexts: leveraging the identities of certain graph elements can help to generate accurate scene graphs. Motivated by the analysis, we design a novel SGG framework, Local-to-Global Interaction Networks (LOGIN). Locally, interactions extract the essence between three instances of subject, object, and background, while baking direction awareness into the network by explicitly constraining the input order of subject and object. Globally, interactions encode the contexts between every graph component (i.e., nodes and edges). Finally, Attract & Repel loss is utilized to fine-tune the distribution of predicate embeddings. By design, our framework enables predicting the scene graph in a bottom-up manner, leveraging the possible complementariness. To quantify how much LOGIN is aware of relational direction, a new diagnostic task called Bidirectional Relationship Classification (BRC) is also proposed. Experimental results demonstrate that LOGIN can successfully distinguish relational direction than existing methods (in BRC task), while showing state-of-the-art results on the Visual Genome benchmark (in SGG task).

Valentine: Evaluating Matching Techniques for Dataset Discovery

Data scientists today search large data lakes to discover and integrate datasets. In order to bring together disparate data sources, dataset discovery methods rely on some form of schema matching: the process of establishing correspondences between datasets. Traditionally, schema matching has been used to find matching pairs of columns between a source and a target schema. However, the use of schema matching in dataset discovery methods differs from its original use. Nowadays schema matching serves as a building block for indicating and ranking inter-dataset relationships. Surprisingly, although a discovery method's success relies highly on the quality of the underlying matching algorithms, the latest discovery methods employ existing schema matching algorithms in an ad-hoc fashion due to the lack of openly-available datasets with ground truth, reference method implementations, and evaluation metrics. In this paper, we aim to rectify the problem of evaluating the effectiveness and efficiency of schema matching methods for the specific needs of dataset discovery. To this end, we propose Valentine, an extensible open-source experiment suite to execute and organize large-scale automated matching experiments on tabular data. Valentine includes implementations of seminal schema matching methods that we either implemented from scratch (due to absence of open source code) or imported from open repositories. The contributions of Valentine are: i) the definition of four schema matching scenarios as encountered in dataset discovery methods, ii) a principled dataset fabrication process tailored to the scope of dataset discovery methods and iii) the most comprehensive evaluation of schema matching techniques to date, offering insight on the strengths and weaknesses of existing techniques, that can serve as a guide for employing schema matching in future dataset discovery methods.

Breaking the Entanglement of Homophily and Heterophily in Semi-supervised Node Classification

Recently, graph neural networks (GNNs) have shown prominent performance in semi-supervised node classification by leveraging knowledge from the graph database. However, most existing GNNs follow the homophily assumption, where connected nodes are more likely to exhibit similar feature distributions and the same labels, and such an assumption has proven to be vulnerable in a growing number of practical applications. As a supplement, heterophily reflects dissimilarity in connected nodes, which has gained significant attention in graph learning. To this end, data engineers aim to develop a powerful GNN model that can ensure performance under both homophily and heterophily. Despite numerous attempts, most existing GNNs struggle to achieve optimal node representations due to the constraints of undirected graphs. The neglect of directed edges results in sub-optimal graph representations, thereby hindering the capacity of GNNs. To address this issue, we introduce AMUD, which quantifies the relationship between node profiles and topology from a statistical perspective, offering valuable insights for Adaptively Modeling the natural directed graphs as the Undirected or Directed graph to maximize the benefits from subsequent graph learning. Furthermore, we propose Adaptive Directed Pattern Aggregation (ADPA) as a new directed graph learning paradigm for AMUD. Empirical studies have demonstrated that AMUD guides efficient graph learning. Meanwhile, extensive experiments on 14 benchmark datasets substantiate the impressive performance of ADPA, outperforming baselines by significant margins of 3.96\%.

Temporal Graph Analysis with TGX

Real-world networks, with their evolving relations, are best captured as temporal graphs. However, existing software libraries are largely designed for static graphs where the dynamic nature of temporal graphs is ignored. Bridging this gap, we introduce TGX, a Python package specially designed for analysis of temporal networks that encompasses an automated pipeline for data loading, data processing, and analysis of evolving graphs. TGX provides access to eleven built-in datasets and eight external Temporal Graph Benchmark (TGB) datasets as well as any novel datasets in the .csv format. Beyond data loading, TGX facilitates data processing functionalities such as discretization of temporal graphs and node subsampling to accelerate working with larger datasets. For comprehensive investigation, TGX offers network analysis by providing a diverse set of measures, including average node degree and the evolving number of nodes and edges per timestamp. Additionally, the package consolidates meaningful visualization plots indicating the evolution of temporal patterns, such as Temporal Edge Appearance (TEA) and Temporal Edge Trafficc (TET) plots. The TGX package is a robust tool for examining the features of temporal graphs and can be used in various areas like studying social networks, citation networks, and tracking user interactions. We plan to continuously support and update TGX based on community feedback. TGX is publicly available on: https://github.com/ComplexData-MILA/TGX.

Variationally Regularized Graph-based Representation Learning for Electronic Health Records

Electronic Health Records (EHR) are high-dimensional data with implicit connections among thousands of medical concepts. These connections, for instance, the co-occurrence of diseases and lab-disease correlations can be informative when only a subset of these variables is documented by the clinician. A feasible approach to improving the representation learning of EHR data is to associate relevant medical concepts and utilize these connections. Existing medical ontologies can be the reference for EHR structures, but they place numerous constraints on the data source. Recent progress on graph neural networks (GNN) enables end-to-end learning of topological structures for non-grid or non-sequential data. However, there are problems to be addressed on how to learn the medical graph adaptively and how to understand the effect of the medical graph on representation learning. In this paper, we propose a variationally regularized encoder-decoder graph network that achieves more robustness in graph structure learning by regularizing node representations. Our model outperforms the existing graph and non-graph based methods in various EHR predictive tasks based on both public data and real-world clinical data. Besides the improvements in empirical experiment performances, we provide an interpretation of the effect of variational regularization compared to standard graph neural network, using singular value analysis.

Graph Density-Aware Losses for Novel Compositions in Scene Graph Generation

Scene graph generation (SGG) aims to predict graph-structured descriptions of input images, in the form of objects and relationships between them. This task is becoming increasingly useful for progress at the interface of vision and language. Here, it is important - yet challenging - to perform well on novel (zero-shot) or rare (few-shot) compositions of objects and relationships. In this paper, we identify two key issues that limit such generalization. Firstly, we show that the standard loss used in this task is unintentionally a function of scene graph density. This leads to the neglect of individual edges in large sparse graphs during training, even though these contain diverse few-shot examples that are important for generalization. Secondly, the frequency of relationships can create a strong bias in this task, such that a blind model predicting the most frequent relationship achieves good performance. Consequently, some state-of-the-art models exploit this bias to improve results. We show that such models can suffer the most in their ability to generalize to rare compositions, evaluating two different models on the Visual Genome dataset and its more recent, improved version, GQA. To address these issues, we introduce a density-normalized edge loss, which provides more than a two-fold improvement in certain generalization metrics. Compared to other works in this direction, our enhancements require only a few lines of code and no added computational cost. We also highlight the difficulty of accurately evaluating models using existing metrics, especially on zero/few shots, and introduce a novel weighted metric.

Partial Correlations in Compositional Data Analysis

Partial correlations quantify linear association between two variables adjusting for the influence of the remaining variables. They form the backbone for graphical models and are readily obtained from the inverse of the covariance matrix. For compositional data, the covariance structure is specified from log ratios of variables, so unless we try to "open" the data via a normalization, this implies changes in the definition and interpretation of partial correlations. In the present work, we elucidate how results derived by Aitchison (1986) lead to a natural definition of partial correlation that has a number of advantages over current measures of association. For this, we show that the residuals of log-ratios between a variable with a reference, when adjusting for all remaining variables including the reference, are reference-independent. Since the reference itself can be controlled for, correlations between residuals are defined for the variables directly without the necessity to recur to ratios except when specifying which variables are partialled out. Thus, perhaps surprisingly, partial correlations do not have the problems commonly found with measures of pairwise association on compositional data. They are well-defined between two variables, are properly scaled, and allow for negative association. By design, they are subcompositionally incoherent, but they share this property with conventional partial correlations (where results change when adjusting for the influence of fewer variables). We discuss the equivalence with normalization-based approaches whenever the normalizing variables are controlled for. We also discuss the partial variances and correlations we obtain from a previously studied data set of Roman glass cups.

Learning to Match Jobs with Resumes from Sparse Interaction Data using Multi-View Co-Teaching Network

With the ever-increasing growth of online recruitment data, job-resume matching has become an important task to automatically match jobs with suitable resumes. This task is typically casted as a supervised text matching problem. Supervised learning is powerful when the labeled data is sufficient. However, on online recruitment platforms, job-resume interaction data is sparse and noisy, which affects the performance of job-resume match algorithms. To alleviate these problems, in this paper, we propose a novel multi-view co-teaching network from sparse interaction data for job-resume matching. Our network consists of two major components, namely text-based matching model and relation-based matching model. The two parts capture semantic compatibility in two different views, and complement each other. In order to address the challenges from sparse and noisy data, we design two specific strategies to combine the two components. First, two components share the learned parameters or representations, so that the original representations of each component can be enhanced. More importantly, we adopt a co-teaching mechanism to reduce the influence of noise in training data. The core idea is to let the two components help each other by selecting more reliable training instances. The two strategies focus on representation enhancement and data enhancement, respectively. Compared with pure text-based matching models, the proposed approach is able to learn better data representations from limited or even sparse interaction data, which is more resistible to noise in training data. Experiment results have demonstrated that our model is able to outperform state-of-the-art methods for job-resume matching.

Graphlets correct for the topological information missed by random walks

Random walks are widely used for mining networks due to the computational efficiency of computing them. For instance, graph representation learning learns a d-dimensional embedding space, so that the nodes that tend to co-occur on random walks (a proxy of being in the same network neighborhood) are close in the embedding space. Specific local network topology (i.e., structure) influences the co-occurrence of nodes on random walks, so random walks of limited length capture only partial topological information, hence diminishing the performance of downstream methods. We explicitly capture all topological neighborhood information and improve performance by introducing orbit adjacencies that quantify the adjacencies of two nodes as co-occurring on a given pair of graphlet orbits, which are symmetric positions on graphlets (small, connected, non-isomorphic, induced subgraphs of a large network). Importantly, we mathematically prove that random walks on up to k nodes capture only a subset of all the possible orbit adjacencies for up to k-node graphlets. Furthermore, we enable orbit adjacency-based analysis of networks by developing an efficient GRaphlet-orbit ADjacency COunter (GRADCO), which exhaustively computes all 28 orbit adjacency matrices for up to four-node graphlets. Note that four-node graphlets suffice, because real networks are usually small-world. In large networks on around 20,000 nodes, GRADCOcomputesthe28matricesinminutes. Onsixrealnetworksfromvarious domains, we compare the performance of node-label predictors obtained by using the network embeddings based on our orbit adjacencies to those based on random walks. We find that orbit adjacencies, which include those unseen by random walks, outperform random walk-based adjacencies, demonstrating the importance of the inclusion of the topological neighborhood information that is unseen by random walks.

Building Variable-sized Models via Learngene Pool

Recently, Stitchable Neural Networks (SN-Net) is proposed to stitch some pre-trained networks for quickly building numerous networks with different complexity and performance trade-offs. In this way, the burdens of designing or training the variable-sized networks, which can be used in application scenarios with diverse resource constraints, are alleviated. However, SN-Net still faces a few challenges. 1) Stitching from multiple independently pre-trained anchors introduces high storage resource consumption. 2) SN-Net faces challenges to build smaller models for low resource constraints. 3). SN-Net uses an unlearned initialization method for stitch layers, limiting the final performance. To overcome these challenges, motivated by the recently proposed Learngene framework, we propose a novel method called Learngene Pool. Briefly, Learngene distills the critical knowledge from a large pre-trained model into a small part (termed as learngene) and then expands this small part into a few variable-sized models. In our proposed method, we distill one pretrained large model into multiple small models whose network blocks are used as learngene instances to construct the learngene pool. Since only one large model is used, we do not need to store more large models as SN-Net and after distilling, smaller learngene instances can be created to build small models to satisfy low resource constraints. We also insert learnable transformation matrices between the instances to stitch them into variable-sized models to improve the performance of these models. Exhaustive experiments have been implemented and the results validate the effectiveness of the proposed Learngene Pool compared with SN-Net.

Neural Production Systems: Learning Rule-Governed Visual Dynamics

Visual environments are structured, consisting of distinct objects or entities. These entities have properties -- both visible and latent -- that determine the manner in which they interact with one another. To partition images into entities, deep-learning researchers have proposed structural inductive biases such as slot-based architectures. To model interactions among entities, equivariant graph neural nets (GNNs) are used, but these are not particularly well suited to the task for two reasons. First, GNNs do not predispose interactions to be sparse, as relationships among independent entities are likely to be. Second, GNNs do not factorize knowledge about interactions in an entity-conditional manner. As an alternative, we take inspiration from cognitive science and resurrect a classic approach, production systems, which consist of a set of rule templates that are applied by binding placeholder variables in the rules to specific entities. Rules are scored on their match to entities, and the best fitting rules are applied to update entity properties. In a series of experiments, we demonstrate that this architecture achieves a flexible, dynamic flow of control and serves to factorize entity-specific and rule-based information. This disentangling of knowledge achieves robust future-state prediction in rich visual environments, outperforming state-of-the-art methods using GNNs, and allows for the extrapolation from simple (few object) environments to more complex environments.

Landscaping Linear Mode Connectivity

The presence of linear paths in parameter space between two different network solutions in certain cases, i.e., linear mode connectivity (LMC), has garnered interest from both theoretical and practical fronts. There has been significant research that either practically designs algorithms catered for connecting networks by adjusting for the permutation symmetries as well as some others that more theoretically construct paths through which networks can be connected. Yet, the core reasons for the occurrence of LMC, when in fact it does occur, in the highly non-convex loss landscapes of neural networks are far from clear. In this work, we take a step towards understanding it by providing a model of how the loss landscape needs to behave topographically for LMC (or the lack thereof) to manifest. Concretely, we present a `mountainside and ridge' perspective that helps to neatly tie together different geometric features that can be spotted in the loss landscape along the training runs. We also complement this perspective by providing a theoretical analysis of the barrier height, for which we provide empirical support, and which additionally extends as a faithful predictor of layer-wise LMC. We close with a toy example that provides further intuition on how barriers arise in the first place, all in all, showcasing the larger aim of the work -- to provide a working model of the landscape and its topography for the occurrence of LMC.

Linking Datasets on Organizations Using Half A Billion Open Collaborated Records

Scholars studying organizations often work with multiple datasets lacking shared unique identifiers or covariates. In such situations, researchers may turn to approximate string matching methods to combine datasets. String matching, although useful, faces fundamental challenges. Even when two strings appear similar to humans, fuzzy matching often does not work because it fails to adapt to the informativeness of the character combinations presented. Worse, many entities have multiple names that are dissimilar (e.g., "Fannie Mae" and "Federal National Mortgage Association"), a case where string matching has little hope of succeeding. This paper introduces data from a prominent employment-related networking site (LinkedIn) as a tool to address these problems. We propose interconnected approaches to leveraging the massive amount of information from LinkedIn regarding organizational name-to-name links. The first approach builds a machine learning model for predicting matches from character strings, treating the trillions of user-contributed organizational name pairs as a training corpus: this approach constructs a string matching metric that explicitly maximizes match probabilities. A second approach identifies relationships between organization names using network representations of the LinkedIn data. A third approach combines the first and second. We document substantial improvements over fuzzy matching in applications, making all methods accessible in open-source software ("LinkOrgs").

Higher-order Graph Convolutional Network with Flower-Petals Laplacians on Simplicial Complexes

Despite the recent successes of vanilla Graph Neural Networks (GNNs) on many tasks, their foundation on pairwise interaction networks inherently limits their capacity to discern latent higher-order interactions in complex systems. To bridge this capability gap, we propose a novel approach exploiting the rich mathematical theory of simplicial complexes (SCs) - a robust tool for modeling higher-order interactions. Current SC-based GNNs are burdened by high complexity and rigidity, and quantifying higher-order interaction strengths remains challenging. Innovatively, we present a higher-order Flower-Petals (FP) model, incorporating FP Laplacians into SCs. Further, we introduce a Higher-order Graph Convolutional Network (HiGCN) grounded in FP Laplacians, capable of discerning intrinsic features across varying topological scales. By employing learnable graph filters, a parameter group within each FP Laplacian domain, we can identify diverse patterns where the filters' weights serve as a quantifiable measure of higher-order interaction strengths. The theoretical underpinnings of HiGCN's advanced expressiveness are rigorously demonstrated. Additionally, our empirical investigations reveal that the proposed model accomplishes state-of-the-art (SOTA) performance on a range of graph tasks and provides a scalable and flexible solution to explore higher-order interactions in graphs.

Representation Learning in Continuous-Time Dynamic Signed Networks

Signed networks allow us to model conflicting relationships and interactions, such as friend/enemy and support/oppose. These signed interactions happen in real-time. Modeling such dynamics of signed networks is crucial to understanding the evolution of polarization in the network and enabling effective prediction of the signed structure (i.e., link signs and signed weights) in the future. However, existing works have modeled either (static) signed networks or dynamic (unsigned) networks but not dynamic signed networks. Since both sign and dynamics inform the graph structure in different ways, it is non-trivial to model how to combine the two features. In this work, we propose a new Graph Neural Network (GNN)-based approach to model dynamic signed networks, named SEMBA: Signed link's Evolution using Memory modules and Balanced Aggregation. Here, the idea is to incorporate the signs of temporal interactions using separate modules guided by balance theory and to evolve the embeddings from a higher-order neighborhood. Experiments on 4 real-world datasets and 4 different tasks demonstrate that SEMBA consistently and significantly outperforms the baselines by up to 80% on the tasks of predicting signs of future links while matching the state-of-the-art performance on predicting the existence of these links in the future. We find that this improvement is due specifically to the superior performance of SEMBA on the minority negative class.

TransRef: Multi-Scale Reference Embedding Transformer for Reference-Guided Image Inpainting

Image inpainting for completing complicated semantic environments and diverse hole patterns of corrupted images is challenging even for state-of-the-art learning-based inpainting methods trained on large-scale data. A reference image capturing the same scene of a corrupted image offers informative guidance for completing the corrupted image as it shares similar texture and structure priors to that of the holes of the corrupted image. In this work, we propose a transformer-based encoder-decoder network, named TransRef, for reference-guided image inpainting. Specifically, the guidance is conducted progressively through a reference embedding procedure, in which the referencing features are subsequently aligned and fused with the features of the corrupted image. For precise utilization of the reference features for guidance, a reference-patch alignment (Ref-PA) module is proposed to align the patch features of the reference and corrupted images and harmonize their style differences, while a reference-patch transformer (Ref-PT) module is proposed to refine the embedded reference feature. Moreover, to facilitate the research of reference-guided image restoration tasks, we construct a publicly accessible benchmark dataset containing 50K pairs of input and reference images. Both quantitative and qualitative evaluations demonstrate the efficacy of the reference information and the proposed method over the state-of-the-art methods in completing complex holes. Code and dataset can be accessed at https://github.com/Cameltr/TransRef.

A Named Entity Based Approach to Model Recipes

Traditional cooking recipes follow a structure which can be modelled very well if the rules and semantics of the different sections of the recipe text are analyzed and represented accurately. We propose a structure that can accurately represent the recipe as well as a pipeline to infer the best representation of the recipe in this uniform structure. The Ingredients section in a recipe typically lists down the ingredients required and corresponding attributes such as quantity, temperature, and processing state. This can be modelled by defining these attributes and their values. The physical entities which make up a recipe can be broadly classified into utensils, ingredients and their combinations that are related by cooking techniques. The instruction section lists down a series of events in which a cooking technique or process is applied upon these utensils and ingredients. We model these relationships in the form of tuples. Thus, using a combination of these methods we model cooking recipe in the dataset RecipeDB to show the efficacy of our method. This mined information model can have several applications which include translating recipes between languages, determining similarity between recipes, generation of novel recipes and estimation of the nutritional profile of recipes. For the purpose of recognition of ingredient attributes, we train the Named Entity Relationship (NER) models and analyze the inferences with the help of K-Means clustering. Our model presented with an F1 score of 0.95 across all datasets. We use a similar NER tagging model for labelling cooking techniques (F1 score = 0.88) and utensils (F1 score = 0.90) within the instructions section. Finally, we determine the temporal sequence of relationships between ingredients, utensils and cooking techniques for modeling the instruction steps.

Leveraging Invariant Principle for Heterophilic Graph Structure Distribution Shifts

Heterophilic Graph Neural Networks (HGNNs) have shown promising results for semi-supervised learning tasks on graphs. Notably, most real-world heterophilic graphs are composed of a mixture of nodes with different neighbor patterns, exhibiting local node-level homophilic and heterophilic structures. However, existing works are only devoted to designing better HGNN backbones or architectures for node classification tasks on heterophilic and homophilic graph benchmarks simultaneously, and their analyses of HGNN performance with respect to nodes are only based on the determined data distribution without exploring the effect caused by this structural difference between training and testing nodes. How to learn invariant node representations on heterophilic graphs to handle this structure difference or distribution shifts remains unexplored. In this paper, we first discuss the limitations of previous graph-based invariant learning methods from the perspective of data augmentation. Then, we propose HEI, a framework capable of generating invariant node representations through incorporating heterophily information to infer latent environments without augmentation, which are then used for invariant prediction, under heterophilic graph structure distribution shifts. We theoretically show that our proposed method can achieve guaranteed performance under heterophilic graph structure distribution shifts. Extensive experiments on various benchmarks and backbones can also demonstrate the effectiveness of our method compared with existing state-of-the-art baselines.

Conditional Graph Information Bottleneck for Molecular Relational Learning

Molecular relational learning, whose goal is to learn the interaction behavior between molecular pairs, got a surge of interest in molecular sciences due to its wide range of applications. Recently, graph neural networks have recently shown great success in molecular relational learning by modeling a molecule as a graph structure, and considering atom-level interactions between two molecules. Despite their success, existing molecular relational learning methods tend to overlook the nature of chemistry, i.e., a chemical compound is composed of multiple substructures such as functional groups that cause distinctive chemical reactions. In this work, we propose a novel relational learning framework, called CGIB, that predicts the interaction behavior between a pair of graphs by detecting core subgraphs therein. The main idea is, given a pair of graphs, to find a subgraph from a graph that contains the minimal sufficient information regarding the task at hand conditioned on the paired graph based on the principle of conditional graph information bottleneck. We argue that our proposed method mimics the nature of chemical reactions, i.e., the core substructure of a molecule varies depending on which other molecule it interacts with. Extensive experiments on various tasks with real-world datasets demonstrate the superiority of CGIB over state-of-the-art baselines. Our code is available at https://github.com/Namkyeong/CGIB.

Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks

We present an agentic, autonomous graph expansion framework that iteratively structures and refines knowledge in situ. Unlike conventional knowledge graph construction methods relying on static extraction or single-pass learning, our approach couples a reasoning-native large language model with a continually updated graph representation. At each step, the system actively generates new concepts and relationships, merges them into a global graph, and formulates subsequent prompts based on its evolving structure. Through this feedback-driven loop, the model organizes information into a scale-free network characterized by hub formation, stable modularity, and bridging nodes that link disparate knowledge clusters. Over hundreds of iterations, new nodes and edges continue to appear without saturating, while centrality measures and shortest path distributions evolve to yield increasingly distributed connectivity. Our analysis reveals emergent patterns, such as the rise of highly connected 'hub' concepts and the shifting influence of 'bridge' nodes, indicating that agentic, self-reinforcing graph construction can yield open-ended, coherent knowledge structures. Applied to materials design problems, we present compositional reasoning experiments by extracting node-specific and synergy-level principles to foster genuinely novel knowledge synthesis, yielding cross-domain ideas that transcend rote summarization and strengthen the framework's potential for open-ended scientific discovery. We discuss other applications in scientific discovery and outline future directions for enhancing scalability and interpretability.

ATLANTIC: Structure-Aware Retrieval-Augmented Language Model for Interdisciplinary Science

Large language models record impressive performance on many natural language processing tasks. However, their knowledge capacity is limited to the pretraining corpus. Retrieval augmentation offers an effective solution by retrieving context from external knowledge sources to complement the language model. However, existing retrieval augmentation techniques ignore the structural relationships between these documents. Furthermore, retrieval models are not explored much in scientific tasks, especially in regard to the faithfulness of retrieved documents. In this paper, we propose a novel structure-aware retrieval augmented language model that accommodates document structure during retrieval augmentation. We create a heterogeneous document graph capturing multiple types of relationships (e.g., citation, co-authorship, etc.) that connect documents from more than 15 scientific disciplines (e.g., Physics, Medicine, Chemistry, etc.). We train a graph neural network on the curated document graph to act as a structural encoder for the corresponding passages retrieved during the model pretraining. Particularly, along with text embeddings of the retrieved passages, we obtain structural embeddings of the documents (passages) and fuse them together before feeding them to the language model. We evaluate our model extensively on various scientific benchmarks that include science question-answering and scientific document classification tasks. Experimental results demonstrate that structure-aware retrieval improves retrieving more coherent, faithful and contextually relevant passages, while showing a comparable performance in the overall accuracy.

Fast and Accurate Network Embeddings via Very Sparse Random Projection

We present FastRP, a scalable and performant algorithm for learning distributed node representations in a graph. FastRP is over 4,000 times faster than state-of-the-art methods such as DeepWalk and node2vec, while achieving comparable or even better performance as evaluated on several real-world networks on various downstream tasks. We observe that most network embedding methods consist of two components: construct a node similarity matrix and then apply dimension reduction techniques to this matrix. We show that the success of these methods should be attributed to the proper construction of this similarity matrix, rather than the dimension reduction method employed. FastRP is proposed as a scalable algorithm for network embeddings. Two key features of FastRP are: 1) it explicitly constructs a node similarity matrix that captures transitive relationships in a graph and normalizes matrix entries based on node degrees; 2) it utilizes very sparse random projection, which is a scalable optimization-free method for dimension reduction. An extra benefit from combining these two design choices is that it allows the iterative computation of node embeddings so that the similarity matrix need not be explicitly constructed, which further speeds up FastRP. FastRP is also advantageous for its ease of implementation, parallelization and hyperparameter tuning. The source code is available at https://github.com/GTmac/FastRP.

IRWE: Inductive Random Walk for Joint Inference of Identity and Position Network Embedding

Network embedding, which maps graphs to distributed representations, is a unified framework for various graph inference tasks. According to the topology properties (e.g., structural roles and community memberships of nodes) to be preserved, it can be categorized into the identity and position embedding. However, existing methods can only capture one type of property. Some approaches can support the inductive inference that generalizes the embedding model to new nodes or graphs but relies on the availability of attributes. Due to the complicated correlations between topology and attributes, it is unclear for some inductive methods which type of property they can capture. In this study, we explore a unified framework for the joint inductive inference of identity and position embeddings without attributes. An inductive random walk embedding (IRWE) method is proposed, which combines multiple attention units to handle the random walk on graph topology and simultaneously derives identity and position embeddings that are jointly optimized. In particular, we demonstrate that some random walk statistics can be informative features to characterize node identities and positions while supporting the inductive embedding inference. Experiments validate the superior performance of IRWE beyond various baselines for the transductive and inductive inference of identity and position embeddings.

RESTORE: Graph Embedding Assessment Through Reconstruction

Following the success of Word2Vec embeddings, graph embeddings (GEs) have gained substantial traction. GEs are commonly generated and evaluated extrinsically on downstream applications, but intrinsic evaluations of the original graph properties in terms of topological structure and semantic information have been lacking. Understanding these will help identify the deficiency of the various families of GE methods when vectorizing graphs in terms of preserving the relevant knowledge or learning incorrect knowledge. To address this, we propose RESTORE, a framework for intrinsic GEs assessment through graph reconstruction. We show that reconstructing the original graph from the underlying GEs yields insights into the relative amount of information preserved in a given vector form. We first introduce the graph reconstruction task. We generate GEs from three GE families based on factorization methods, random walks, and deep learning (with representative algorithms from each family) on the CommonSense Knowledge Graph (CSKG). We analyze their effectiveness in preserving the (a) topological structure of node-level graph reconstruction with an increasing number of hops and (b) semantic information on various word semantic and analogy tests. Our evaluations show deep learning-based GE algorithm (SDNE) is overall better at preserving (a) with a mean average precision (mAP) of 0.54 and 0.35 for 2 and 3-hop reconstruction respectively, while the factorization-based algorithm (HOPE) is better at encapsulating (b) with an average Euclidean distance of 0.14, 0.17, and 0.11 for 1, 2, and 3-hop reconstruction respectively. The modest performance of these GEs leaves room for further research avenues on better graph representation learning.

Convergent Graph Solvers

We propose the convergent graph solver (CGS), a deep learning method that learns iterative mappings to predict the properties of a graph system at its stationary state (fixed point) with guaranteed convergence. CGS systematically computes the fixed points of a target graph system and decodes them to estimate the stationary properties of the system without the prior knowledge of existing solvers or intermediate solutions. The forward propagation of CGS proceeds in three steps: (1) constructing the input dependent linear contracting iterative maps, (2) computing the fixed-points of the linear maps, and (3) decoding the fixed-points to estimate the properties. The contractivity of the constructed linear maps guarantees the existence and uniqueness of the fixed points following the Banach fixed point theorem. To train CGS efficiently, we also derive a tractable analytical expression for its gradient by leveraging the implicit function theorem. We evaluate the performance of CGS by applying it to various network-analytic and graph benchmark problems. The results indicate that CGS has competitive capabilities for predicting the stationary properties of graph systems, irrespective of whether the target systems are linear or non-linear. CGS also shows high performance for graph classification problems where the existence or the meaning of a fixed point is hard to be clearly defined, which highlights the potential of CGS as a general graph neural network architecture.

Learning multi-domain feature relation for visible and Long-wave Infrared image patch matching

Recently, learning-based algorithms have achieved promising performance on cross-spectral image patch matching, which, however, is still far from satisfactory for practical application. On the one hand, a lack of large-scale dataset with diverse scenes haunts its further improvement for learning-based algorithms, whose performances and generalization rely heavily on the dataset size and diversity. On the other hand, more emphasis has been put on feature relation in the spatial domain whereas the scale dependency between features has often been ignored, leading to performance degeneration especially when encountering significant appearance variations for cross-spectral patches. To address these issues, we publish, to be best of our knowledge, the largest visible and Long-wave Infrared (LWIR) image patch matching dataset, termed VL-CMIM, which contains 1300 pairs of strictly aligned visible and LWIR images and over 2 million patch pairs covering diverse scenes such as asteroid, field, country, build, street and water.In addition, a multi-domain feature relation learning network (MD-FRN) is proposed. Input by the features extracted from a four-branch network, both feature relations in spatial and scale domains are learned via a spatial correlation module (SCM) and multi-scale adaptive aggregation module (MSAG), respectively. To further aggregate the multi-domain relations, a deep domain interactive mechanism (DIM) is applied, where the learnt spatial-relation and scale-relation features are exchanged and further input into MSCRM and SCM. This mechanism allows our model to learn interactive cross-domain feature relations, leading to improved robustness to significant appearance changes due to different modality.

Image Synthesis with Graph Conditioning: CLIP-Guided Diffusion Models for Scene Graphs

Advancements in generative models have sparked significant interest in generating images while adhering to specific structural guidelines. Scene graph to image generation is one such task of generating images which are consistent with the given scene graph. However, the complexity of visual scenes poses a challenge in accurately aligning objects based on specified relations within the scene graph. Existing methods approach this task by first predicting a scene layout and generating images from these layouts using adversarial training. In this work, we introduce a novel approach to generate images from scene graphs which eliminates the need of predicting intermediate layouts. We leverage pre-trained text-to-image diffusion models and CLIP guidance to translate graph knowledge into images. Towards this, we first pre-train our graph encoder to align graph features with CLIP features of corresponding images using a GAN based training. Further, we fuse the graph features with CLIP embedding of object labels present in the given scene graph to create a graph consistent CLIP guided conditioning signal. In the conditioning input, object embeddings provide coarse structure of the image and graph features provide structural alignment based on relationships among objects. Finally, we fine tune a pre-trained diffusion model with the graph consistent conditioning signal with reconstruction and CLIP alignment loss. Elaborate experiments reveal that our method outperforms existing methods on standard benchmarks of COCO-stuff and Visual Genome dataset.

Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights?

Given a robust model trained to be resilient to one or multiple types of distribution shifts (e.g., natural image corruptions), how is that "robustness" encoded in the model weights, and how easily can it be disentangled and/or "zero-shot" transferred to some other models? This paper empirically suggests a surprisingly simple answer: linearly - by straightforward model weight arithmetic! We start by drawing several key observations: (1)assuming that we train the same model architecture on both a clean dataset and its corrupted version, resultant weights mostly differ in shallow layers; (2)the weight difference after projection, which we call "Robust Weight Signature" (RWS), appears to be discriminative and indicative of different corruption types; (3)for the same corruption type, the RWSs obtained by one model architecture are highly consistent and transferable across different datasets. We propose a minimalistic model robustness "patching" framework that carries a model trained on clean data together with its pre-extracted RWSs. In this way, injecting certain robustness to the model is reduced to directly adding the corresponding RWS to its weight. We verify our proposed framework to be remarkably (1)lightweight. since RWSs concentrate on the shallowest few layers and we further show they can be painlessly quantized, storing an RWS is up to 13 x more compact than storing the full weight copy; (2)in-situ adjustable. RWSs can be appended as needed and later taken off to restore the intact clean model. We further demonstrate one can linearly re-scale the RWS to control the patched robustness strength; (3)composable. Multiple RWSs can be added simultaneously to patch more comprehensive robustness at once; and (4)transferable. Even when the clean model backbone is continually adapted or updated, RWSs remain as effective patches due to their outstanding cross-dataset transferability.

Graph-Aware Isomorphic Attention for Adaptive Dynamics in Transformers

We present an approach to modifying Transformer architectures by integrating graph-aware relational reasoning into the attention mechanism, merging concepts from graph neural networks and language modeling. Building on the inherent connection between attention and graph theory, we reformulate the Transformer's attention mechanism as a graph operation and propose Graph-Aware Isomorphic Attention. This method leverages advanced graph modeling strategies, including Graph Isomorphism Networks (GIN) and Principal Neighborhood Aggregation (PNA), to enrich the representation of relational structures. Our approach captures complex dependencies and generalizes across tasks, as evidenced by a reduced generalization gap and improved learning performance. Additionally, we expand the concept of graph-aware attention to introduce Sparse GIN-Attention, a fine-tuning approach that employs sparse GINs. By interpreting attention matrices as sparse adjacency graphs, this technique enhances the adaptability of pre-trained foundational models with minimal computational overhead, endowing them with graph-aware capabilities. Sparse GIN-Attention fine-tuning achieves improved training dynamics and better generalization compared to alternative methods like low-rank adaption (LoRA). We discuss latent graph-like structures within traditional attention mechanisms, offering a new lens through which Transformers can be understood. By evolving Transformers as hierarchical GIN models for relational reasoning. This perspective suggests profound implications for foundational model development, enabling the design of architectures that dynamically adapt to both local and global dependencies. Applications in bioinformatics, materials science, language modeling, and beyond could benefit from this synthesis of relational and sequential data modeling, setting the stage for interpretable and generalizable modeling strategies.

Graph Deep Learning for Time Series Forecasting

Graph-based deep learning methods have become popular tools to process collections of correlated time series. Differently from traditional multivariate forecasting methods, neural graph-based predictors take advantage of pairwise relationships by conditioning forecasts on a (possibly dynamic) graph spanning the time series collection. The conditioning can take the form of an architectural inductive bias on the neural forecasting architecture, resulting in a family of deep learning models called spatiotemporal graph neural networks. Such relational inductive biases enable the training of global forecasting models on large time-series collections, while at the same time localizing predictions w.r.t. each element in the set (i.e., graph nodes) by accounting for local correlations among them (i.e., graph edges). Indeed, recent theoretical and practical advances in graph neural networks and deep learning for time series forecasting make the adoption of such processing frameworks appealing and timely. However, most of the studies in the literature focus on proposing variations of existing neural architectures by taking advantage of modern deep learning practices, while foundational and methodological aspects have not been subject to systematic investigation. To fill the gap, this paper aims to introduce a comprehensive methodological framework that formalizes the forecasting problem and provides design principles for graph-based predictive models and methods to assess their performance. At the same time, together with an overview of the field, we provide design guidelines, recommendations, and best practices, as well as an in-depth discussion of open challenges and future research directions.

Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing

Vision transformers (ViTs) have significantly changed the computer vision landscape and have periodically exhibited superior performance in vision tasks compared to convolutional neural networks (CNNs). Although the jury is still out on which model type is superior, each has unique inductive biases that shape their learning and generalization performance. For example, ViTs have interesting properties with respect to early layer non-local feature dependence, as well as self-attention mechanisms which enhance learning flexibility, enabling them to ignore out-of-context image information more effectively. We hypothesize that this power to ignore out-of-context information (which we name patch selectivity), while integrating in-context information in a non-local manner in early layers, allows ViTs to more easily handle occlusion. In this study, our aim is to see whether we can have CNNs simulate this ability of patch selectivity by effectively hardwiring this inductive bias using Patch Mixing data augmentation, which consists of inserting patches from another image onto a training image and interpolating labels between the two image classes. Specifically, we use Patch Mixing to train state-of-the-art ViTs and CNNs, assessing its impact on their ability to ignore out-of-context patches and handle natural occlusions. We find that ViTs do not improve nor degrade when trained using Patch Mixing, but CNNs acquire new capabilities to ignore out-of-context information and improve on occlusion benchmarks, leaving us to conclude that this training method is a way of simulating in CNNs the abilities that ViTs already possess. We will release our Patch Mixing implementation and proposed datasets for public use. Project page: https://arielnlee.github.io/PatchMixing/

DiffKG: Knowledge Graph Diffusion Model for Recommendation

Knowledge Graphs (KGs) have emerged as invaluable resources for enriching recommendation systems by providing a wealth of factual information and capturing semantic relationships among items. Leveraging KGs can significantly enhance recommendation performance. However, not all relations within a KG are equally relevant or beneficial for the target recommendation task. In fact, certain item-entity connections may introduce noise or lack informative value, thus potentially misleading our understanding of user preferences. To bridge this research gap, we propose a novel knowledge graph diffusion model for recommendation, referred to as DiffKG. Our framework integrates a generative diffusion model with a data augmentation paradigm, enabling robust knowledge graph representation learning. This integration facilitates a better alignment between knowledge-aware item semantics and collaborative relation modeling. Moreover, we introduce a collaborative knowledge graph convolution mechanism that incorporates collaborative signals reflecting user-item interaction patterns, guiding the knowledge graph diffusion process. We conduct extensive experiments on three publicly available datasets, consistently demonstrating the superiority of our DiffKG compared to various competitive baselines. We provide the source code repository of our proposed DiffKG model at the following link: https://github.com/HKUDS/DiffKG.