
April 30, 2026 · Open Access

Attention Is All We Need: Collaborative Filtering Transformer

Key Points

The aim is to develop a new attention-based model, Collaborative Filtering Transformer (CFT), that enhances feature extraction from graph structures while minimizing complexity.
Designed a new graph feature extractor using a lightweight attention mechanism without graph encoding or projection matrices.
Constructed preference sentences to introduce non-interaction information into the CFT encoder.
Performed experimental verification and theoretical analysis to validate the effectiveness of the proposed model.
CFT improves recommendation performance by up to 20.60% over the second-best baseline.
Higher attention scores are consistently assigned to nodes with interactions, while scores for non-interacting nodes approach zero.
CFT's training time is comparable to that of simple matrix factorization-based methods across all five datasets.
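
The key points describe an attention layer that uses neither graph encoding nor projection matrices. The following is a minimal sketch of that idea, assuming queries, keys and values are simply the raw embeddings and assuming standard 1/sqrt(d) scaling (the summary does not say whether CFT scales its scores). The preference-sentence layout (one user token followed by every item token) and the names `pure_attention`, `user_emb` and `item_emb` are hypothetical illustrations, not the paper's construction.

```python
import numpy as np


def pure_attention(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Scaled dot-product self-attention with no projection matrices.

    x: (n, d) embeddings of the n tokens in one "preference sentence".
    Queries, keys and values are all x itself (no W_Q, W_K, W_V),
    and no graph or positional encoding is added.
    Assumption: the 1/sqrt(d) scaling is an illustrative choice.
    """
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)                  # (n, n) pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ x, weights


rng = np.random.default_rng(0)
num_users, num_items, dim = 4, 6, 8
user_emb = rng.normal(size=(num_users, dim))
item_emb = rng.normal(size=(num_items, dim))

# Hypothetical preference sentence for user 0: the user token followed by
# every item token, so the user can attend to items it never interacted with.
sentence = np.vstack([user_emb[0:1], item_emb])
out, attn = pure_attention(sentence)
print(out.shape, attn.shape)  # (7, 8) (7, 7)
```

With no learned projections, the layer's only parameters are the embeddings themselves, which is one plausible reading of why such an encoder stays lightweight.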

Abstract

Graph encoding and the attention mechanism enable Graph Transformers (GTs) to extract features from graph structures. However, combining graph encoding with a Transformer-based attention mechanism can cause two drawbacks: high computational complexity and sensitivity to graph structure. This paper therefore designs a graph feature extractor, called Collaborative Filtering Transformer (CFT), built on a pure (lightweight) attention mechanism that relies on neither graph encoding nor feature projection matrices. CFT uses this pure attention mechanism as the main structure of its encoder and introduces non-interaction information by constructing preference sentences. The key idea of CFT is that its encoder contains no collaborative information; it only generates associations between different nodes, so that a node in the graph can attend to its non-neighboring nodes. Collaborative information is exploited solely through optimization of the loss function. In addition, through theoretical analysis and experimental verification, we show that adding path-based graph encoding to CFT harms the feature extraction of the attention mechanism. Experiments further show that, throughout optimization, the proposed pure attention mechanism consistently assigns higher attention scores to nodes with interactions, while the attention scores between nodes without interactions approach zero. Finally, CFT achieves the best performance compared with the latest methods on five real-world datasets, improving recommendation performance by up to 20.60% over the second-best baseline. Moreover, CFT trains efficiently on all five datasets, with training time comparable to that of simple matrix factorization-based baselines.
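
The abstract states that collaborative information enters CFT only through the loss function, but it does not name that loss. As an assumption, the sketch below uses Bayesian Personalized Ranking (BPR), a common pairwise loss in graph collaborative filtering; the function name `bpr_loss` and its `reg` weight are hypothetical. Observed (user, item) interactions appear only as the positive/negative triples fed to the loss, never inside an encoder.

```python
import numpy as np


def bpr_loss(user_vec: np.ndarray, pos_item_vec: np.ndarray,
             neg_item_vec: np.ndarray, reg: float = 1e-4) -> float:
    """BPR loss for a batch of aligned (user, positive item, negative item) rows.

    Assumption: BPR is used for illustration only; the paper's actual loss
    may differ. Row k means user k interacted with positive item k but not
    with negative item k, so interaction data enters only here.
    """
    pos_scores = np.sum(user_vec * pos_item_vec, axis=1)
    neg_scores = np.sum(user_vec * neg_item_vec, axis=1)
    # -log(sigmoid(x)) computed stably as logaddexp(0, -x)
    ranking = np.mean(np.logaddexp(0.0, -(pos_scores - neg_scores)))
    # L2 regularization on the embeddings in the batch, averaged per triple
    l2 = reg * (np.sum(user_vec**2) + np.sum(pos_item_vec**2)
                + np.sum(neg_item_vec**2)) / len(user_vec)
    return float(ranking + l2)


zeros = np.zeros((2, 4))
print(bpr_loss(zeros, zeros, zeros))  # ≈ 0.6931 (= log 2, no preference learned yet)
```

In this sketch, the gradient of the loss is the only signal that pulls interacting users and items together and pushes non-interacting pairs apart.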



Journal

Information Processing & Management

Institutions

Xidian University
