Papers I Read: Notes and Summaries

TuckER - Tensor Factorization for Knowledge Graph Completion

Introduction

  • TuckER is a simple, yet powerful linear model that uses Tucker decomposition for the task of link prediction in knowledge graphs.

  • Paper

  • Implementation

Knowledge Graph as a Tensor

  • Let E be the set of all the entities and R be the set of all the relations in a given knowledge graph (KG).

  • The KG can be represented as a list of triples of the form (subject entity, relation, object entity) or (es, r, eo).

  • The list of triples can be represented as a third-order tensor of binary values, where each element corresponds to a triple and each element's value indicates whether that triple is present in the KG or not (see the sketch after this list).

  • The link prediction task can be formulated as follows: given the set of all triples, learn a scoring function that assigns a score to each triple, indicating whether that triple is actually present in the KG.
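
Below is a minimal sketch (with illustrative toy data, not from the paper) of how a list of triples maps to a binary third-order tensor:

```python
# Representing a toy KG as a binary third-order tensor of shape |E| x |R| x |E|.
import numpy as np

entities = ["alice", "bob", "carol"]   # E (hypothetical toy data)
relations = ["knows", "works_with"]    # R (hypothetical toy data)
ent_idx = {e: i for i, e in enumerate(entities)}
rel_idx = {r: i for i, r in enumerate(relations)}

triples = [("alice", "knows", "bob"), ("bob", "works_with", "carol")]

# X[s, r, o] = 1 iff the triple (es, r, eo) is present in the KG.
X = np.zeros((len(entities), len(relations), len(entities)), dtype=np.int8)
for s, r, o in triples:
    X[ent_idx[s], rel_idx[r], ent_idx[o]] = 1
```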

TuckER Decomposition

  • Tucker decomposition factorizes a tensor into a set of factor matrices and a smaller core tensor.

  • In the specific case of three-mode tensors (the alternate representation of a KG), the given original tensor X (of shape IxJxK) can be factorized into a core tensor W (of shape PxQxR) and 3 factor matrices - A (of shape IxP), B (of shape JxQ) and C (of shape KxR) - such that X is approximately W x1 A x2 B x3 C, where xn denotes the tensor product along the n-th mode.

  • Generally, P, Q, R are smaller than I, J, K (respectively), so W can be seen as a compressed version of X.

  • Two embedding matrices are used, for embedding the entities and the relations respectively.

  • The entity embedding matrix E is shared between subject and object entities, i.e., E = A = C, while the relation embedding matrix corresponds to B.

  • The scoring function is given as W x1 es x2 wr x3 eo, where es, wr and eo are the embedding vectors corresponding to es, r and eo respectively. Note that both the core tensor and the factor matrices are to be learnt. (A sketch of the scoring function and training loss follows this list.)

  • The model is trained with the standard negative log-likelihood (binary cross-entropy) loss, given as (for one triple): -(y * log(p) + (1-y) * log(1-p))

  • To speed up training and increase accuracy, 1-N scoring is used: a given (es, r) pair is simultaneously scored against all the entities, using the local closed-world assumption (the knowledge graph is only locally complete).

  • Handling asymmetric relations is straightforward: a separate embedding is learnt per relation, while the relation-agnostic core tensor enables knowledge sharing across relations.
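
The scoring function and 1-N training loss can be made concrete with a short PyTorch sketch. This is a simplified illustration, not the paper's code: hyperparameter names like d_e and d_r are assumptions, and the batch normalization / dropout placement of the official implementation is omitted.

```python
# Minimal PyTorch sketch of TuckER's scoring function with 1-N scoring
# and binary cross-entropy loss (illustrative only).
import torch
import torch.nn as nn

class TuckER(nn.Module):
    def __init__(self, n_entities, n_relations, d_e, d_r):
        super().__init__()
        self.E = nn.Embedding(n_entities, d_e)   # shared entity embeddings (E = A = C)
        self.R = nn.Embedding(n_relations, d_r)  # relation embeddings (B)
        # Core tensor W of shape d_e x d_r x d_e, learnt jointly with E and R.
        self.W = nn.Parameter(torch.randn(d_e, d_r, d_e) * 0.1)

    def forward(self, s_idx, r_idx):
        e_s = self.E(s_idx)                              # (batch, d_e)
        w_r = self.R(r_idx)                              # (batch, d_r)
        W_r = torch.einsum("bq,pqe->bpe", w_r, self.W)   # W x2 wr
        x = torch.einsum("bp,bpe->be", e_s, W_r)         # (W x2 wr) x1 es
        # 1-N scoring: score the (es, r) pair against every entity at once.
        return torch.sigmoid(x @ self.E.weight.t())      # (batch, n_entities)

# One training step on random toy data; y is the (batch, n_entities) binary
# target matrix built under the local closed-world assumption.
model = TuckER(n_entities=100, n_relations=10, d_e=32, d_r=16)
s, r = torch.randint(0, 100, (8,)), torch.randint(0, 10, (8,))
y = torch.zeros(8, 100)
y[torch.arange(8), torch.randint(0, 100, (8,))] = 1.0
loss = nn.BCELoss()(model(s, r), y)  # mean of -(y*log(p) + (1-y)*log(1-p))
loss.backward()
```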

Theoretical Analysis

  • One important consideration is the expressive power of TuckER models, especially in relation to other models like ComplEx and SimplE.

  • It can be shown that TuckER is fully expressive, i.e., given any ground truth over E and R, there exists a TuckER model that can perfectly represent the data - using one-hot entity and relation embeddings.

  • For full expressiveness, the required dimensionality of entity (relation) embeddings is nE (nR), where nE (nR) is the number of entities (relations). In comparison, the required dimensionality for ComplEx is nE * nR (for both entities and relations), and for SimplE it is min(nE * nR, number of facts + 1) (for both entities and relations).

  • Many existing models like RESCAL, DistMult, ComplEx and SimplE can be seen as special cases of TuckER (illustrated below for DistMult).
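
As an illustration of the special-case claim (a sketch I am adding, not from the paper): DistMult corresponds to TuckER with equal entity and relation dimensionality and a superdiagonal core tensor.

```python
# With W[i, i, i] = 1 and zeros elsewhere, the TuckER score reduces to the
# DistMult score sum_i es[i] * wr[i] * eo[i].
import torch

d = 4
e_s, w_r, e_o = torch.randn(d), torch.randn(d), torch.randn(d)

W = torch.zeros(d, d, d)   # superdiagonal core tensor
for i in range(d):
    W[i, i, i] = 1.0

tucker_score = torch.einsum("pqr,p,q,r->", W, e_s, w_r, e_o)
distmult_score = (e_s * w_r * e_o).sum()
assert torch.allclose(tucker_score, distmult_score)
```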

Experiments

Datasets

  • FB15k, FB15k-237, WN18, WN18RR

  • The maximum number of entities is around 41K and the maximum number of relations is around 1.3K.

Implementation

  • Batch normalization, dropout and learning rate decay are used.

Metrics

  • Mean Reciprocal Rank (MRR) - the average of the inverse of the rank assigned to the true triple, over all the ne candidate triples generated per test triple.

  • hits@k (k = 1, 3, 10) - the percentage of times the true triple is ranked in the top k of the ne generated candidate triples.

  • Higher is better for both metrics (a sketch of computing them follows).
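
A minimal NumPy sketch of both metrics (not from the paper's code; the assumed layout is one row of candidate-entity scores per query):

```python
import numpy as np

def mrr_and_hits(scores, true_idx, ks=(1, 3, 10)):
    # Rank of the true entity (1-based): 1 + number of candidates scored higher.
    true_scores = scores[np.arange(len(scores)), true_idx][:, None]
    ranks = 1 + (scores > true_scores).sum(axis=1)
    mrr = np.mean(1.0 / ranks)
    hits = {k: np.mean(ranks <= k) for k in ks}
    return mrr, hits

scores = np.random.rand(5, 100)           # 5 queries, 100 candidate entities
true_idx = np.random.randint(0, 100, 5)   # index of the true entity per query
print(mrr_and_hits(scores, true_idx))
```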

Results

  • TuckER outperforms all the baseline models on all but one task.

  • Dropout is an important factor, with higher dropout rates (0.3, 0.4, 0.5) needed for datasets with fewer training examples per relation (and hence more prone to overfitting).

  • TuckER improves performance more significantly when the number of relations is large.

  • Even with lower embedding dimensionality, TuckER's performance does not deteriorate as much as that of other models.