Eigenvectors of directed graphs and importance scores: dominance, T-Rank, and sink remedies

Author: Bjelland J.   Burgess M.   Canright G.   Engø-Monsen K.  

Publisher: Springer Publishing Company

ISSN: 1384-5810

Source: Data Mining and Knowledge Discovery, Vol.20, Iss.1, 2010-01, pp. : 98-151

Disclaimer: Any content in publications that violate the sovereignty, the constitution or regulations of the PRC is not accepted or approved by CNPIEC.

Previous Menu Next

Abstract

We study the properties of the principal eigenvector for the adjacency matrix (and related matrices) for a general directed graph. In particular—motivated by the use of the eigenvector for estimating the “importance” of the nodes in the graph—we focus on the distribution of positive weight in this eigenvector, and give a coherent picture which builds upon and unites earlier results. We also propose a simple method—“T-Rank”—for generating importance scores. T-Rank generates authority scores via a one-level, non-normalized matrix, and is thus distinct from known methods such as PageRank (normalized), HITS (two-level), and SALSA (two-level and normalized). We show, using our understanding of the principal eigenvector, that T-Rank has a much less severe “sink problem” than does PageRank. Also, we offer numerical results which quantify the “tightly-knit community” or TKC effect. We find that T-Rank has a stronger TKC effect than PageRank, and we offer a novel interpolation method which allows for continuous tuning of the strength of this TKC effect. Finally, we propose two new “sink remedies”, i.e., methods for ensuring that the principal eigenvector is positive everywhere. One of our sink remedies (source pumping) is unique among sink remedies, in that it gives a positive eigenvector without rendering the graph strongly connected. We offer a preliminary evaluation of the effects and possible applications of these new sink remedies.