Introduction¶
Graphs are a natural way to represent relationships and interactions between entities in real systems. For example, people on social networks, proteins in biological networks, and authors in publication networks can be represented by nodes in a graph, and their relationships such as friendships, protein-protein interactions, and co-authorship are represented by edges in a graph. These graphical models enable us to understand the behavior of systems and to gain insight into their structure. These insights can further be used to predict future interactions and missing information in the system. These tasks are formally defined as link prediction and node classification. Link prediction estimates the likelihood of a relationship among two entities. This is used, for example, to recommend friends on social networks and to sort probable protein-protein interactions on biological networks. Similarly, node classification estimates the likelihood of a node’s label. This is used, for example, to infer missing meta-data on social media profiles, and genes in proteins.
Numerous graph analysis methods have been developed. Broadly, these methods can be categorized as non-parametric and parametric. Non-parametric methods operate directly on the graph whereas parametric methods represent the properties of nodes and edges in the graph in a low-dimensional space. Non-parametric methods such as Common Neighbors, Adamic Adar and Jaccard coefficient require access and knowledge of the entire graph for the prediction. On the other hand, parametric models such as Thor et. al. employ graph summarization and define super nodes and super edges to perform link prediction. Kim et. al. use Expectation Maximization to fit the real network as a Kronecker graph and estimate the parameters. Another class of parametric models that have gained much attention recently are graph embeddings. Graph embedding methods define a low-dimensional vector for each node and a distance metric on the vectors. These methods learn the representation by preserving certain properties of the graph. Graph Factorization preserves visible links, HOPE aims to preserve higher order proximity, and node2vec preserves both structural equivalence and higher order proximity. In this benchmark library, we focus our attention on graph embedding methods. While this is a very active area of research that continues to gain popularity among researchers, there are several challenges that must be addressed before graph embedding algorithms become mainstream.