Probabilistic models of network growth have been extensively studied as idealized representations of network evolution. Models such as the Kronecker model, duplication-based models, and preferential attachment models have been used for tasks such as representing null models, detecting anomalies, testing algorithms, and developing an understanding of various mechanistic growth processes. However, developing a new growth model to fit the observed properties of a network is a difficult task, and as new networks are studied, new models must constantly be developed. Here, we present a framework, called GrowCode, for the automatic discovery of novel growth models that match user-specified topological features in undirected graphs. GrowCode includes a formal representation of models that is general enough to encode several previously developed models. Coupling this formal representation with an optimization approach, we show that GrowCode is able to discover models for protein interaction networks, autonomous systems networks, and scientific collaboration networks that better match properties observed in real networks of these classes, such as the degree distribution, the clustering coefficient, and assortativity. Additional tests on simulated networks show that the models learned by GrowCode generate distributions of graphs with variance similar to that of existing models for these classes.
Robert Patro*, Geet Duggal*, Emre Sefer, Hao Wang, Darya Filippova, and Carl Kingsford.
"The missing models: a data-driven approach for learning how networks grow." In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 42-50. ACM, 2012.
* denotes equal contribution