Predicting whether two drugs have a harmful interaction or whether someone is going to like a certain movie are examples of network inference problems. In these problems, the goal is to predict new interactions (between drugs or between people and movies) based on previously observed interactions. Additional information about the network nodes, that is, node metadata (for example, the mechanism of action of the drugs or the age of the individuals), helps to make better predictions, though it is not clear why or how. We explored how that improvement happens.
We studied a very general network inference problem and showed that node metadata do not affect the inference problem gradually. Rather, even when the importance assigned to the metadata increases smoothly, the inference process crosses over from a data-dominated regime to a metadata-dominated regime. These crossovers show some similarities to transitions driven by temperature, where one finds energy- and entropy-dominated regimes. Importantly, optimal inference is often encountered exactly at this crossover.
This study opens the door to better understanding the role of metadata in network inference problems and, more broadly, establishes further connections between general inference problems and physical concepts such as phase transitions.