Computer programs can now hold plausible conversations with humans and create images in the style of famous artists. But can they do science? Unlike other areas of artificial intelligence and machine learning, where what mostly matters is the plausibility of the final “result” (for example, the answer to a question posed by the user, or the image generated in response to a prompt), science requires a certain degree of transparency and guarantees of performance. In particular, we typically want to formulate mathematical models that are not only predictive but also interpretable in terms of mechanisms, and we demand guarantees that we recover the right models, at least when enough data are available.
In the last few years, ICREA professor Roger Guimerà and colleagues at Universitat Rovira i Virgili have developed Bayesian “machine scientists” capable of discovering interpretable mathematical models from data. These machine scientists establish the plausibility of candidate models rigorously and explore the space of mathematical models in a way that guarantees the correct one will eventually be found.
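In schematic terms, and with notation that is ours rather than taken from the original publications, such a machine scientist assigns each candidate closed-form model m a plausibility given the observed data D by marginalizing over the model's parameters θ:

\[
p(m \mid D) \;\propto\; p(m) \int p(D \mid m, \theta)\, p(\theta \mid m)\, d\theta ,
\]

where p(m) is a prior over mathematical expressions. Sampling the space of expressions with this posterior as the target, for instance by Markov chain Monte Carlo, ensures that the most plausible model is eventually visited, which is the source of the guarantee mentioned above.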
With the help of these machine scientists, however, they have now proved that, when observational data are noisy enough, no algorithm will ever be able to discover the true model. Moreover, they have identified a sharp, noise-driven transition between a learnable regime, where models can in principle be discovered, and an unlearnable regime, where no algorithm can ever discover the model that truly generated the data. This raises fundamental questions not only about machine science but also about the limits of human model discovery.
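The intuition behind this transition can be sketched with the same schematic notation (again ours, not the published analysis). Let m* denote the model that truly generated the data, observed with noise of magnitude σ. Even an ideal learner that computes the exact posterior can recover m* only if

\[
p(m^{*} \mid D) \;>\; \max_{m \neq m^{*}} p(m \mid D) .
\]

As σ grows, the likelihood p(D | m, θ) discriminates less and less between expressions, and at some point simpler spurious models that merely accommodate the noise become more plausible than m*. Beyond that critical noise level the inequality fails for typical data sets, and because the exact posterior is the best any inference procedure can do, no algorithm, Bayesian or otherwise, can recover the true model.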