Data science and molecular biology: prediction and mechanistic explanation

Lopez-Rubio, Ezequiel; Ratti, Emanuele

Publication: SYNTHESE
Volume 198, pages 3131-3156
In the last few years, biologists and computer scientists have claimed that the introduction of data science techniques in molecular biology has changed the characteristics and the aims of the typical outputs (i.e. models) of that discipline. In this paper we will critically examine this claim. First, we identify the received view on models and their aims in molecular biology. Models in molecular biology are mechanistic and explanatory. Next, we identify the scope and aims of data science (machine learning in particular). These lie mainly in the creation of predictive models whose performance increases as the data set grows. Next, we will identify a tradeoff between predictive and explanatory performance by comparing the features of mechanistic and predictive models. Finally, we show how this a priori analysis of machine learning and mechanistic research applies to actual biological practice. This will be done by analyzing the publications of a consortium, The Cancer Genome Atlas, which stands at the forefront in integrating data science and molecular biology. The result will be that biologists have to deal with the tradeoff between explaining and predicting that we have identified, and hence the explanatory force of the 'new' biology is substantially diminished compared to the 'old' biology. However, this aspect also emphasizes the existence of other research goals which make predictive force independent from explanation.
