Scientists today are collecting huge amounts of data, and making sense of it all is a major challenge. Often the task is to extract as much causal knowledge as possible from such data.
Many statistical methods cannot accommodate the tens of thousands of variables that may appear in these datasets. In addition, as anyone who’s taken an introductory statistics class knows, correlation does not equal causation.
However, algorithms that discover causal relationships from observational data do exist. The Center for Causal Discovery is working to improve the ability of existing and new discovery algorithms to handle tens or even hundreds of thousands of variables through advances in:
• algorithmic design,
• programming efficiency, and
• supercomputing power.
The causal networks output by these algorithms can support scientific discovery in many ways.
For example, of the enormous number of experiments that could be performed, these causal networks can help scientists decide which ones to perform next.
The results of newly performed experiments can then be combined with the original data to create a dataset that is, in turn, used to generate new causal networks.
In this way, the algorithms within the Center for Causal Discovery support an ongoing process of scientific experimentation, data analysis, and discovery.