In “What is Computable,” MacCormick proves that no program can exist that decides whether any other program can crash. He relates this result to the halting problem and explains that, although it matters less in practice than one might think, it raises important philosophical questions about what computers and people are capable of.
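The shape of that argument can be sketched in a few lines of Python. This is only an illustration of the contradiction, not MacCormick's own construction: the names `can_crash`, `make_contrary`, and `actually_crashes` are hypothetical, "crash" is simplified to "raises an exception," and programs are simplified to functions that take no input.

```python
from typing import Callable

# Simplifying assumptions: a "program" is a zero-argument function,
# and a "crash" is any raised exception.
Program = Callable[[], None]
Decider = Callable[[Program], bool]  # claims to answer: "can this program crash?"


def make_contrary(can_crash: Decider) -> Program:
    """Build a program that does the opposite of what can_crash predicts for it."""
    def contrary() -> None:
        if can_crash(contrary):
            return  # predicted to crash, so finish cleanly instead
        raise RuntimeError("crash")  # predicted safe, so crash instead
    return contrary


def actually_crashes(program: Program) -> bool:
    """Run the program and report whether it raised an exception."""
    try:
        program()
    except Exception:
        return True
    return False


def decider_is_wrong_on_contrary(can_crash: Decider) -> bool:
    """True when the decider's prediction disagrees with what really happens."""
    contrary = make_contrary(can_crash)
    return can_crash(contrary) != actually_crashes(contrary)


# Two trivial candidate deciders; both are refuted by their own contrary program.
print(decider_is_wrong_on_contrary(lambda program: True))
print(decider_is_wrong_on_contrary(lambda program: False))
```

Whatever answer a candidate `can_crash` gives about its own contrary program, that program does the opposite, so no candidate can be right about every program.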
In “What Happens When Big Data Blunders,” Logan Kugler explains the reasons David Lazer and Ryan Kennedy found for the failure of Google Flu Trends to predict the 2013 flu outbreak. He also discusses why the spread of Ebola was overpredicted. In both cases, the problem came from making assumptions based only on big data, which left out changing dynamics. In the Google case, the algorithm did not account for changes in the Google search algorithm itself; in the Ebola case, the CDC and WHO did not account for the initial efforts of people working to contain the disease.
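A toy sketch can make the "changing dynamics" problem concrete. Everything here is made up for illustration: the numbers, the `fit_ratio` helper, and the assumption that cases are simply proportional to searches. It is not how Google Flu Trends worked. It only shows how a relationship learned from past data gives wrong answers once the process generating the data changes.

```python
def fit_ratio(searches, cases):
    """Least-squares slope through the origin: cases is modeled as ratio * searches."""
    numerator = sum(s * c for s, c in zip(searches, cases))
    denominator = sum(s * s for s in searches)
    return numerator / denominator


# Hypothetical training period: 10 flu-related searches per actual case.
train_searches = [1000, 2000, 3000]
train_cases = [100, 200, 300]
ratio = fit_ratio(train_searches, train_cases)

# Hypothetical later period: the search engine itself changes (for example,
# it starts suggesting flu-related queries), so each case now comes with
# 20 searches instead of 10. The model is unaware of this change.
later_searches = 4000
later_cases = 200

predicted_cases = ratio * later_searches
print(f"predicted: {predicted_cases:.0f}, actual: {later_cases}")
```

The model is internally consistent with its training data, yet its prediction is off by a factor of two in this toy setup, because the assumption it learned no longer holds.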
It is an interesting and challenging idea to combine the themes of these two articles. One exercise that comes to mind is coming up with our own theoretical questions about what is possible with big data and whether these questions can be answered. Some questions might be:
- Is it possible to determine whether a big data algorithm is sound by some definition of sound? If not, can we derive bounds on acceptable error?
- Is it possible to prove that a particular problem cannot be decided by any big data algorithm?
The first question is quite challenging. The goal of a “big data” algorithm is typically to make some prediction from a large quantity of data. To approach the question, we might imagine solving one of these problems without the aid of a computer. Suppose you were able to think fast enough or live long enough to process all of the data. What issues might arise? Is the data relevant? Is there enough non-overlapping information in the data to arrive at an answer? We would need to answer these questions first. The answer to our original question therefore involves the relationship between the question being asked, the data itself, and the operations we can perform over the data.
For the second question, we must first decide what it means for a problem to be decidable by a big data algorithm. Clearly, if we supply no data and the problem requires data, it will be possible to prove that the problem cannot be decided. On the other hand, if we supply all of the data about everything, will an algorithm then be able to solve the problem? This is, in fact, somewhat philosophical: if we knew everything, could we predict the future?