Week 3: Consumer Data and the Social Sciences

The article “What Happens When Big Data Blunders” by Logan Kugler concentrates on the failure of Google researchers to predict flu outbreaks from search trends. It attributes this failure to two causes: the researchers’ inability to isolate meaningful indicators of illness (searches about flu symptoms and remedies) from other trendy searches, and the difficulty of reconciling the dynamic nature of Google’s search algorithms with assumptions about the search habits of the susceptible population. This article characterizes a common problem within social science research: statistical methods which were once limited by a lack of data are applicable now that digital resources faithfully aggregate copious amounts of information, but these methods often require stable sampling techniques which don’t align with the goals of the application or the consumer’s behavior. In a few words, messy data is as bad as no data. As Kugler notes, Google’s profit-driven business goals don’t align with those of social science researchers, and the data being collected is often skewed by the desire of the application (like Google search) to improve customer experience rather than provide consistency. Finding clever ways to work with data which has been compromised in this way, allowing social scientists to piggy-back their experimental data collection on modern applications, would provide ways for businesses to profit from selling consumer data and ways for social scientists to utilize the computational resources which have already revolutionized so many other fields.

Week 2: Big Data’s Telescope

In “The Science of Culture? Social Computing, Digital Humanities and Cultural Analytics”, Lev Manovich compares and contrasts three fields which aspire to study the intersection of human culture and digital technology. His analysis marks the importance of differences in the “default” approach taken by scientists of different fields, specifically the distinction between those who have been trained to identify and characterize trends and those who wish to explain and utilize them; social computing, he argues, is so data-driven that results often fail to place themselves in cultural contexts unless intentionally curated to do so, while the digital humanities often restrict themselves to the study of historical artifacts, unable or unwilling to apply themselves to modern large data sets.

In the past 20 years, mostly because of the invention and widespread use of the Internet, computational resources have found (or forced) their way into nearly all social activities. Computers play important roles in communication and media, art and science, commerce, education, and personal relationships, to name a few areas, and they have enabled the development of countless novel social activities, such as online gaming and social media. These computational resources have had two profound effects on the social sciences: they have tirelessly cataloged social behavior, organizing it into tangible forms and making flesh an impressive array of social features and functions, and they have dramatically changed the behavior itself, an effect that threatens the relevance of much of the work of social scientists of decades past.

The first effect, that which results from our now compulsive data collection, has been heralded as the beginning of a revolution in the study of human behavior and culture; just as microscopes and telescopes brought empirical observation to certain fields of the sciences, accelerating the rate of scientific discovery, many who study society and culture assert that the use of Big Data will allow for the same indisputable validation of their theories which they envy in the hard sciences. Tools like Twitter and Facebook could hardly be more favorably designed for those wishing to test and evaluate theories about social interaction and communities, or the spread of information and art, and access to this data is already being commoditized. These technologies can be compared to the telescope, a piece of technology which can be appreciated for its general utility, much like Twitter and Facebook, yet which seems perfectly designed to accelerate advancement in the field of astronomy. It wasn’t simply a technique or tool that improved observation; it redefined observation for those who studied the stars and set a standard for the resolution with which astronomers were required to study their subjects. The telescope, however, also marked the end of astrology; the discriminating, mechanical view of the heavens betrayed the myths and ethos the stars had held for centuries. This marks not the rejection of a competing theory or explanation by a new technology; it demonstrates the destruction of a scientific tradition, a way of understanding, resulting from the use of a certain technology. Unlike the telescope, Big Data hardly provides its users with an objective guide for interpretation; this has produced a scramble by the self-proclaimed “data scientists” to standardize and implement tools which collect and interpret social data.
Cultural studies have not carefully rooted themselves in empirical, data-driven methods and practices, leaving them at a disadvantage when applying quantitative methods to their research questions. Similarly, astrology lost its scientific value when other methods, those of astronomy, provided more subtle, more important characterizations of heavenly bodies. Rooting itself in empirical observation and falsifiable prediction, rather than human affairs and myth, astronomy flourished as a respected scientific discipline while the interpretative astrology wasted away. If social scientists can carefully cultivate data-intensive research, it may very well be the start of a revolution. If the rush to cross disciplines and adapt theories is performed too hastily, however, social scientists may find their ideas misunderstood, or worse, misunderstood and ‘justifiably’ discredited; these powerful tools, unless carefully adapted to specific problems, will tear ideas away from those studying the most human topics.

The second effect poses a similar threat, in that it demonstrates the degree to which the social sciences must adapt to a data-driven world. The adaptation required by the second effect is not one of research methods; the change to the social landscape brought by information-gathering and -sharing devices suggests social scientists are expected to deliberately accept or reject elements of this electronic culture as novel forms of social interaction. For example, communities, groups organized geographically, politically, by faith, etc., have provided fundamental units for social scientists for centuries. Understanding online communities is now a daunting task, as it requires one of two approaches: mapping the behaviors of these traditional communities onto those of online communities (a task complicated by the dissimilarity of human social behaviors on- and offline) or developing unique categorizations and methods of analysis for online communities, an endeavor which, by definition, restricts the application of previous results in social and cultural research to online communities. There are countless examples of modern social activities with little or no historical equivalent; studying these activities within the decades-old framework of social theory ignores research opportunities made possible by the impressive stores of data about these activities. The Internet and social media applications are presently the wild west of human social activity; shepherding these activities into pre-existing definitions denies their novelty, whereas declaring their novelty limits their relevance to other forms of social research.

What effect will Big Data have on the study of society and of culture? How will data-driven results be viewed when given as answers to cultural research questions, and to what extent will the possibility of such answers change the questions being asked? These questions are best tackled with interdisciplinary discourse. Presented here is a pessimistic view, one from a member of the quantitative sciences, one of doubt, hastily typed and gently supported, but nonetheless an attempt to rationalize.