If you are interested in children’s health, I urge you to read Seth Stephens-Davidowitz’s New York Times article on child maltreatment during the recession (or his scientific manuscript here).
the recent Great Recession caused large decreases in referral rates for child maltreatment: areas most affected by the recession saw the largest decreases… this was due not to decreases in actual maltreatment rates, but rather large decreases in the reporting rates of child maltreatment, caused by the economic downturn.
Stephens-Davidowitz realized that some Google searches were in effect self-reports of abuse by victims. If so, looking at the frequency of such searches over time would tell us about changes in the incidence of child maltreatment.
After declining for many years in the United States, the searches that seem to have come from abuse victims themselves rose as soon as the Great Recession began. In weeks when unemployment claims rose, these searches rose as well.
Searches apparently from victims were also concentrated in regions that suffered worst during the recession. Putting these ‘soft’ data together with ‘hard’ data on child mortality from neglect, he estimated that
the recent doubling of the unemployment rate increased actual child maltreatment incidents in the United States by 10.0 to 24.0 percent but decreased reported child maltreatment incidents by 12.7 percent. A likely explanation for the substantial decrease in reporting rates of maltreatment was depleted resources both for organizations likely to report cases and organizations likely to receive and investigate reports. [Emphasis added.]
This work needs to be formally peer-reviewed and replicated. Data mining from social media is in its infancy and Stephens-Davidowitz includes several caveats in his presentation. Nevertheless, I see three lessons.
First, we must add increased harm to children to the enormous cost of the Great Recession.
Second, it can be dangerous to take administrative records of social problems at face value. In the administrative records,
states that were most affected by the recession saw the largest decreases in referral rates for maltreatment.
but it appears that just the opposite was true. So if you use referrals for maltreatment as a measure of the rate of child maltreatment, you will underestimate the rates of maltreatment the most in the areas that spend least on children. Moreover, budget cuts to children’s services will themselves suppress the evidence of the harm to children that those cuts cause. I agree with Stephens-Davidowitz that
the contrast between the search data and the reported data tells a sad story about social services in this country. Just when more children are searching for help, we decimate the budgets of the very people who might actually do something to protect them.
The more general lesson for social scientists is that a complete scientific theory of social problems must include a model for how the data that document those problems are generated.
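To make the point concrete, here is a minimal sketch of the simplest data-generating model I can think of: reported incidents equal actual incidents multiplied by a reporting rate. This decomposition is my own illustrative assumption, not the model in the manuscript, and the function name is hypothetical. Plugging in the estimates quoted above shows how far the reporting rate would have to fall to reconcile rising maltreatment with falling reports.

```python
def implied_reporting_change(actual_change, reported_change):
    """Relative change in the reporting rate, assuming reported = actual * rate.

    Both arguments are fractional changes, e.g. 0.10 for +10 percent.
    This identity is an illustrative assumption, not the manuscript's model.
    """
    return (1.0 + reported_change) / (1.0 + actual_change) - 1.0


# Estimates quoted above: actual incidents up 10.0 to 24.0 percent,
# reported incidents down 12.7 percent.
reported_change = -0.127
for actual_change in (0.10, 0.24):
    change = implied_reporting_change(actual_change, reported_change)
    print(
        f"actual {actual_change:+.1%}, reported {reported_change:+.1%} "
        f"-> reporting rate {change:+.1%}"
    )
```

Under this assumption, the implied drop in the reporting rate is roughly 21 to 30 percent, which is larger than the 12.7 percent fall in reports alone would suggest.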
Finally, this is not the first interesting result from Stephens-Davidowitz using Google data. He previously used data on searches using terms expressing racial animus — you know the words I mean — as a proxy measure for racism. He then used those data to estimate how much racism affected voting for Barack Obama.
An area’s racially charged search rate… is a robust negative predictor of Obama’s vote share. Continuing racial animus in the United States appears to have cost Obama roughly four percentage points of the national popular vote in both 2008 and 2012, giving his opponent the equivalent of a home-state advantage nationally.
Data mining from social media in effect allows us to listen in on the culture. I take the Clark-Chalmers view that the mind is not bounded by the skull, but rather is extended into the world through human engagement with technologies of thought and communication. If this view is right, then data mining of social media is ‘mind reading’ in more than a metaphorical sense. This is an exciting time for social research.