
Privacy, Anonymity, and Big Data in the Social Sciences

By Justin Reich, August 17, 2014

You can have anonymous data or you can have open science, but you can’t have both.

That’s the conclusion that several colleagues and I reach in an article now online at Queue and forthcoming in Communications of the Association for Computing Machinery.

The short version: many people have called for making science more open and transparent by sharing data and posting it openly. This allows researchers to check each other’s work and to aggregate smaller datasets into larger ones. One saying that I’m fond of is: “the best use of your dataset is something that someone else will come up with.” The problem is that, increasingly, all of this data is about us. In education, it’s about our demographics, our learning behavior, and our performance. Across the social sciences, it’s about our health, our beliefs, and our social connections. Sharing and merging datasets increases the risk of disclosing that personal information.

The article shares a case study of our efforts to strike a balance between anonymity and open science by de-identifying a dataset of learner data from HarvardX and releasing it to the public. In order to de-identify the data to a standard that we thought was reasonably resistant to re-identification efforts, we had to delete some records and blur some variables. If a learner’s combination of identifying variables was too rare, we either deleted the record or scrubbed the data to make it look less distinctive. The result was suitable for release (in our view), but as we looked more closely at the released dataset, we found it wasn’t suitable for science. We had scrubbed the data to the point where it was problematically dissimilar from the original dataset. If you do research using our data, you can’t be sure whether your findings are legitimate or an artifact of de-identification.
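To make the delete-and-blur idea concrete, here is a minimal sketch in Python. It is not the procedure we actually ran on the HarvardX data; the field names, the toy records, the five-year bin width, and the threshold k=2 are all assumptions chosen for illustration. It coarsens one variable, drops any record whose combination of identifying variables appears fewer than k times, and then compares a simple statistic before and after.

```python
from collections import Counter
from statistics import mean


def blur(records, field, width):
    """Coarsen a numeric field by rounding it down to a multiple of `width`."""
    return [{**r, field: (r[field] // width) * width} for r in records]


def suppress_rare(records, quasi_identifiers, k):
    """Drop records whose quasi-identifier combination occurs fewer than k times."""
    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] >= k]


# Hypothetical learner records, invented for this example.
learners = [
    {"country": "US", "birth_year": 1990, "grade": 0.9},
    {"country": "US", "birth_year": 1991, "grade": 0.4},
    {"country": "US", "birth_year": 1993, "grade": 0.7},
    {"country": "FR", "birth_year": 1950, "grade": 1.0},
]

# Blur birth year into five-year bins, then delete combinations seen fewer than k times.
blurred = blur(learners, "birth_year", 5)
released = suppress_rare(blurred, ["country", "birth_year"], k=2)

original_mean = mean(r["grade"] for r in learners)
released_mean = mean(r["grade"] for r in released)
print(len(learners), "records before,", len(released), "after")
print("mean grade before:", original_mean, "after:", released_mean)
```

In this toy case the one learner who stands out is also the one with the top grade, so deleting that record lowers the average grade in the released data. That is the kind of distortion that makes it hard to tell a real finding from an artifact of de-identification.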

This was a powerful revelation for many of us, especially in the face of evidence that the weapons of re-identification, in the long run, will probably outpace the shields of de-identification. We all increasingly share so much about ourselves that, ultimately, datasets created outside learning platforms can be merged with datasets from learning platforms to re-identify people. It may simply not be possible to do science with anonymized data, in education or anywhere in the social sciences.

Right now, we conflate privacy with anonymity, though we need not. The Federalist Papers were anonymous but not private. Voting is private but not anonymous. If we are going to have open science with human subjects data, we’ll need to explore new approaches to balancing open science and privacy. We conclude our essay:

This example of our efforts to de-identify a simple set of student data--a tiny fraction of the granular event logs available from the edX platform--reveals a conflict between open data, the replicability of results, and the potential for novel analyses on one hand, and the anonymity of research subjects on the other. This tension extends beyond MOOC data to much of social science data, but the challenge is acute in educational research because FERPA conflates anonymity--and therefore de-identification--with privacy. One conclusion could be that the data is too sensitive to share; so if de-identification has too large an impact on the integrity of a data set, then the data should not be shared. We believe that this is an undesirable position, because the few researchers privileged enough to have access to the data would then be working in a bubble where few of their peers have the ability to challenge or augment their findings. Such limits would, at best, slow down the advancement of knowledge. At worst, these limits would prevent groundbreaking research from ever being conducted. Neither abandoning open data nor loosening student privacy protections is a wise option. Rather, the research community should vigorously pursue technology and policy solutions to the tension between open data and privacy. A promising technological solution is differential privacy.3 Under the framework of differential privacy, the original data is maintained, but raw PII is not accessed by the researcher. Instead, it resides in a secure database that has the ability to answer questions about the data. A researcher can submit a model--a regression equation, for example--to the database, and the regression coefficients and R-squared are returned. 
Differential privacy has challenges of its own, and remains an open research question because implementing such a system would require carefully crafting limits around the number and specificity of questions that can be asked in order to prevent identification of subjects. For example, no answer could be returned if it drew upon fewer than k rows, where k is the same minimum cell size used in k-anonymity. Policy changes may be more feasible in the short term. An approach suggested by the U.S. PCAST (President's Council of Advisors on Science and Technology) is to accept that anonymization is an obsolete tactic made increasingly difficult by advances in data mining and big data.14 PCAST recommends that privacy policy emphasize that the use of data should not compromise privacy and should focus "on the 'what' rather than the 'how.'"14 One can imagine a system whereby researchers accessing an open data set would agree to use the data only to pursue particular ends, such as research, and not to contact subjects for commercial purposes or to rerelease the data. Such a policy would need to be accompanied by provisions for enforcement and audits, and the creation of practicable systems for enforcement is, admittedly, no small feat. We propose that privacy can be upheld by researchers bound to an ethical and legal framework, even if these researchers can identify individuals and all of their actions. If we want to have high-quality social science research and privacy of human subjects, we must eventually have trust in researchers. Otherwise, we'll always have a strict tradeoff between anonymity and science.
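The query-answering database described in the excerpt can be sketched roughly as follows. This is only an illustration, not a vetted differential privacy implementation: the class name, its parameters, and the toy rows are hypothetical, it answers a simple count rather than a regression, and the Laplace noise is the standard textbook mechanism for counts rather than anything specified in our essay. Note also that refusing to answer based on the exact row count can itself leak information, which is one of the open problems the excerpt alludes to.

```python
import random


class GatedCountServer:
    """Holds raw rows privately and answers only noisy, thresholded counts."""

    def __init__(self, rows, k, epsilon, seed=None):
        self._rows = list(rows)      # raw data never leaves this object
        self._k = k                  # minimum number of rows an answer may draw upon
        self._epsilon = epsilon      # privacy parameter; smaller means more noise
        self._rng = random.Random(seed)

    def count(self, predicate):
        """Return a noisy count of matching rows, or None if fewer than k match."""
        matching = sum(1 for row in self._rows if predicate(row))
        if matching < self._k:
            return None  # refuse: the answer would draw upon too few rows
        # A count changes by at most 1 when one person is added or removed,
        # so Laplace noise with scale 1/epsilon is used here. The difference
        # of two exponential draws with rate epsilon is Laplace-distributed.
        noise = (self._rng.expovariate(self._epsilon)
                 - self._rng.expovariate(self._epsilon))
        return matching + noise


# Hypothetical rows, invented for this example.
rows = [
    {"country": "US", "certified": True},
    {"country": "US", "certified": False},
    {"country": "US", "certified": True},
    {"country": "US", "certified": True},
    {"country": "FR", "certified": True},
]

server = GatedCountServer(rows, k=3, epsilon=1.0, seed=0)
us_answer = server.count(lambda r: r["country"] == "US")   # noisy count near 4
fr_answer = server.count(lambda r: r["country"] == "FR")   # refused: only one row
print("US learners (noisy):", us_answer)
print("FR learners:", fr_answer)
```

The researcher sees only what `count` returns; the raw rows stay behind the interface. A real system would also have to limit how many questions each researcher can ask, since many noisy answers can be averaged to recover the true value.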

If we must have trust in researchers to enable open science, then researchers will need to earn that trust.

For regular updates, follow me on Twitter at @bjfr and for my papers, presentations and so forth, visit EdTechResearcher.

The opinions expressed in EdTech Researcher are strictly those of the author(s) and do not reflect the opinions or endorsement of Editorial Projects in Education, or any of its publications.

