The 2021 OSMF Community Survey
==Anonymization of Survey Data to Preserve Privacy==
In the interest of transparency, the OSMF strives to publish as much survey data as it can while preserving the privacy and anonymity of respondents. This balances the need for transparency against the obligation to respect the privacy of OSM community members, in compliance with applicable privacy laws and the OSMF privacy policy.
The 2021 OSMF Community Survey will conclude on February 14th. As summary statistics become available, they will be posted here. Once the raw data have been anonymized, a copy of the results will be posted to this wiki and linked from this page.
The following are the steps we will take to anonymize the survey data before publication in order to preserve the privacy of respondents.
1. Download the survey data as a spreadsheet.<br>
2. Create pivot tables based on demographic data, language used, and location data (country) to identify groups, particularly demographic groups, with fewer than 20 observations (e.g., fewer than 20 respondents from a given country, or fewer than 20 using a given language).<br>
3. Identify comments by individuals who did not grant permission to share their comments, and mark them accordingly (for example, by highlighting them in red). Ensure that those comments do not leave the control of the Board.<br>
4. Copy the comments to a separate spreadsheet and translate them into English using DeepL or Microsoft Translator. Publish translations only of those comments for which permission to share has been granted, and publish them separately from all other data, because comments can potentially be used to identify respondents.<br>
5. Given the relatively small number of women and of people of non-binary and other genders in the OSM community, gender data must be released separately from other data, in a manner that does not allow identification (deanonymization) of individual respondents.<br>
5a. Create a duplicate file of the survey data.<br>
5b. In the first copy of the survey data, delete the column with gender data.<br>
5c. In both spreadsheets, replace the labels of all demographic, language, and country groups with fewer than 20 observations with more generic labels (e.g., if fewer than 20 respondents are from Saudi Arabia, change 'Saudi Arabia' to 'Middle East'; if fewer than 20 respondents use Arabic, change the 'ar' language code to 'Middle East').<br>
5d. In the second copy of the survey data, delete the columns containing language codes, country codes and names, and all demographic data except gender, so that analysis of responses by gender remains possible.<br>
6. In both spreadsheets, delete the 'Response ID', 'Date submitted', 'Last page', 'Start language', 'Seed', 'May we share (anonymously, of course) your comments with the OpenStreetMap community?', and 'Optional Demographic: Are you a... [Other]' columns to make merging the two datasets less likely.<br>
7. Run a script to randomly scramble the order of the responses in the second (gender) spreadsheet, so that it cannot be merged with the first spreadsheet and the data thereby deanonymized.<br>
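The steps above describe a spreadsheet workflow. As an illustration only, the same logic could be sketched in Python using the standard library. This is not the script the OSMF used. It assumes the survey has been loaded as a list of dictionaries (one per respondent, for example via <code>csv.DictReader</code>). The function names, the <code>mapping</code> of rare labels to generic ones, the <code>"Other"</code> fallback label, and the names of the gender and demographic columns are all hypothetical. Only the column names in <code>DROP_COLUMNS</code> come from step 6.

```python
import random
from collections import Counter

THRESHOLD = 20  # groups with fewer observations than this are generalized

# Columns deleted from both copies in step 6 (names as listed in the procedure).
DROP_COLUMNS = {
    "Response ID", "Date submitted", "Last page", "Start language", "Seed",
    "May we share (anonymously, of course) your comments with the "
    "OpenStreetMap community?",
    "Optional Demographic: Are you a... [Other]",
}


def small_groups(rows, column, threshold=THRESHOLD):
    """Step 2: values of `column` that occur in fewer than `threshold` rows."""
    counts = Counter(row[column] for row in rows)
    return {value for value, n in counts.items() if n < threshold}


def generalize(rows, column, mapping, fallback="Other", threshold=THRESHOLD):
    """Step 5c: replace rare values of `column` with a more generic label.

    `mapping` (hypothetical) maps a rare label to its generic replacement,
    e.g. {"Saudi Arabia": "Middle East"}; unmapped rare labels get `fallback`.
    """
    rare = small_groups(rows, column, threshold)
    return [
        {**row, column: mapping.get(row[column], fallback)}
        if row[column] in rare else dict(row)
        for row in rows
    ]


def drop_columns(rows, columns):
    """Steps 5b, 5d and 6: copies of the rows without the given columns."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]


def split_and_shuffle(rows, gender_col, demographic_cols, rng=None):
    """Steps 5a-5d, 6 and 7: build the two copies intended for publication."""
    if rng is None:
        rng = random.SystemRandom()  # unseeded, so the order is not reproducible
    # First copy: everything except gender and the step 6 columns.
    without_gender = drop_columns(rows, DROP_COLUMNS | {gender_col})
    # Second copy: gender kept; language, country and other demographics removed.
    gender_only = drop_columns(rows, DROP_COLUMNS | set(demographic_cols))
    # Step 7: shuffle so rows cannot be matched to the first copy by position.
    rng.shuffle(gender_only)
    return without_gender, gender_only
```

In this sketch, <code>generalize</code> would be called once per demographic, language, or country column before <code>split_and_shuffle</code>. It does not check whether a generic label itself still covers fewer than 20 respondents, so that would need to be verified separately.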