2021 Survey Results: Difference between revisions

updated anonymization procedures based on additional analysis of the dataset
The following are steps we will take to anonymize published survey data in order to preserve the privacy of respondents.
 
1. Download the survey data into a spreadsheet.<br>
2. Create pivot tables based on demographic data, language used, and location data (country) to identify groups, particularly demographic groups, with fewer than 20 observations (e.g., fewer than 20 from any country, fewer than 20 using any language).<br>
3. Identify comments by individuals who did not grant permission to share their comments, and mark appropriately (highlight in red, for example). Ensure that those comments will not leave control of the Board.<br>
4. Copy comments to a separate spreadsheet, translate them into English using DeepL or Microsoft Translator, and publish translations of only those comments for which such assent has been granted, and publish only separately from all other data (comments can potentially be used to identify respondents).<br>
5. Given the relatively small number of females and representatives of non-binary and other genders in the OSM community, gender data must be released separately from other data in a manner that does not allow identification (deanonymization) of individual respondents. This is also true of the "time in project" variable, which could be used to identify individual respondents if matched to country of residence.<br>
5a. Create two duplicate files of the survey data, for a total of three copies (original and two duplicates).<br>
5b. In all spreadsheets, replace labels of all demographic and country groups with fewer than 20 observations with more generic labels (e.g., if fewer than 20 respondents are from Saudi Arabia, change 'Saudi Arabia' to 'Middle East'; if fewer than 20 respondents use Arabic, change the 'ar' language code to 'Middle East').<br>
5c. In the duplicates of the survey data, delete the columns containing language codes, country codes and names, and all demographic data except gender (in one duplicate) or "time in project" (in the other duplicate).<br>
5d. Ensure that in the two duplicates, one or the other of these columns is preserved and all other demographic columns are deleted.<br>
6. In all spreadsheets, delete the 'Response ID', 'Date submitted', 'Last page', 'Start language', 'Seed', 'May we share (anonymously, of course) your comments with the OpenStreetMap community?', and 'Optional Demographic: Are you a... [Other]' columns to make merging of the datasets less likely.<br>
7. Run a script to randomly scramble the order of responses by gender and "time in project" in order to prevent merging with the original spreadsheet and thereby deanonymizing the data.
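
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script the Board will run: the column names (`country`, `language`, `gender`, `time_in_project`, `q1`), the `"Other"` fallback label, and the choice to drop both sensitive columns from the main copy are all hypothetical. Rows are modelled as plain dictionaries rather than spreadsheet cells.

```python
import random
from collections import Counter

MIN_GROUP_SIZE = 20  # threshold named in step 2


def small_groups(rows, column, threshold=MIN_GROUP_SIZE):
    """Step 2: return the labels in `column` that occur fewer than `threshold` times."""
    counts = Counter(row[column] for row in rows)
    return {label for label, n in counts.items() if n < threshold}


def generalize(rows, column, mapping, threshold=MIN_GROUP_SIZE):
    """Step 5b: replace rare labels with a broader one taken from `mapping`.

    Rare labels missing from `mapping` fall back to "Other" (an assumption).
    """
    rare = small_groups(rows, column, threshold)
    return [
        {**row, column: mapping.get(row[column], "Other")}
        if row[column] in rare else dict(row)
        for row in rows
    ]


def split_and_shuffle(rows, sensitive, demographic, rng=None):
    """Steps 5a, 5c, 5d and 7: build the main copy and one shuffled duplicate
    per sensitive column, each duplicate keeping only that one column."""
    rng = rng or random.Random()
    # Assumption: the main copy keeps the other demographics but neither sensitive column.
    main = [{k: v for k, v in row.items() if k not in sensitive} for row in rows]
    duplicates = {}
    for keep in sensitive:
        drop = (set(sensitive) | set(demographic)) - {keep}
        copy = [{k: v for k, v in row.items() if k not in drop} for row in rows]
        rng.shuffle(copy)  # break the row-order link to the main copy
        duplicates[keep] = copy
    return main, duplicates


# Hypothetical data: 25 respondents from one country, 3 from another.
rows = (
    [{"country": "Germany", "language": "de", "gender": "male",
      "time_in_project": "5+ years", "q1": "yes"}] * 25
    + [{"country": "Saudi Arabia", "language": "ar", "gender": "female",
        "time_in_project": "<1 year", "q1": "no"}] * 3
)
rows = generalize(rows, "country", {"Saudi Arabia": "Middle East"})
rows = generalize(rows, "language", {"ar": "Middle East"})
main, duplicates = split_and_shuffle(
    rows,
    sensitive=["gender", "time_in_project"],
    demographic=["country", "language"],
)
```

A broader label can itself end up with fewer than 20 observations, so it is worth re-running `small_groups` after generalizing. Shuffling only removes the row-order link between copies; any remaining column whose values are unique per respondent would still allow the copies to be merged.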
 
==Plan for presentation of survey results==
Feb 20 -<br>
1. Obtain normalization data and normalize survey results.<br>
2. Anonymize the survey results. This will result in three sets of spreadsheets: one set with gender data, one set with "time in project" data, and one set with other demographic variables.
 
Feb 22 -<br>