(updated anonymization procedures based on additional analysis of the dataset) |
The following are steps we will take to anonymize published survey data in order to preserve the privacy of respondents.
1. Download the survey data
2. Create pivot tables based on demographic data, language used, and location data (country) to identify groups, particularly demographic groups, with fewer than 20 observations (e.g., fewer than 20 from any country, fewer than 20 using any language).<br>
3. Identify comments by individuals who did not grant permission to share their comments, and mark appropriately (highlight in red, for example). Ensure that those comments will not leave control of the Board.<br>
4. Copy comments to a separate spreadsheet and translate them into English using DeepL or Microsoft Translator. Publish translations only of those comments for which permission has been granted, and publish them separately from all other data (comments can potentially be used to identify respondents).<br>
5. Given the relatively small number of women and of respondents of non-binary and other genders in the OSM community, gender data must be released separately from other data in a manner that does not allow identification (deanonymization) of individual respondents. The same applies to the "time in project" variable, which could be used to identify individual respondents if matched to country of residence.<br>
5a. Create
5c. In both spreadsheets, replace labels of all demographic and country groups with fewer than 20 observations with more generic labels (e.g., if fewer than 20 respondents are from Saudi Arabia, change 'Saudi Arabia' to 'Middle East'; if fewer than 20 respondents use Arabic, change the 'ar' language code to 'Middle East').<br>
5d. Ensure that in the two duplicates, one or the other of these columns is preserved and all other demographic columns are deleted.<br>
5d. In the second copy of the survey data, delete the columns containing language codes, country codes and names, and all demographic data except gender, so that analysis of responses by gender remains possible.<br>
6. In both spreadsheets, delete the 'Response ID', 'Date submitted', 'Last page', 'Start language', 'Seed', 'May we share (anonymously, of course) your comments with the OpenStreetMap community?
7. Run a script to randomly scramble the order of responses by gender and "time in project" in order to prevent merging with the
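
The small-group relabeling (steps 2 and 5c) and the row scrambling (step 7) could be sketched roughly as follows. This is only an illustrative sketch, not the script the Board will actually use: it assumes each response is a Python dict keyed by column name, and the names `GENERIC_LABELS`, `small_groups`, `generalize` and `shuffled`, as well as the 'Other' fallback label, are hypothetical. Only the threshold of 20 and the 'Saudi Arabia' / 'ar' examples come from the steps above.

```python
import random
from collections import Counter

THRESHOLD = 20  # minimum group size, taken from step 2

# Hypothetical mapping from specific labels to broader ones (illustrative only).
GENERIC_LABELS = {"Saudi Arabia": "Middle East", "ar": "Middle East"}


def small_groups(rows, column, threshold=THRESHOLD):
    """Return the values in `column` that occur fewer than `threshold` times."""
    counts = Counter(row[column] for row in rows)
    return {value for value, n in counts.items() if n < threshold}


def generalize(rows, column, mapping, threshold=THRESHOLD, fallback="Other"):
    """Return copies of `rows` with small-group labels replaced by generic ones.

    Labels missing from `mapping` fall back to `fallback` (an assumption;
    the procedure does not say what to do in that case).
    """
    rare = small_groups(rows, column, threshold)
    result = []
    for row in rows:
        new_row = dict(row)  # copy, so the original data stays untouched
        if new_row[column] in rare:
            new_row[column] = mapping.get(new_row[column], fallback)
        result.append(new_row)
    return result


def shuffled(rows, seed=None):
    """Return the rows in random order, so that row position cannot be
    used to line up separately released files."""
    rows = list(rows)
    # With seed=None the generator is seeded from system entropy.
    random.Random(seed).shuffle(rows)
    return rows
```

Note that a generic label can itself end up with fewer than 20 observations, so it would be prudent to re-run `small_groups` on the result and broaden the labels further if needed. Each released file should be shuffled independently, without a fixed seed.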
==Plan for presentation of survey results==
Feb 20 -<br>
1. Obtain normalization data and normalize survey results.<br>
2. Anonymize the survey results. This will result in
Feb 22 -<br>