Operations/Minutes/2023-08-24

OpenStreetMap Foundation, Operations Meeting - Draft minutes

These minutes do not go through a formal acceptance process.
This is not strictly an Operations Working Group (OWG) meeting.

Thursday 24 August 2023, 19:00 London time
Location: Video room at https://osmvideo.cloud68.co

Participants

Minutes by Dorothea Kazazi.

Absent

New action items from this meeting

  • Tom Hughes to look into how trace simplification can be done. [Topic: Large scale GPX uploads]
  • Paul Norman to open a ticket regarding the reindexing and frequency. [Topic: Large scale GPX uploads]
  • Paul Norman to email MapTiler. [Topic: MapTiler featured layer]
  • Paul Norman to open a ticket to add GitHub and Wikimedia as 3rd party authentication providers [Topic: Validating user emails]
  • Paul Norman to ask Andy Allan about metrics and if he wants to rotate the key at some point. [Topic: Tracestrack featured layer]

Reportage

New US machine

Currently at Oregon State University - will be taken to the data center there.

Host the SotM Brazil 2023 website

The SotM Brazil 2023 organising team will host their website on GitHub pages.

Starting an open document listing goals for longer-term planning

  • Paul Norman got some things from Craig Allan (Board member).
  • Question: top down versus bottom up approach.

Creating a ticket about solutions to work on creating a FAQ in order to reduce incoming communications

Action item to be changed to: "Paul to work on creating a FAQ in order to reduce incoming communications"


Large scale GPX uploads

Reference: https://github.com/openstreetmap/operations/issues/931

On API size growth

  • Historically the API database grew at 13% a year; growth has since been up to 25%.
    • This is due to a combination of index bloat and a change in behaviour for trace uploads (e.g. overnoded traces).
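
To put the two growth rates in perspective, the following minimal sketch computes how long a database would take to double in size under steady compound growth. Treating 13% and 25% as constant annual rates is an assumption made for illustration, and the function name is hypothetical.

```python
import math


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years for a quantity to double at a constant compound annual growth rate."""
    if annual_growth_rate <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / math.log(1 + annual_growth_rate)


# Compare the historical rate with the higher rate mentioned in the minutes.
for rate in (0.13, 0.25):
    print(f"{rate:.0%} a year: doubles in about {doubling_time_years(rate):.1f} years")
```

Under these assumptions the doubling time drops from roughly 5.7 years to roughly 3.1 years.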

On indexing

  • we hadn't reindexed for ages
  • we reindexed earlier this year, we got rid of most index bloat

Sunnypilot

https://www.openstreetmap.org/user/sunnypilot

  • Tom Hughes has previously talked to Sunnypilot about the email account they were using (one OSM account for uploading many traces).
  • The number of trace upload messages they were receiving was causing problems with our mail queue, because their mail service kept locking us out or refusing to accept any more messages from us.
  • mmd opened an issue with Sunnypilot, saying that they should simplify their trace points.
  • Sunnypilot seems to opt users in to data upload by default: https://github.com/sunnyhaibin/sunnypilot#-user-data
  • They are using OAuth 2.0.

Dragonpilot

https://www.openstreetmap.org/user/dragonpilot

  • Grant Slater spoke to them, asking them to stop uploading small GPX traces, as some had just two points in them.
  • They will change the code so that traces will have a minimum of 1000 points.
  • Dragonpilot is probably a fork of Sunnypilot.

Issues

  • It may take people a long time to update to the new version.
  • Data:
    • We are getting the same traces repeatedly (e.g. from people uploading their commute every day).
    • Some traces are over-noded.
  • Their users might not be aware that their traces are being publicly uploaded.
    • There seems to be a setting where they can turn off the uploading.
  • They're using a single account for all the uploads.

On our costs

  • Long term, the API size growth will cost 1200 EUR/year in additional storage.

Suggestions related to import and data storage

Suggestions related to reindexing

  • Reindex the non-geodata tables, as well as nodes and relation members.
  • Frequency: Reindex regularly.
    • Might increase AWS costs.
  • Reindex the small ones monthly and the rest yearly.

On importer

  • Is running in Ruby.

On storing dense data

  • Suggestion for simplification.
  • Compress data before serving on S3.
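
The compression suggestion can be illustrated with a small sketch. The minutes do not say which compression format was discussed, so gzip is an assumption here, as are the function name and the synthetic GPX sample; the upload to S3 itself is not shown.

```python
import gzip


def compress_gpx(gpx_text: str) -> bytes:
    """Return the gzip-compressed UTF-8 bytes of a GPX document."""
    return gzip.compress(gpx_text.encode("utf-8"))


# Synthetic, densely sampled track used only for illustration.
trkpts = "".join(
    f'<trkpt lat="{51.5 + i * 1e-5:.5f}" lon="-0.10000">'
    f"<time>2023-08-24T19:00:{i % 60:02d}Z</time></trkpt>"
    for i in range(1000)
)
sample = f'<gpx version="1.1"><trk><trkseg>{trkpts}</trkseg></trk></gpx>'

compressed = compress_gpx(sample)
print(len(sample.encode("utf-8")), "bytes raw,", len(compressed), "bytes gzipped")
```

GPX is repetitive XML, so it tends to compress well; the actual saving depends on the real trace data.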

Other points mentioned during discussion

  • When someone filtered a trace with a 1 meter threshold, it went from 6000 points to 20.
  • We don't import waypoints, only trackpoints.
  • The relation history table is quite big, as there are many versions of the relations.
  • Logs: 5 TB retrieved from S3 in three days.
  • Some of the later segments seem to close earlier - maximum observed time was about ten minutes.
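
A sketch of what filtering a trace with a distance threshold could look like is given below. The minutes do not record which filter was actually used, so this is a plain minimum-distance filter (keep a point only when it is at least the threshold away from the last kept point), not a line-simplification algorithm such as Douglas-Peucker. The function names and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, in meters


def haversine_m(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def filter_by_distance(points, threshold_m: float = 1.0):
    """Drop points closer than threshold_m to the previously kept point."""
    if not points:
        return []
    kept = [points[0]]
    for point in points[1:]:
        if haversine_m(kept[-1], point) >= threshold_m:
            kept.append(point)
    return kept


# A vehicle standing still for 100 samples, then moving about 11 m north.
trace = [(51.5, -0.1)] * 100 + [(51.5001, -0.1)]
print(len(trace), "points before,", len(filter_by_distance(trace)), "after")
```

A filter like this mainly removes points recorded while stationary or moving very slowly, which is one way a trace can end up over-noded.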

Action items

  • Tom Hughes to look into how trace simplification can be done.
  • Paul Norman to open a ticket regarding the reindexing and frequency.

MapTiler featured layer

  • Haven't heard back.
  • They said about a month ago that they were looking at Wikidata and that they would get back to us.

Action item: Paul to email MapTiler.


Validating user emails

If Google provides an email address that is not a Gmail one, or Microsoft provides one that is not a Hotmail one, should we still validate it or trust them implicitly? Are we going to accept third-party email addresses as validated by OAuth providers?

History

  • Initial implementation of 3rd party login: we automatically accepted emails from Google because they were always Gmail addresses.
  • This was later extended to Facebook - emails were validated by Facebook when people set up their accounts.
  • We have done the same with Microsoft.

Reason for email validation

  • Communication related to things the OSM users have mapped.

Potential issues

  • We don't have the whole list of domains (e.g. live.com, hotmail.com, Office 365, which many corporations use with their own domains).
  • If we're going to get accused of spamming, then the users are probably already getting spammed by their 3rd party provider.

Suggestions

  • Have a policy on accepting users without validation.
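
One possible shape for such a policy is sketched below: an email address supplied by a provider is trusted without confirmation only when its domain is on a list known to belong to that provider, and everything else goes through normal email validation. This is an illustration rather than the website's actual logic. The function name and the domain lists are assumptions, and the lists are deliberately incomplete, reflecting the issue raised above.

```python
# Illustrative, deliberately incomplete lists of provider-owned domains.
TRUSTED_DOMAINS = {
    "google": {"gmail.com", "googlemail.com"},
    "microsoft": {"hotmail.com", "live.com", "outlook.com"},
}


def needs_confirmation(provider: str, email: str) -> bool:
    """Return True if the address should still go through email validation."""
    domain = email.rpartition("@")[2].lower()
    return domain not in TRUSTED_DOMAINS.get(provider.lower(), set())


print(needs_confirmation("google", "mapper@gmail.com"))        # provider-owned domain
print(needs_confirmation("microsoft", "mapper@corp.example"))  # corporate domain
```

Providers that are not listed at all fall back to requiring confirmation, which keeps the default conservative.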

Action item

Paul Norman to open a ticket to add GitHub and Wikimedia as 3rd party authentication providers.


Tracestrack featured layer

Featured layer request: https://tracesmap.com
The map didn't work for Tom - he was getting 403 responses for tiles because a bad referrer was being sent.

First impressions

  • Layer looks interesting.
  • Has height data and we don't have height data in OSM.
  • High zoom - looks like OSM Carto.
  • Low zoom - is Natural Earth.
  • The road color doesn't work great on mountains.
  • A lot of their height data has edge/rounding-error artifacts.
    • This is pretty common with height data.

Suggestions

  • Ask Andy Allan for his metrics.
    • People have copied the key, which has been around for some time.
  • Answer their questions and provide Andy's metrics.

Action item

Paul Norman to ask Andy Allan about metrics and if he wants to rotate the key at some point.


AWS

Grant has discussed most of this publicly. He has made some tweaks and cut down the costs.

  • All data (3 Rails buckets) is being replicated to an account which archives it, versions all objects and keeps their full history, so we can go back to a point in time.
  • Automatic replication, running in the background.
  • Tuned the archiving of the data tiers: WAL (Write-Ahead Log) bucket is now kept for a year.
  • Halved the amount of data we're keeping from 575 days to 366.
  • Will add a rule to keep all the data at the most expensive storage tier, the Instant Access standard, for 90 days, after which it will drop down to the cheapest storage tier from 90 days up to 366 days.
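
The planned tiering rule can be summarised as a mapping from object age to storage tier. The sketch below only models that mapping; it is not an AWS configuration. The tier labels and function name are hypothetical, and it assumes that data older than 366 days is no longer kept.

```python
def storage_tier(age_days: int) -> str:
    """Map an object's age in days to the tier described in the minutes."""
    if age_days < 0:
        raise ValueError("age_days must not be negative")
    if age_days < 90:
        return "instant-access"  # most expensive tier, first 90 days
    if age_days <= 366:
        return "cheapest"        # from 90 days up to 366 days
    return "expired"             # assumption: not kept beyond 366 days


for age in (0, 89, 90, 366, 367):
    print(age, storage_tier(age))
```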

On AWS billing and credits

  • We've got about one month left on our credits before they expire.
  • Grant Slater asked for more credits and they said they're working on it and they'll get back to us.
  • Any remaining credits expire at the credit end date.
  • We had enough credits to run it to the end date, but we've been using them for other things that we would have spent money on.
  • If we get close to running out of credits before our current end date, we'll turn it off.
  • Hard to estimate when we'll run out based on this month's data, because of moving data.

Meeting adjourned 57 minutes after start.


Action items

  • 2023-07-27 Grant Slater to give to Paul Norman the part number for the 500 power supply. [Topic: new US machine]
  • 2023-07-27 OPS to close the ÖPNVKarte PRs for website and leaflet [Topic: ÖPNVKarte Featured layer]
  • 2023-07-27 Grant Slater to ask the SotM Brazil 2023 organising team if they can host the SotM Brazil 2023 website with GitHub pages. If not, timebox 1 hour to get a container running. [Topic: SotM Brazil 2023 website]
  • 2023-06-29 Grant Slater to put Martijn van Exel's policy for addition of OSM editors to the osm.org menu out for feedback. [Topic: Draft policy by Martijn van Exel]
  • 2023-05-18 Paul Norman to start an open document listing goals for longer-term planning. [Topic: Longer-term planning]
  • 2023-05-04 [WordPress] Grant Slater to share list of WordPress users with Dorothea and their response to keeping an account. [Topic: WordPress security] - Shared, but additional work required
  • 2023-08-24 Paul to work on creating a FAQ in order to reduce incoming communications.
  • 2020-07-01 Paul Norman to create a ticket about solutions to work on creating a FAQ in order to reduce incoming communications. [Topic: Revision of acceptable use policy to reduce incoming communications] # 2021-05-19 decision to leave the action item open. # 2021-06-02 discussion about priority for account deletion. # 2022-04-09 Grant can show Paul how to do that with autoresponder which Tom built. Might be better to work on an online form (action item below).
  • 2020-07-01 Grant Slater to work out some of the questions for an online form as a solution to reduce incoming communications. [Topic: Revision of acceptable use policy to reduce incoming communications] 2020-08-12 need to think about the reply # 2021-05-19 decision to leave the action item open. # 2022-04-09 Grant is thinking about examples. Suggestion to add what is considered large for tile usage. High Usage: 1) London Marathon 2) Wikipedia 3) App with 1000 users. Ticket on text for a FAQ on high usage examples.
  • 2020-04-10 Grant Slater to work out a table of different data bits, work out how they are backed up and what can be potentially improved. [Topic: High Availability / redundancy of OpenStreetMap.org (and primary services)] # 2021-05-19 decision to leave the action item open. https://github.com/openstreetmap/operations/issues/941

Next meeting

Thursday 7 September 2023, 19:00 London time, unless rescheduled.

Operations meetings are currently being held every two Thursdays, at 19:00 London time.
Online calendar showing the OPS meetings.