Operations/Minutes/2023-09-21
OpenStreetMap Foundation, Operations Meeting - Draft minutes
These minutes do not go through a formal acceptance process.
This is not strictly an Operations Working Group (OWG) meeting.
Thursday 21 September 2023, 19:00 London time
Location: Video room at https://osmvideo.cloud68.co
Participants
- Tom Hughes (OWG)
- Grant Slater (OWG)
- Paul Norman (OWG)
- Guillaume Rischard (OSMF board)
- NorthCrab
Minutes by Dorothea Kazazi.
Absent
New action items from this meeting
- Paul to ask MapTiler to let us know when the bug is fixed, so that we can reconsider their layer. [Topic: MapTiler]
- Grant to create a table of the cache headers that we store. [Topic: Surrogate key patch]
Reportage
GitHub template for the new attribution repository
Action item: 2023-09-07 Paul to create a GitHub template for the new repository https://github.com/openstreetmap/tile-attribution/ which will be only for cases of missing attribution from sites using our tiles. [Topic: (With LWG) Issue template/checklist for blocking sites without attribution]
There are OWG concerns.
OpenID Connect
https://github.com/openstreetmap/openstreetmap-website/pull/4226
Log-in with OSM account on wiki.osm.org: Do we want to make it globally available or only internally?
On OpenID Connect
- Builds on top of OAuth.
- Is a standardised way to do what we already do with Discourse: use OAuth to call our API and get OSM user details.
- You get an ID token which you can verify against a published public key to confirm that it's a valid osm.org identity (see the sketch after this list).
- Provides a standard user endpoint that returns information.
- Can provide the email address if the app has the privileged email scope, as done with Discourse; an app cannot get the email address unless it is privileged.
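A minimal sketch of how a client could verify an osm.org ID token once OpenID Connect is merged. The discovery URL and client ID below are assumptions for illustration (the real values depend on the openstreetmap-website deployment); it uses the requests and PyJWT packages.

```python
# Hedged sketch: discovery URL, issuer and client_id are assumptions.
import jwt        # PyJWT
import requests

ISSUER = "https://www.openstreetmap.org"   # assumed issuer
CLIENT_ID = "my-registered-oauth-app"      # hypothetical OAuth application ID

# OIDC discovery: the provider publishes its endpoints and signing keys.
config = requests.get(f"{ISSUER}/.well-known/openid-configuration", timeout=10).json()
jwks_client = jwt.PyJWKClient(config["jwks_uri"])

def verify_id_token(id_token: str) -> dict:
    """Verify the ID token signature against the published public key."""
    signing_key = jwks_client.get_signing_key_from_jwt(id_token)
    return jwt.decode(
        id_token,
        signing_key.key,
        algorithms=["RS256"],
        audience=CLIENT_ID,
        issuer=config["issuer"],
    )
```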
Points mentioned during discussion
- It makes sense to get it merged.
- As long as we keep the email as a privileged scope, it's probably fine.
Guillaume joined 6 minutes after the start and disconnected at 7:23.
MapTiler
Background
- Adding new vector tile layer on www.osm.org.
- There was concern about some of the data sources in the tiles they use, and we had asked for clarifications.
Updates
- They emailed back after ~2 months.
- They have opened a bug report on OpenMapTiles about the translations.
Issues
- Their cartography is a stylistic clone of OSM-Carto, so it is not unique, which is one of our criteria.
- It is technically unique, as it is a vector layer, but that is a bit of a stretch of the policy.
- Suggestion: do not decide on this issue now.
- In cases when there is no OSM data for a name in a particular language, it displays Wikipedia data.
- Confusing to users.
- We would prefer it not to display Wikipedia data, because then the gaps in OSM would never get filled.
- Waiting to hear from them about this issue.
Action item: Paul to ask them to let us know when the bug is fixed, so that we can reconsider their layer.
S3 planet update
Background
- We can't host old planet files, because with the full-history ones we are over 300 GB/week (last week: 70 GB for the .pbfs and 130 GB for the XML).
- We've bought increasingly large servers to store the data, as - in addition to the 300 GB/week that we publish - we also have to create the data.
- Norbert: GBP 18,000; 35 TB of available storage; RAID 6, which provides data integrity.
- We will put the whole archive on Amazon Web Services (AWS), so people can get old planet files from there.
- AWS has given us a fully sponsored account, separate from our other AWS accounts, where we can store as much planet data as we can under their open data program.
- AWS buckets: we've got one in Europe and one on the US west coast.
- Data replication: automatic from the Europe bucket to the US bucket.
On data connections
- If we were to serve it ourselves, we would have to go for 10 Gbit/s bandwidth.
- We pay for our upstream data connections in our data centers, and we've just downgraded from a 10 Gbit/s link to two 1 Gbit/s links (two for redundancy), as the price difference compared to two 10 Gbit/s links was around €30,000 a year.
- We looked at peering with other people in the data centers and at doing BGP ourselves.
- Changed from Cogent (which does not have good IPv6 peering) to HE.net.
Plan for planet files
- We will keep the most recent files on our own servers.
- Then we will immediately synchronize the data off to AWS.
Grant
- Completed the Terraform work and got the buckets ready.
- Permissions done.
- Backported relevant data.
Not done - might be done at later stage under an archive folder
- old licensed data
- old files for tile stuff
Need to work out: how to slot the S3 upload into our current generation scripts.
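One possible shape for that step, sketched with boto3; the bucket name, key layout and local path are hypothetical, and the real version would be wired into the existing generation scripts via Chef.

```python
# Hedged sketch: bucket, key layout and paths are placeholders.
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3", region_name="eu-central-1")      # assumed Europe bucket region

# Planet files are tens of GB, so use large multipart chunks.
transfer_config = TransferConfig(multipart_chunksize=256 * 1024 * 1024)

def upload_planet(local_path: str, key: str, bucket: str = "osm-planet-eu") -> None:
    """Upload one generated planet file; replication to the US bucket is automatic."""
    s3.upload_file(local_path, bucket, key, Config=transfer_config)

upload_planet(
    "/store/planet/planet-230918.osm.pbf",                # hypothetical local path
    "pbf/2023/planet-230918.osm.pbf",                     # hypothetical key layout
)
```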
Tom: created a ticket about all the different file-generation tasks.
On storing historic planet files
What is the point of storing any historical planet files, if anybody who needs them can easily recreate them from the latest full-history one by trimming the data?
> If you are technically competent to do that, you also need bandwidth, CPU and memory.
> Many users can't do that.
> We provide the data in as many formats as practical, so that people can do their analyses.
On retrieval of historic planet files
- Very common.
- Keeping more planet files around is not an issue, as it currently does not cost us anything.
On trimming the data from our storage
- There's a script in one of our repositories.
- We have backed up most of the data that we've trimmed to AWS S3 Glacier Deep Archive (the cheapest storage tier).
On potentially switching from AWS in the future
- We considered such a possibility when we endeavoured into using cloud services.
- In case of a hard cut-off: the old historical files would not be available other than in torrent format.
- In case of a reasonable cut-off, options would include trimming the historical data, finding another large-scale data provider, or some sort of archival storage system.
On no planet files for the development server
Currently there is no way for some applications to fetch the full data for the testing server.
- Long standing issue and no one has expressed any interest in scripting it up.
- Suggestion to use a small extract for testing.
- Dev API is more for experimenting with code than experimenting with the data.
- A lot of work to set up the planet there with replication diffs.
Suggestion: Add a ticket about planet files for the development server.
Other points mentioned during discussion
- There was a prior effort to get someone to set up the full stack.
- We have a script called Planet Update in our chef repo that was meant to keep the planet file up to date.
Next steps for renderd/mapnik bug
There were performance issues with the new server in the US.
- Mapnik has certain issues when loading fonts, and it doesn't cache the results as it's supposed to.
- Tom created a patch, which seems to be working.
- The server is now handling significantly more traffic.
Next step: roll the patch out to the rest of the machines.
Piasa
Status
- Of the servers that actually take load, it is the one with the most threads.
- CPU currently ~50%
- I/O ~70%
Suggestions
- Push the sysctl vm.swappiness down to 1 on all tile servers [0 = swap as little as possible, higher values are more aggressive] (see the sketch below).
- Piasa: move some traffic from Europe, to see where it breaks.
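As a rough illustration of the swappiness suggestion, the sketch below sets the value at runtime by writing to /proc/sys/vm/swappiness; in practice this would be persisted through sysctl configuration managed by Chef rather than a one-off script.

```python
# Hedged sketch: assumes it runs as root on a Linux tile server.
from pathlib import Path

SWAPPINESS = Path("/proc/sys/vm/swappiness")

def set_swappiness(value: int) -> None:
    """Set vm.swappiness at runtime (lower = avoid swapping application memory)."""
    current = int(SWAPPINESS.read_text())
    if current != value:
        SWAPPINESS.write_text(f"{value}\n")

set_swappiness(1)
```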
Tirex is multi-processed as opposed to multi-threaded and has a memory leak, so it needs to be regularly restarted.
Any other business
Surrogate key patch
https://docs.fastly.com/en/guides/working-with-surrogate-keys
https://docs.fastly.com/en/guides/purging-with-surrogate-keys
Concerns
- We run multiple independent servers and it's not clear where we're going to purge from.
- There is a limit per account, not per service.
- Purging is generally not what you should reach for on a CDN. We should have a look at our cache headers before we even consider purging.
Other points mentioned during discussion
- 10% of traffic goes to the back end, but we don't know how much of that is necessary.
- We could keep less cache locally, and purge the cache at a higher rate than Fastly do.
- We can tell their cache to try to keep it longer.
- If we got the 304s to be served from Fastly, we would boost our cache hit ratio from 80% to 85%.
- We only send "Expires" and "Last-Modified" headers.
Action item: Grant to create a table of the cache headers that we store.
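A small helper that could feed such a table: fetch a sample URL and report the caching-related response headers. The tile URL is only an example; a real survey would cover each service that sits behind the CDN.

```python
# Hedged sketch: which services/URLs to survey is an assumption.
import requests

CACHE_HEADERS = ("Cache-Control", "Expires", "Last-Modified", "ETag", "Age")

def cache_header_row(url: str) -> dict:
    """Return the cache-related headers a URL currently serves."""
    resp = requests.get(url, headers={"User-Agent": "owg-cache-survey"}, timeout=10)
    return {name: resp.headers.get(name, "-") for name in CACHE_HEADERS}

print(cache_header_row("https://tile.openstreetmap.org/0/0/0.png"))
```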
Write-ahead logs
- At the moment, the write-ahead logs are used as a fallback to synchronise our servers.
- The data gets replicated to the other three database servers; if one of them is out of sync by more than an hour, it does not pull the data directly from the original server (it falls back to the write-ahead logs).
- We keep a year of them because they're some of the most valuable data of the project.
- After 45 days, we now move the data into a cheaper storage tier.
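If the WAL archive lives in an S3 bucket (an assumption here), the 45-day transition and one-year retention could be expressed as a lifecycle rule; the bucket name, prefix and target storage class below are placeholders.

```python
# Hedged sketch: bucket, prefix and storage class are assumptions.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="osm-wal-archive",                        # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "wal-tiering",
                "Filter": {"Prefix": "wal/"},        # hypothetical prefix
                "Status": "Enabled",
                # Move segments to a cheaper tier after 45 days...
                "Transitions": [{"Days": 45, "StorageClass": "STANDARD_IA"}],
                # ...and drop them after the one-year retention period.
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```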
GPS traces simplification and compression
- Two separate issues, both can be solved.
- No-one has written the code for the compression.
- Writing the code is not an OWG responsibility.
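For reference, point reduction for traces is a well-understood problem; the sketch below shows one common approach (Ramer-Douglas-Peucker), treating lat/lon as planar coordinates. It is only an illustration of the technique, not a design the OWG has agreed on, and it leaves the compression half untouched.

```python
# Hedged sketch: classic Ramer-Douglas-Peucker simplification on (lat, lon) pairs.
from math import hypot

Point = tuple[float, float]  # (lat, lon), treated as planar coordinates

def _distance_to_line(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b."""
    if a == b:
        return hypot(p[0] - a[0], p[1] - a[1])
    dx, dy = b[0] - a[0], b[1] - a[1]
    return abs(dy * p[0] - dx * p[1] + b[0] * a[1] - b[1] * a[0]) / hypot(dx, dy)

def simplify(points: list[Point], tolerance: float) -> list[Point]:
    """Drop points that deviate from the surrounding line by less than `tolerance` (degrees)."""
    if len(points) < 3:
        return points
    # Find the point farthest from the straight line between the endpoints.
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _distance_to_line(points[i], points[0], points[-1])
        if d > max_dist:
            index, max_dist = i, d
    # Everything is close enough to the straight line: keep only the endpoints.
    if max_dist <= tolerance:
        return [points[0], points[-1]]
    # Otherwise keep the farthest point and recurse on both halves.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right
```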
Open Ops Tickets
Review open tickets: what needs policy and what needs someone to help with.
- https://github.com/openstreetmap/operations/issues
- https://github.com/orgs/openstreetmap/projects/1
- https://github.com/orgs/openstreetmap/projects/1/views/2?filterQuery=-is%3Aclosed
Action items
- 2023-09-07 Paul to create a GitHub template for the new repository https://github.com/openstreetmap/tile-attribution/ which will be only for cases of missing attribution from sites using our tiles. [Topic: (With LWG) Issue template/checklist for blocking sites without attribution]
- 2023-08-24 Tom to see how traces simplification can be done. [Topic: Large scale GPX uploads] -> To be made into an issue
- 2023-08-24 Paul to email MapTiler. [Topic: MapTiler featured layer]
- 2023-08-24 Paul to open a ticket to accept GitHub and Wikimedia emails. [Topic: Validating user emails]
- 2023-06-29 Grant to put Martijn's policy for addition of OSM editors to the osm.org menu out for feedback. [Topic: Draft policy by Martijn van Exel]
- 2023-05-18 Paul to start an open document listing goals for longer-term planning. [Topic: Longer-term planning]
- 2023-05-04 [WordPress] Grant to share list of WordPress users with Dorothea and their response to keeping an account. [Topic: WordPress security] - Shared, but additional work required.
- 2023-08-24 Paul to work on creating a FAQ in order to reduce incoming communications. -> To be turned into a ticket