Operations/Minutes/2024-08-22

OpenStreetMap Foundation, Operations Meeting - Draft minutes

These minutes do not go through a formal acceptance process.
This is not strictly an Operations Working Group (OWG) meeting.

Thursday 22 August 2024, 19:00 London time
Location: Video room at https://osmvideo.cloud68.co

Participants

Minutes by Dorothea Kazazi.

Not present


New action items from this meeting

  • Guillaume to run a limited-scale experiment to assess impact and practicality, e.g. check whether there are clients that say they support WebP but don't. [Topic: Fastly image recoding]
  • Guillaume to keep OPS in the loop about what Fastly says. [Topic: Fastly image recoding]
  • Grant to talk to Guillaume on setting up the testing about image recoding and shielding. [Topic: Fastly image recoding]

Reportage

Builder for Debian packages

Related to action item [2024-08-08] Grant to work on the builder for Debian packages [Topic: apt.openstreetmap.org next steps?]

Some progress. Grant will work with Tom to push into apt.

apt.osm.org supports api upload

Related to action item [2024-08-08] Grant and Tom to discuss apt.osm.org supports api upload [Topic: apt.openstreetmap.org next steps?]

Related to previous action item. Needs an authentication layer.

Evaluate Fastly Security (DDOS) Protections

Related to action item [2024-08-08] OPS to evaluate Fastly Security (DDOS) Protections we could use. [Topic: Cloudflare / Fastly]

Grant looked into that. Fastly have some new security products (as they have bought a company), including a next-generation web application firewall.

  • The new security product is not enabled on our account, but could be.
  • Allows rate-limiting over a longer period of time (60 window, which is probably bigger than the current window we have).
  • Can be managed by rule groups.
  • Some customisation is possible (slow down IP addresses and IP blocking).
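
A minimal sketch of window-based rate limiting, to illustrate why the window length matters: a longer window catches clients that stay under a short-window limit but send a lot of requests over time. This is not Fastly's implementation; the class name, the limit and the window value are hypothetical.

```python
from collections import defaultdict


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client within each window."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window  # window length, in the same unit as `now`
        self.counts = defaultdict(int)  # (client, window index) -> requests seen

    def allow(self, client, now):
        # Requests are counted per client and per window; the count
        # starts again from zero when a new window begins.
        bucket = (client, int(now // self.window))
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True


# Hypothetical values: 2 requests per window of 60 time units.
limiter = FixedWindowRateLimiter(limit=2, window=60)
```

Slowing down or blocking an IP address, as mentioned above, would be the action taken when `allow` returns `False`.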

Make a reasonable evaluation whether to go with Cloudflare, Fastly or none

Related to action item [2024-07-25] OPS to make a reasonable evaluation whether to go with Cloudflare, Fastly or none. [Topic: Cloudflare keep enabled?]

Could be discussed today.

  • iD had some asset issues in the past days related to the background layer, which is probably bundled into iD at build time.
    • Comment that the complaint was before the deployment of the new iD version yesterday and that iD hasn't changed since April 2024.

Added to the agenda.

Determine the Cloudflare API call to block IPs, in order to deal with scrapers

Related to action item [2024-07-25] Grant to determine the Cloudflare API call to block IPs, in order to deal with scrapers [Topic: Cloudflare keep enabled?] 2024-08-08 update: Briefly discussed

  • It works and just needs to be deployed somewhere.
  • Tom fixed the main problem of sending one year expiry on 404s, which could also have caused errors with browser caching.
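
A sketch of the general idea behind the 404 fix, assuming the long expiry is meant only for successful responses. The function name and the short error lifetime are hypothetical and do not describe the actual change Tom made.

```python
ONE_YEAR = 365 * 24 * 60 * 60  # seconds


def cache_control_for(status, error_ttl=60):
    """Return a Cache-Control header value for a response status."""
    if 200 <= status < 300:
        # Assumption: successful responses may be cached for a long time.
        return f"public, max-age={ONE_YEAR}"
    if status == 404:
        # A short lifetime, so a missing file that appears later is picked
        # up quickly instead of staying "not found" in caches for a year.
        return f"public, max-age={error_ttl}"
    # Other errors are not cached in this sketch.
    return "no-store"
```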

Follow up with Copernicus and see if we can get rendering servers from them

Related to action item [2024-07-25] Paul to follow up with Copernicus and see if we can get rendering servers from them. [Topic: State of the Map Europe 2024]

Done. Paul is in the process of setting that up now.

Do capacity planning for tile.openstreetmap.org

Related to action item [2024-06-27] OPS to do capacity planning for tile.openstreetmap.org [Topic: rhaegel usage?]

  • Paul changed the balancing some weeks or a month ago, but hasn't changed the capacity much.
  • Still needed.

Fundraising team might talk to Microsoft about getting some resources and asked for suggestions. Grant suggested a render server like the AWS one, or a Vector Tiles server.

Revisit the OpenMapTiles application

Related to action item [2024-05-02] OPS to revisit the OpenMapTiles application.

Suggestion: Topic to be added to the next meeting's agenda.


Fastly image recoding

Guillaume proposed using Fastly image recoding to turn PNGs into WebP and/or AVIF.

Concerns

  • Risk: investing effort into a project that could end up being a technical dead-end.
  • Not open-source.
  • Implementing this would require us to enable shielding, which means making configuration changes to some of our existing rules. We would need to review all rules to ensure there are no conflicts.
  • The benefits are unclear, especially since it adds latency.
  • It increases our dependency on the Fastly service, which could lead to adopting solutions that we may not be able to support independently in the future.
    • We don't have to rely on it, we can choose to disable it if needed.
  • There are a significant number of changes we would need to make.

On concern of unclear benefits

  • The expected improvement in connection issues in Southeast Asia, as suggested by Fastly, was not observed, nor were there improvements in other metrics. A full investigation is needed.
  • There may be a slight increase in cache hits due to the possibility of caching at both points.

On Open Source concern

  • The change does not comply with our FOSS policy https://osmfoundation.org/wiki/FOSS_Policy - we typically use open-source software on the website.
    • We previously made an exception to this policy when we switched to Fastly for distribution.
      • This was necessary because we require a CDN and cannot operate one independently.
  • The policy exists to protect our interests.
  • This change (using Fastly's image recoding) is not essential, as we are currently operating without it.

On gains from image recoding

  • The most significant benefit would come from serving images in the WebP format, as this would reduce the amount of data stored and transmitted. The edge could then convert the images to a suitable format for the client, such as PNG.
    • That could potentially lead to compatibility issues.
  • Most of the tiles we serve are delivered multiple times, not just once.
  • The potential data savings are significant: 20-50% when converting from PNG to WebP.
    • The traffic on the edge does not currently affect us.
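
A minimal sketch of how an edge could choose between a stored WebP tile and a PNG conversion, based on the client's HTTP `Accept` header. It is an illustration under assumptions, not Fastly's logic: the function names are hypothetical, and wildcards such as `image/*` are deliberately ignored so that WebP is only sent to clients that list it explicitly.

```python
def parse_accept(header):
    """Parse an Accept header into {media_type: q-value}."""
    prefs = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default weight when no q parameter is given
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_tile_format(accept_header, stored="image/webp", fallback="image/png"):
    """Serve the stored format only if the client explicitly accepts it."""
    prefs = parse_accept(accept_header or "")
    if prefs.get(stored, 0.0) > 0.0:
        return stored
    return fallback
```

A check like this cannot detect clients that advertise WebP support but fail to decode it, which is the compatibility risk noted above.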

Fastly

  • Says that shielding is required for image optimisation.

On reasons for making the change

  • There isn't a good reason to implement this change.
  • It is cool and fun.
  • Marketing for Fastly is a huge benefit, and maintaining good relations with sponsors is crucial. Fastly's leadership appeared enthusiastic and positive about showing OSM using their service when Guillaume met them.
    • It seems that the only reason for pursuing this is Fastly's enthusiasm for the idea.
    • Comment that Fastly did not specifically propose the switch to WebP (see below).
  • The change is relatively minor. If the performance is similar, and it only requires a one-time adjustment to our logic, it doesn't seem like a major obstacle.
  • It could benefit a small group of users with poor internet connections.
    • Our tiles are already very small.
    • This is a big part of our users.

On whether Fastly came to Guillaume asking us to use WebP

  • It is something Guillaume mentioned during a brainstorming session over food after an event in June on what we could make use of, and was probably mentioned before.
  • Grant had previously told Fastly that he would love to serve his aerial imagery in WebP format because it is significantly better. However, he encountered technical challenges since JOSM doesn’t have built-in WebP support. He might have jokingly suggested using WebP for the OSM default layer, which could have sparked their interest. They appeared enthusiastic, particularly from a marketing perspective.
  • The explicit ask from C level at Fastly was to use their services.
  • People from Fastly mentioned using the distributed database.

Alternative suggestion to use Fastly's image recoding service by serving WebP or another highly compressed format, and converting it to PNG at the edge for clients that require it

  • WebP support would need to be added to mod-tile.
    • While there is partial support for WebP in Mapnik, it has never been fully tested.
  • WebP support would also need to be integrated into JOSM, which would increase the binary size by 100K. OSMF tiles are used in JOSM when users select areas for download.

Other points mentioned during discussion

  • We're trading bandwidth for CPU time. Concern that turning image recoding on will significantly increase CPU usage at Fastly's end. This prompted a Fastly employee to inquire internally about our scale.
  • Fastly may need to plan for capacity, and it would be acceptable if they ask us not to proceed, due to excessive CPU usage on their end.
  • We could generate WebP images by investing time on mod-tile.
  • The impact of what edge may require us to do is unknown at the moment.
  • There have been disagreements about some of the technologies we're using, including a recent complaint from a Tor user regarding our use of Captcha.

On whether to have a vote

  • There is no proposal yet.

Action items

  • Guillaume to run a limited-scale experiment to assess impact and practicality, e.g. check whether there are clients that say they support WebP but don't.
  • Guillaume to keep OPS in the loop about what Fastly says.
  • Grant to talk to Guillaume on setting up the testing about image recoding and shielding.

Links shared


Cloudflare and Fastly. Also Fastly shielding

  • We are currently using Cloudflare, which is the largest competitor of our biggest sponsor, Fastly.
  • Fastly is providing us with services at no cost, which include:
    • (on paper) USD 10K/month on bandwidth and requests - excluding their premium support. There are volume discounts.
    • (on paper) ~ USD 2.7k/month for enterprise support (direct communication).
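
For a rough sense of scale, the two "on paper" figures above can be added up. This is simple arithmetic on the numbers as minuted; it ignores the volume discounts mentioned, and the function name is hypothetical.

```python
BANDWIDTH_AND_REQUESTS_USD = 10_000  # per month, on paper
ENTERPRISE_SUPPORT_USD = 2_700  # per month, on paper, approximate


def fastly_in_kind_value(months):
    """Approximate list-price value of the donated services over `months`."""
    return (BANDWIDTH_AND_REQUESTS_USD + ENTERPRISE_SUPPORT_USD) * months


print(fastly_in_kind_value(1))   # 12700
print(fastly_in_kind_value(12))  # 152400
```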

Cloudflare

  • Enabling or disabling Cloudflare requires only one commit. Enabling takes approximately 10 minutes to complete.
  • If we decide to turn Cloudflare off, we must be prepared to reactivate it.
  • When Cloudflare is turned off, our backend IP addresses become more exposed. Turning the shields up - which is simple - would not be very effective against DDOS attacks.

In case of a DDOS attack with Cloudflare off, we would have to:

  • Ask HE.net to null route those IP addresses.
  • Turn on Cloudflare.
  • Move the service IPs.
  • Have new IPs for the services that we're running.
  • Suggestion: get second IPs for the servers, put them behind Cloudflare, and null route the first IP addresses on our switches.
    • Our switches won't help, because it'll still saturate our uplink.
    • We can't meaningfully null route anything ourselves, we have to get the upstream to do it.

Tom disconnected ~ 36' after start.

Suggestion

  • Turn Cloudflare off, as there is no immediate need for it, and accept the risk of a similar DDOS incident happening again. We would have to be ready to turn Cloudflare back on at short notice. Prioritise building a Fastly distribution for the CDN.
    • The rate limiting controls seem manual and we wouldn't use them in an emergency, but that can come with practice.

Other points mentioned during discussion

  • There are no options of doing DDOS protection ourselves.
  • We have ~ 32 IPs, approximately 1/3 are in use.

Decision

To discuss turning Cloudflare off in the OPS channel.

Request to not minute some information related to the recent DDOS attack.


Action items reviewed at the beginning of the meeting

  • 2024-08-08 Grant to work on the builder for Debian packages [Topic: apt.openstreetmap.org next steps?]
  • 2024-08-08 Grant and Tom to discuss apt.osm.org supports api upload [Topic: apt.openstreetmap.org next steps?]
  • 2024-08-08 OPS to evaluate Fastly Security (DDOS) Protections we could use. [Topic: Cloudflare / Fastly]
  • 2024-07-25 Grant to determine the Cloudflare API call to block IPs, in order to deal with scrapers [Topic: Cloudflare keep enabled?] 2024-08-08 update: Briefly discussed
  • 2024-07-25 OPS to make a reasonable evaluation whether to go with Cloudflare, Fastly or none. [Topic: Cloudflare keep enabled?]
  • 2024-07-25 Paul to follow up with Copernicus and see if we can get rendering servers from them. [Topic: State of the Map Europe 2024]
  • 2024-06-27 OPS to do capacity planning for tile.openstreetmap.org [Topic: rhaegel usage?]
  • 2024-05-02 OPS to revisit the OpenMapTiles application. # 2024-06-13 They haven't responded to the questions. Paul to email them again. # 2024-07-25 They have replied, OPS haven't had a chance to look at the answers.

Action items that have been stricken-through are completed, removed, or have been moved to GitHub tickets.