Operations/Minutes/2024-11-14

OpenStreetMap Foundation, Operations Meeting - Draft minutes

These minutes do not go through a formal acceptance process.
This is not strictly an Operations Working Group (OWG) meeting.

Thursday 14 November 2024, 19:00 London time
Location: Video room at https://osmvideo.cloud68.co

Participants

Minutes by Dorothea Kazazi.

Absent

Decisions from this meeting

  • We don't need any new database servers, as we have three.
  • Grant to go to the data center in Amsterdam.

Action items from this meeting

  • Paul to add the cache control headers. [Topic: Budget]
  • Grant to go to the data center in Amsterdam. [Topic: Budget]
  • Paul to email Meta that Rapid could be added to the list of editors on www.osm.org, if it fulfills some requirements. [Topic: Rapid]
  • Grant to talk to Sarah Hoffmann and Mikel Maron and work out what we want to ask Equinix. [Topic: Equinix USA data center or Internet?]

Reportage

OpenMapTiles fonts

Related to 2024-10-31 action item: Paul to look where OMT pulls their fonts from and their style file. [Topic: OpenMapTiles application]

Pulling it from their servers. Paul opened an issue to discuss that.

Rapid application

Related to 2024-10-31 action item: Paul to email the board about the Rapid application to be added as a featured layer on www.osm.org [Topic: Editor inclusion policy]

Emailed the board and they replied.

AArnet servers

Related to action item 2024-09-19 Grant to confirm that the AArnet servers will be removed and to ask the Australian community whether there is interest in hosting/providing a render server in Australia or Asia/Pacific [2024-09-19 topic: AArnet Servers going away]

Another academic network will give us a VM. The Polish VM is faster than Dribble.


Budget

Notes by Grant

Servers needing upgrade:

  • stormfly-04 needs upgrading. (v2 scalable CPU on Gen10...)
  • Tile server refresh? Maybe 1 year left. Check.
  • Vector 3rd server would be nice. Distributed may help latency. Pre-rendering helps.
  • DB storage? Maybe. secondary servers CPUs

The board requested a budget breakdown into categories, with a simpler format than the previous year.

1. Grants issued
2. Travelling
3. Internet Charges
4. Computers Hosting Services
5. Legal Fees
6. Consultancy Fees
7. Professional Fees
8. Software Subscriptions
9. Administrative Assistant support, including minuting of meetings
10. Contractors

  • Main cost categories will probably be travel, internet charges, and computer hosting services.
  • Equinix zero-rating will reduce the computer hosting costs significantly.
  • Grant to do the budget this year. Will set up a call with Paul.

On upgrades for next year

  • Nominatim: We could get a new server in the US (v2 scalable CPU on Gen10) to replace stormfly-04, which is running on SSDs. Other Nominatim servers Dulcy, Vhagar, Longma.
  • Tile servers: Odin and Ysera will likely get replaced in 2026, as they are noticeably slower and older than Culebre and Nidhogg.
  • Vector Tiles: We could have a new server in the US. Currently we have 2 servers for vector tiles: Cmog in Poland and Dribble. The service will have high visibility, so it would be nice to maintain redundancy while upgrading. A third server would also help a bit with the latency.

Dulcy will be ok for 2025.

On adding cache control headers

  • Grant suggested reusing the headers (stale-if-error, stale-while-revalidate) that we use for tile.osm.org.
  • Helpful if a particular origin is unreachable from a particular cache.
  • Shouldn't impact user performance until something is going wrong.
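
As an illustration of the two directives named above, the following is a minimal sketch of how such a header value could be assembled. The function name and all lifetimes are hypothetical and are not the values used on tile.osm.org; only the directive names (stale-while-revalidate, stale-if-error) come from the discussion.

```python
def build_cache_control(max_age, stale_while_revalidate, stale_if_error):
    """Return a Cache-Control value using the two stale-content directives.

    All arguments are lifetimes in seconds.
    """
    directives = [
        f"max-age={max_age}",
        # A cache may serve a stale copy while it refetches in the background.
        f"stale-while-revalidate={stale_while_revalidate}",
        # A cache may serve a stale copy if the origin errors or is unreachable.
        f"stale-if-error={stale_if_error}",
    ]
    return ", ".join(directives)


# Hypothetical lifetimes, chosen only for illustration.
print(build_cache_control(max_age=3600, stale_while_revalidate=600, stale_if_error=86400))
```

With headers of this shape, a cache that cannot reach a particular origin can keep answering from its stored copy for the stale-if-error window, which matches the point that users should see no difference until something goes wrong.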

Action item

Paul to add the cache control headers.

Database servers

  • Read only database mirrors for www.openstreetmap.org: Karm (13/18 TB used) and Eddie are the oldest database servers and identical - we added space this year. We're using 12 out of 24 bays on Karm. We might want to replace the CPUs (from 2014) by the time we fill up the current disks.
  • The 3 Snaps are the newer servers. They have the same usage.
    • Snap 01 (Frontend web server for www.osm.org): 13/24 TB used. Could last 8 more years. Usage goes up around 1 TB per year.

Saved 3-4 TB with the Postgres update.

Decision

We don't need any new database servers, as we have three.

Imagery

Grant wants more storage for imagery on Lockheed in AMS.

Currently Lockheed has some of Grant's spare disks. The OWG has approved the purchase of 20 TB hard drives and six of them are waiting to be installed. The disks have arrived and are officially held for five days.

Options

  • Use remote hands.
  • Grant could go and also do other tasks, e.g. swapping disks and making the airflow change to the switches. It might turn out cheaper.

Decision

Grant to go to the data center in Amsterdam.

Other points mentioned

  • Grant improved the signal by changing which roaming network it used, which has helped a lot. He also got the manufacturer recommended antenna for that chipset, and that has drastically improved the signal strength when he tested it on his home setup.
  • Can swap four disks in now and two later at the next upgrade, during the New Year.
  • There's enough space across the four disks with RAID 5 or 6 to be able to move the existing data.
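
For reference, a rough sketch of the raw usable capacity of the first four 20 TB disks under each RAID level mentioned. The function name is hypothetical, and the figures ignore filesystem overhead and the TB/TiB difference. The amount of existing data to be moved is not stated in these minutes, so it is not modelled here.

```python
def usable_capacity_tb(disk_count, disk_size_tb, raid_level):
    """Raw usable capacity of a single RAID 5 or RAID 6 array, in TB."""
    parity_disks = {5: 1, 6: 2}  # disks' worth of space consumed by parity
    min_disks = {5: 3, 6: 4}     # smallest valid array for each level
    if raid_level not in parity_disks:
        raise ValueError("only RAID 5 and RAID 6 are handled in this sketch")
    if disk_count < min_disks[raid_level]:
        raise ValueError("not enough disks for this RAID level")
    return (disk_count - parity_disks[raid_level]) * disk_size_tb


for level in (5, 6):
    print(f"RAID {level}: {usable_capacity_tb(4, 20, level)} TB usable")
```

Four disks give 60 TB under RAID 5 and 40 TB under RAID 6. Adding the remaining two disks later would raise these to 100 TB and 80 TB respectively, assuming the array is expanded rather than rebuilt.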

Network

Nothing needs upgrading in the next year, as switches have support for another year and then they start to enter extended support.


Rapid

Related to the suggestion to add Rapid as a featured layer on www.osm.org.

The OWG received a reply from the OSMF board.

In April 2024 Martijn van Exel (Meta) asked Tom and Andy whether editors like Rapid can be added to the osm.org drop-down menu.

Suggestions

  • Decide about the addition of Rapid on www.osm.org and then come back to the editor policy. Disentangling them will help, as the Rapid decision is primarily a political decision related to funding.
  • Paul to email Meta.

Other points mentioned

  • Martijn might not be at Meta anymore.
    • Dorothea offered to send to the OWG the email addresses of other contacts at Meta.

Action item

Paul to email Meta that Rapid could be added to the list of editors on www.osm.org, if it fulfills some requirements.


Equinix USA datacenter or Internet?

Equinix has approved AM6 and Dublin as zero rated from the 1st of December 2024, including the contractual terms we requested, such as month-to-month payments. Remote hands and additional services will likely be charged as usual (confirmation needed). Grant has also informed them of our interest in expanding to the US, and they indicated that expansion might be possible. They have asked for the specifications and requirements for our US needs as soon as possible to draft the contract, which they need by December.

Grant has not replied yet to Equinix and is considering asking them to provide internet services for AM6 and Dublin instead of, or in addition to, a data centre in the US.

On costs of existing data centres

  • Power and space: ~ 21K
  • Network ~ 8K

Concern

  • Moving the goalposts after Equinix has agreed doesn't come across as very professional.

Proposed wording for reply

  • We've reevaluated that a higher priority for us would be replacing our internet service provider in Amsterdam and Dublin. Would you be willing to include it in the offer?

On Equinix internet

  • They are effectively Tier 2 and peering with others, including HE.net
  • They support different types of BGP and have reasonable peering.

On Equinix internet options

  • They only offer bonded links on 10 gigabit and above.
  • They offer dual gigabit links but only for full redundancy.
  • Their setup appears to be significantly more resilient than ours.
  • Concern: We need to specify our expected maximum transmit speed on the form, but it's unclear what will happen if we exceed it.
    • 300 EUR per extra gigabit commit.
  • Options
    • 1. Burstable: Charges are based on the 95th percentile of either inbound or outbound traffic, whichever is higher.
    • 2. Fixed bandwidth and price (clamped): Recommended by Equinix for Internet access.
    • 3. Pure usage. Recommended by Equinix exclusively for band management.
    • The interval is likely 5 minutes.
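
To make the burstable option concrete, here is a minimal sketch of 95th percentile billing as it is commonly calculated: sort the samples, ignore the top 5%, and bill on the highest remaining value. The function names and sample data are hypothetical, and Equinix's exact method (including whether the two directions are ranked separately, as assumed here) would need to be confirmed with them.

```python
def percentile_95(samples_mbps):
    """Highest sample left after discarding the top 5% of samples."""
    if not samples_mbps:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples_mbps)
    # Integer ceiling of 0.95 * n, then converted to a zero-based index.
    index = (95 * len(ordered) + 99) // 100 - 1
    return ordered[index]


def billable_rate_mbps(inbound_mbps, outbound_mbps):
    """Bill on whichever direction has the higher 95th percentile."""
    return max(percentile_95(inbound_mbps), percentile_95(outbound_mbps))


# Hypothetical data: steady outbound traffic with five short bursts to 900 Mbps.
inbound = list(range(1, 101))      # 100 samples, 1..100 Mbps
outbound = [50] * 95 + [900] * 5   # the bursts fall inside the discarded top 5%
print(billable_rate_mbps(inbound, outbound))
```

With 5-minute samples, a 30-day month has 8640 samples, so the top 432 samples (36 hours of peak traffic) do not affect the bill under this scheme.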

On getting a US site from Equinix

  • Concern about racking.
  • If the new site is in Vancouver, Paul can help, but there's not much practical use to have it there.
  • If the new site is in SF, Minh can help if needed, but there's not much practical use to have it there.

On replacing HE with Equinix

  • Internet connectivity would be more beneficial than a data centre in the U.S.
  • The smart hands service in Oregon is not great.
  • We had problems with HE.net as a Tier 1 provider due to the lack of alternative routing.

We currently pay EUR 670/month.

HE.net

  • Their peering over IPv6 with AWS is broken. They claimed it was an issue with Amazon's announcements.
  • They're good if you have multiple peers and four backlinks, which we lack.
  • Our contract probably expires after 6 months or more.

Cogent: Was slightly better, but they are not a Tier 1 provider.

Oregon State University

  • Is great, but difficult to do upgrades. Racking has generally been fine in the past.
  • We're currently using 3.5 kW, with a maximum limit of 5 kW.
  • We would have more flexibility if we manage it ourselves.
  • Latency on the west coast is good. Oregon is better connected.
  • They are happy with us because we do the hardware administration.
  • They seem to be constrained on space.
  • We don't get dual links on the hosts from them, but we haven’t encountered many switch-related issues.
  • If they ran out of port space, we could buy them another switch.

Switches cost

  • £400 in the UK to get a switch with the same setup as the ones that we have.
  • The next generation is a lot more expensive because they're still popular.

On getting a free US data center from Equinix

  • It's unclear whether the internet would be provided for free.
  • If we are unable to get internet at our Europe locations, but we can get a rack in North America with Equinix internet, we would likely take advantage of the offer.

Order of preferences: Everything > Internet in Europe > Datacenter with internet in the US.

On our power usage

  • Current power usage is 3.679 kW, exceeding the 3.5 kW limit.
  • Dribble increased power usage by 200 watts.
  • Last month we were not charged extra for power.

Suggestion: Ask for more power.

Other point mentioned

  • Amsterdam: Ironbelly (Site gateway) needs to be shut down, but this is blocked by constraints of the data migration to Lockheed.

Action item

  • Grant to talk to Sarah Hoffmann and Mikel Maron and work out what we want to ask Equinix.

Action items reviewed at the beginning of the meeting

  • 2024-10-31 Paul to contact OMT about their application to be added as a featured layer on www.osm.org. [Topic: OpenMapTiles application]
  • 2024-10-31 Paul to look where OMT pulls their fonts from and their style file. [Topic: OpenMapTiles application]
  • 2024-10-31 Paul to email the board about the Rapid application to be added as a featured layer on www.osm.org [Topic: Editor inclusion policy]
  • 2024-09-19 Grant to create a ticket for action item 2024-08-08 OPS to evaluate Fastly Security (DDOS) Protections we could use. [Topic: Cloudflare / Fastly]
  • 2024-09-19 Grant to create an IP blocklist. [Topic: Cloudflare keep enabled?][2024-09-19 Reportage] - Discussion during 2024-07-25 OPS to make a reasonable evaluation whether to go with Cloudflare, Fastly or none.
  • 2024-09-19 Grant to confirm that the AArnet servers will be removed and to ask the Australian community whether there is interest in hosting/providing a render server in Australia or Asia/Pacific [2024-09-19 topic: AArnet Servers going away]
  • 2024-08-22 Guillaume to make a limited in scale experiment to assess impact and practicality. E.g. look if there are clients that say they support WEBP and don't. [Topic: Fastly image recoding]
  • 2024-08-22 Guillaume to keep OPS in the loop about what Fastly says. [Topic: Fastly image recoding]
  • 2024-08-22 Grant to talk to Guillaume on setting up the testing about image recoding and shielding. [Topic: Fastly image recoding]
  • 2024-07-25 OPS to make a reasonable evaluation whether to go with Cloudflare, Fastly or none. [Topic: Cloudflare keep enabled?]
  • 2024-06-27 OPS to do capacity planning for tile.openstreetmap.org [Topic: rhaegel usage?]

Action items that have been stricken-through are completed, removed, or have been moved to GitHub tickets.


Meeting adjourned 1 hour and 7 minutes after start.