Operations/Minutes/2025-03-20

OpenStreetMap Foundation, Operations Meeting - Draft minutes

These minutes do not go through a formal acceptance process.
This is not strictly an Operations Working Group (OWG) meeting.

Thursday 20 March 2025, 19:00 London time
Location: Video room at https://osmvideo.cloud68.co

Participants

Minutes by Dorothea Kazazi.

Absent


New action items from this meeting

  • Grant to investigate whether Karm's latency spike on 10 Jan 2025 is due to IO or network. Most likely IO. Karm may need upgrading to handle sync. [Topic: Database Server Upgrades]
  • Grant to set the sys request variable to be more dynamic, as we tune the number of threads that MDRAID enables, and it is likely not more than four. [On 10 Jan 2025 peak] [Topic: Database Server Upgrades]
  • Grant to negotiate with HE.net whether we can get a better price from them as a fallback link (which he had proposed), to free budget for spending elsewhere. [Topic: HE.net]
  • Grant to follow up with the South African contact about the potential hardware donation from a mobile network. [Topic: New offers of servers in Australia and South Africa]
  • Grant to upgrade muirdis in place from Ubuntu to Debian, after which Tom will set up a staging instance of the wiki upgrade. [Topic: Upgrading the OSM wiki]
  • Grant to follow up with two new people and see whether he can support onboarding them: one with a Chef PR (changes to the PR still pending), one asking about container work. [Topic: New people]
  • Grant to run an SQL query to identify more email providers used by spammers. [Topic: Spam]
  • Grant to check the metrics for any significant impact of recent spam blocking. [Topic: Spam]

Reportage

2025 Budget

Related to 2025-03-06 action item: Grant to present a draft budget at the next meeting.

Grant is working on the 2025 OWG budget.

Pending questions

  • Should we purchase a second database server for Dublin? (in the agenda)
  • Do we want an additional general-purpose machine?
    • We acquired one either last year or the year before, and it proved useful for cases when we wanted to deprecate old systems.
    • We don't have a spare machine, aside from some third-party machines that may not be functional.

Potential Nominatim server

Response to Meta about the addition of the Rapid editor

Related to 2025-01-23 action item: Grant to check whether Paul wants to pick up responding to Meta [Topic: Rapid editor]

Grant put Paul Norman in contact with Mikel Maron (Advisory Board coordinator), who is working on the issue. After their discussion, the questions from Meta have gone back to the board. The board is now dealing with the responses to Meta on some policy aspects.

New render server in Australia or Asia/Pacific?

Related to 2024-09-19 action item: Grant to confirm that the AArnet servers will be removed and to ask the Australian community whether there is interest in hosting/providing a render server in Australia or Asia/Pacific [2024-09-19 topic: AArnet Servers going away]

  • The Australian National University is offering us a render server, similar to the one provided by the Polish community.
  • They had promised to reply on Monday, but they haven't, so Grant will email them.
  • Dani Waltersdorfer (Board) has asked to be included in the loop, as she wanted to help if an educational justification was required.

Database server upgrades

Consensus: A second database in Dublin is worthwhile, as it allows synchronous commits.

AMS: Checking latency of existing secondary machine (Karm - Read only database mirror for www.openstreetmap.org), as it may need upgrading to handle sync. https://prometheus.openstreetmap.org/d/77k_bqFMz/postgresql?orgId=1&from=now-6M&to=now&timezone=utc&var-instance=karm&var-datname=$__all&refresh=1m&viewPanel=panel-20

  • The highest peak in the last 24 hours (2025-03-20) was about seven seconds.
  • In the last month, peaks have ranged from 10 seconds to 2 minutes. The duration was about 5 minutes.

On 10 Jan 2025 peak

  • Peak during index rebuild - lasted for 3.5 hours.
  • The annual index rebuild job usually runs between the 8th-14th of January each year.
  • During the index rebuild, changes are produced on the primary faster than the secondary can consume them.
  • Suggestion: Investigate whether this is due to IO or network.
    • Network bandwidth hits 600 megabits per second during high load periods.
    • Disk I/O seems to be the main bottleneck. MDRAID device showed 82% utilisation while individual disks only showed 11% utilisation.
    • Suggestion: Grant to set the sys request variable to be more dynamic, as we tune the number of threads that MDRAID enables, and it is likely not more than four.
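
As a rough illustration of the IO-versus-network comparison above, the sketch below expresses each observed figure as a fraction of capacity and picks the one closest to saturation. It assumes the 600 megabits per second was carried on a single 1 gigabit data stream; the function names and the `readings` layout are hypothetical and not part of any OSM tooling.

```python
def utilisation(observed, capacity):
    """Return observed load as a fraction of capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return observed / capacity


def likely_bottleneck(readings):
    """Return the name of the resource closest to saturation."""
    return max(readings, key=readings.get)


# Figures quoted in the minutes. The 1000 Mbit/s capacity is an assumption
# (one 1 gigabit data stream).
readings = {
    "network": utilisation(600, 1000),  # 600 Mbit/s during high load
    "mdraid": 0.82,                     # MDRAID device utilisation
    "single_disk": 0.11,                # individual disk utilisation
}

print(likely_bottleneck(readings))
```

Under these assumptions the MDRAID device is the most heavily loaded resource, while the individual disks are far from busy, which is in line with the suggestion to look at the number of MDRAID threads rather than at the disks themselves.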

On switches

  • Current switches have 1 gigabit internal interfaces, while the network cards support 10 gigabit.
  • 1 gigabit per data stream and there are 2 data streams.
  • Suggestion: Upgrade the switches, rather than use the two free SFP+ ports with 10 gigabit capacity available on current switches.
    • Newer EX4300MP model switches offer 24 ports at 10 gigabit and 24 ports at 1 gigabit.
    • Cost of similar models to the ones we bought: ~ 1-2K/switch.
    • There are other new, more expensive models with lower consumption.

On re-indexing

  • Next annual scheduled re-indexing: 8 Jan 2026.
  • Smaller re-indexing done monthly.
  • We could also start re-indexing manually.

Suggestion

  • Replace the second server in Amsterdam and the server in Slough.

Other point mentioned during discussion

  • Having one database server in Slough is sufficient, as we will not be using it as a primary site again.

Action items

  • Check the latency of the existing secondary machine in Amsterdam (Karm), which may need upgrading to handle sync.
  • Grant to investigate whether Karm's latency spike on 10 Jan 2025 is due to IO or network. Most likely IO.
  • Grant to set the sys request variable to be more dynamic, as we tune the number of threads that MDRAID enables, and it is likely not more than four. [On 10 Jan 2025 peak]

HE.net

  • Grant has previously proposed keeping them as a secondary network provider.
  • Service renewal: June/July 2025.
  • Currently: 2 Gbit links (1 per data stream) with no commit limit.

Action item

  • Grant to negotiate with HE.net whether we can get a better price from them as a fallback link (which he had suggested), to free budget for spending elsewhere.

New offers of servers in Australia and South Africa

Australia

  • Australian National University in Canberra offered to host a server - details pending.

South Africa

  • South African contact mentioned a potential hardware donation from a mobile network.
  • Current South African server is too slow with only 4 cores.
  • Grant wants to move imagery processing to South Africa for latency benefits.

Action item

  • Grant to follow up with the South African contact about the potential hardware donation from a mobile network.

OSM wiki upgrade

Suggestion

  • Create a wiki staging instance to e.g. test extensions.

Action item

  • Grant to upgrade muirdis in place from Ubuntu to Debian, after which Tom will set up a staging instance of the wiki upgrade.

New people

Action item

  • Grant to follow up with two new people and see whether he can support onboarding them: one with a Chef PR (changes to the PR still pending), one asking about container work.

Spam

New OTRS tickets.

  • Majority of spam from people with Gmail accounts.
  • Some spam from people using ProtonVPN, and probably Proton, mail.ru and disposable email addresses.

Suggestion

  • Change the message on the sign-up screen. Perhaps:
    • suggest turning their VPN off, instead of contacting us.
    • display their IP address, so that it can easily be provided to OPS. We currently tell people who have been blocked and contact us to look up their IP address online and provide it to us.

Other points mentioned during discussion

  • We had blocked some spam from AWS.
  • Some people are using VPNs to evade the blocks.
  • Tom has introduced a feature where - if the account gets reported as spam - the account is more likely to get suspended.
  • Some subnets in India, Bangladesh, and Vietnam have a mix of genuine and spam users.
  • Most of the spam is unseen.
  • Grant identified some spam indicators in spammy OSM user profiles.
  • Approximately 30,000 OSM accounts associated with a single Apple email address.
  • Microsoft has been creating all their accounts from a single IP address.
    • Microsoft has a special override - there's a special ACL that allows account creation even if they're breaking the rate limits.
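
The override described above can be pictured as an allowlist that is checked before the rate limit. This is only a minimal sketch of that idea, not the actual OSM configuration: the function name, the limit, and the documentation-range address `192.0.2.10` are all hypothetical.

```python
import ipaddress


def may_create_account(ip, recent_signups, limit, allowlist):
    """Allow sign-up if the IP is allowlisted or still under the rate limit."""
    addr = ipaddress.ip_address(ip)
    # Allowlisted networks bypass the rate limit entirely.
    if any(addr in net for net in allowlist):
        return True
    return recent_signups < limit


# Hypothetical allowlist entry (an address from a documentation range).
allowlist = [ipaddress.ip_network("192.0.2.10/32")]

print(may_create_account("192.0.2.10", 500, 5, allowlist))    # allowlisted
print(may_create_account("198.51.100.7", 500, 5, allowlist))  # over the limit
```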

Action items

  • Grant to run an SQL query to identify more email providers used by spammers.
  • Grant to check the metrics for any significant impact of recent spam blocking.
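
The SQL query itself is not recorded in these minutes. As a hedged sketch of the underlying idea, the Python below tallies email domains across a list of addresses that have already been flagged as spam. The function name and the sample addresses are hypothetical; the real query would run against the OSM database.

```python
from collections import Counter


def count_spam_domains(emails):
    """Count email domains (lowercased) across flagged addresses."""
    domains = Counter()
    for email in emails:
        # Take everything after the last '@'; skip malformed entries.
        _, sep, domain = email.rpartition("@")
        domain = domain.strip().lower()
        if sep and domain:
            domains[domain] += 1
    return domains


# Hypothetical sample data, for illustration only.
flagged = ["a@example.com", "b@Example.com", "c@example.org", "broken-address"]

for domain, count in count_spam_domains(flagged).most_common():
    print(domain, count)
```

Sorting by count puts the most frequently used providers first, which is the list the action item is after.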

Open Ops Tickets

Review open, what needs policy and what needs someone to help with...


Next meeting

  • On 2025-04-17 (2025-04-03 meeting cancelled)

Action items reviewed at the beginning of the meeting

  • 2025-03-06 Grant to present a draft budget at the next meeting.
  • 2025-01-23 Grant to email proposal on upgrading OOB (currently Raspberry Pi) [Topic: Discussion on upgrading OOB (currently Raspberry Pi)] - DONE
  • 2025-01-23 Grant to check whether Paul wants to pick up responding to Meta [Topic: Rapid editor] - In board hands with Mikel / Paul.
  • 2024-09-19 Grant to create an IP blocklist script. [Topic: Cloudflare keep enabled?][2024-09-19 Reportage] - Discussion during 2024-07-25 OPS to make a reasonable evaluation whether to go with Cloudflare, Fastly or none. - Grant to create now
  • 2024-09-19 Grant to confirm that the AArnet servers will be removed and to ask the Australian community whether there is interest in hosting/providing a render server in Australia or Asia/Pacific [2024-09-19 topic: AArnet Servers going away] - We are in conversation with the Australian National University about new hardware.

Action items that have been stricken-through are completed, removed, or have been moved to GitHub tickets.