Operations/Minutes/2021-03-24


OpenStreetMap Foundation, Operations Meeting* - Agenda & draft minutes
Wednesday 24 March 2021 18:00 London time
Location: Video room at https://osmvideo.cloud68.co

* Please note that this was not strictly an Operations Working Group meeting.

Participants

Present:

Minutes by Dorothea Kazazi, including live notes by participants.

Apologies:

Administrative

Previous minutes

2021-03-10

Action items

  • 2021-03-10 Grant to ask Sarah about the urgency of replacement. [Dulcy - Topic: Power outage]
  • 2021-03-10 Tom to fix the Wordpress updater. [Topic: Wordpress updates] 2021-03-24 update: No action now. [Reportage]
  • 2021-03-10 Paul to have a look at TimescaleDB. [Topic: TimescaleDB]
  • 2021-02-24 Tom to report back on TimescaleDB again at next meeting. [Topic: Reportage] [was: 2021-01-13 Tom to evaluate TimescaleDB] [Topic: Longer term metric retention]
  • 2021-02-24 OWG--> Grant to install a Discourse instance to get us started. [Topic: Discourse]
  • 2021-02-24 OWG to get a new DB server for Dublin - pending board budget "level" approval (already included in "High" option). [Topic: Katla]
  • 2021-02-24 OWG--> Grant to check with fastly if they are happy to be credited as a top 3 donor. [Topic: Tile CDN]
  • 2021-02-24 Grant to create S3 bucket + permission for Paul to access. Grant to check on Athena requirements. [Topic: Athena]
  • 2021-02-24 Michal to speak to Paul to get Limesurvey access. [Topic: Lime Survey] # 2021-03-10 Bounced to the board by Paul - Allan sent the conditions about using OSMF account to LCCWG, who are looking at other solutions as well.
  • 2021-02-10 Paul to gather details about data centers near Dublin.
  • 2021-01-27 Sarah to inform the board. [Topic: Nominatim and QGIS]
  • 2021-01-13 OWG to send message to the servers we want to keep. [Reportage. Existing CDN servers]
  • 2021-01-13 Grant to give Paul read access to Athena. [Topic: Log analysis] #2021-02-10 give bucket access.
  • 2021-01-13 Paul to query the logging data and test producing a report. [Topic: Log analysis]
  • 2021-01-13 Grant to wipe thorns and the 3 other machines [AMS]. [Topic: Longer term metric retention]
  • 2021-01-13 Paul to create ticket with Equinix to scrap the wiped thorns and the other 3 machines. [Topic: Longer term metric retention]
  • 2021-01-13 Paul to create a ticket related to tile geographical localisation. [Topic: Lack of render capacity]
  • 2020-12-02 Grant to develop some thoughts on what is next for us using AWS. [Topic: AWS]
  • 2020-11-04 Grant to do heavy integrity checks on Katla to test its response to heavy load.
  • 2020-11-04 Grant to create bucket with right permissions and set path hierarchy. [Topic: Commercial CDN]
  • 2020-11-04 OWG to work out tile log archival and deletion policy at later stage. [Topic: Commercial CDN] # 2021-03-24 deferred to future point.
  • 2020-10-21 Paul to write to Discourse ticket and email the board. [Topic: Discourse]
  • 2020-10-21 Grant to check that Bytemark put the disk to the right controller. [Topic: Reportage] Done, but machine (Katla) has more failures than expected. Retire it?
  • 2020-09-23 Grant to put Guillaume and Toby in touch. [Topic: Wikimedia challenges with Tile CDN delivery] Grant to check up on status.
  • 2020-09-23 OWG to pencil out what is needed. [Topic: Wikimedia challenges with Tile CDN delivery]
  • 2020-09-23 Toby Negrin (Wikimedia) to ask Wikimedia whether they would be interested in OSMF running a tile service available to Wikipedia and if they would be willing to share hardware resources or expertise. [Topic: Wikimedia challenges with Tile CDN delivery]
  • 2020-09-09 Tom to update OAuth ticket https://github.com/openstreetmap/openstreetmap-website/issues/1408 2020-09-09 Reportage, related to 2020-08-26 action item.
  • 2020-09-09 [Not assigned] [Topic: AWS] Speak to AWS person about going ahead with open data program with official OSM S3 bucket.
  • 2020-09-09 [Not assigned] [Topic: AWS] Decide on services we need to run on AWS. Need clearance.
  • 2020-09-09 [Not assigned] [Topic: AWS] Work out rough budget.
  • 2020-09-09 [Not assigned] [Topic: AWS] Talk to OpenAerialMap/HOT.
  • 2020-09-09 [Not assigned] [Topic: Federating OSM communities' rooms through OSMF-hosted Matrix servers] Evaluate effort required. Constrain the scope to what we can support and perhaps ask volunteers to step in.
  • 2020-09-09 Paul to work out a proposal for the Ironbelly replacement. [Topic: Ironbelly replacement]
  • 2020-08-26 Tom to look at road ahead for OAuth. [Topic: Merge forums, OSQA, MLs to discourse?] https://github.com/openstreetmap/openstreetmap-website/issues/1408 # 2020-09-09 Did some investigation - branch with some code. Better understanding of OAuth 2 and options. Doable.
  • 2020-08-26 Grant to talk to Ian about migrating old content to Discourse. [Topic: Merge forums, OSQA, MLs to discourse?] 2020-09-09 pending.
  • 2020-08-26 [not assigned] Create Github ticket for updated OAuth. [Topic: Merge forums, OSQA, MLs to discourse?]
  • 2020-08-12 Michal to try to rekindle excitement about people helping with imagery (on dev channel/imagery channel or Slack). #2020-08-26 No progress.
  • 2020-07-29 Grant to enable background sync to AWS S3. [Topic: Ironbelly] #2020-08-12 & 2020-08-26 Manually run, scripting to be added.
  • 2020-07-29 Grant to check with Wiki Admins on hCaptcha (reCaptcha replacement). [Topic: Wiki reCaptcha issue] https://github.com/openstreetmap/operations/issues/454 #2020-08-12 hCaptcha people reached out and happy to help. Blocker on Mediawiki 1.35 being released in August.
  • 2020-07-15 Paul and Grant to quote up a server to replace errol/kessie. [Topic: Replacement of Errol/Kessie]. #2020-08-12 A new person in OWG asked to do Errol. Need to replace it at some point - at UCL.
  • 2020-07-15 Ian to try converting fluxBB DB to go into Discourse. [Topic: OSM Forum (FluxBB) update]. # Evaluating whether moving is an option. Need to see about history, user log-in.
  • 2020-07-01 Paul to create a ticket about solutions to reduce incoming comms. [Topic: Revision of acceptable use policy to reduce incoming comms]
  • 2020-07-01 Grant to work out some of the questions for an online form as a solution to reduce incoming comms. [Topic: Revision of acceptable use policy to reduce incoming comms] 2020-08-12 need to think about the reply.
  • 2020-07-01 Michal to reach out to AWS (need a story for AWS to show how their help will lead to AWS spending from users). [Topic: Commercial CDN for Bulk Tile Users] https://lists.openstreetmap.org/pipermail/talk/2020-May/084700.html #2020-08-12 Michal feels blocked, could draft something. We got contacted by AWS, not replied yet. More info at 2020-08-12 reportage.
  • 2020-06-04 Paul to update the Github ticket "Adding API key support for tile.osm.org" https://github.com/openstreetmap/operations/issues/342
  • 2020-06-04 OPS team to draft an email (regarding a call for proposals), ask for comments. [Topic: Adding API key support for tile.osm.org https://github.com/openstreetmap/operations/issues/342]
  • 2020-04-10 OWG to push up tile usage policy (commercial entities, vehicle tracking applications - which are heavy on Nominatim and probably not attributing as well). [Topic: Commercial CDN for Bulk Tile Users]
  • 2020-04-10 Grant and Tom to work out a table of different data bits, work out how they are backed up and what can be potentially improved. [Topic: High Availability / Redundancy of OpenStreetMap.org (and primary services)]
  • 2020-04-10 [Not assigned] Potentially move some more of backup data into long term S3 buckets. [Topic: High Availability / Redundancy of OpenStreetMap.org (and primary services)]

Action items from this meeting

  • 2021-03-24 Paul to gather some options regarding the new data centre. [Topic: New data centre]
  • 2021-03-24 Tom to provide trace from Hetzner related to IPv6 connectivity issues to Grant. [Topic: IPV6]
  • 2021-03-24 Grant/Paul to report the connectivity issue to Cogent. [Topic: IPV6]
  • 2021-03-24 Paul to create ticket related to API PostgreSQL update. [Topic: API PostgreSQL update]
  • 2021-03-24 Grant to look at the cost of having as many CI runners as wanted. Related: split AWS account, so that CI does not run on master. [Topic: CI]
  • 2021-03-24 Paul to create a ticket about OTRS. [Topic: OTRS]
  • 2021-03-24 Hrvoje to check power supplies on Viserion/Drogon. [Topic: Old tile caches: Viserion and Drogon]

Reportage

Wordpress updater issue

Wordpress
- Retain both Git and SVN at the moment.
- 2018 post by Wordpress core team member about making Git the mirror - low priority for them.

  • SVN check-out was locked before Chef managed to do an update.
  • Wordpress SVN: no error reported.

Use Git checkout?
- Wordpress mixes upstream check-out stuff with user content.
- Not clear which is the master upstream.

No action now. Will switch in the future.

TimescaleDB issue

Autovacuum job on a 680MB table had been running for >12 hrs.
It keeps re-reading the stats file.
Happens on certain tables.

Suggestion: Issue a VACUUM once a day?

New Data Centre

Pending board decision on Friday on the tier of the OWG budget.

What criteria do we have?
Particular questions - Are we open to hosts which are different from the power & cooling provider? (e.g. contracting with company leasing part of the data-centre space)

Requirements

  • Remote hands! At least 1 hr per month.
  • Equinix in a different region ok - need to change upstream internet provider.
  • Data Provider: Not Cogent (IPv6)
  • Consider tier 2 connectivity
  • 2 diverse upstream connections to another rack
    • Start with 1Gbps but put in fiber + transceiver to be 10Gbps, options for 2Gbps.

RIPE
~2000 EUR/year for becoming a member (double cost for 1st year).

  • Concerns
    • Would still have to get an IPv6 allocation.
    • Would have to develop knowledge of running BGP across multiple sites.

Suggestion: Go to an ISP, as we don't have BGP knowledge.

  • Options
    • Tier1 provider (Grant has been advised against that).
    • Find another provider in the data centre that we can bridge to and have inter-rack connections with (one in AMS - rent our own full rack and connect rack-to-rack and upstream from there).

Current situation

  • Connect to Cogent via rack-to-rack and upstream from there.
  • Cogent: problematic with IPv6.
  • 1 upstream fiber connection.

Action item: Paul to gather some options.

API PostgreSQL update

- Plan for working out a timeline.

Action item: Paul to create ticket.

2022-2023 plan

Paul has worked on hardware refreshes, based on how old the machines are.

IPV6

Seems to be localised to Cogent IPv6 in Amsterdam. We should report to Cogent.
Cogent / other providers are problematic.
Grant / Paul: We report the issue to Cogent and raise an issue.

2nd issue: Google unreachable over IPv6.

Action items:

  • Tom to provide trace from Hetzner related to IPv6 connectivity issues to Grant.
  • Grant/Paul to report the issue to Cogent.

We will revisit at next meeting and decide if action required.

Faster CI (Continuous integration)

There are some jobs which are known broken.
OTRS has missing files; they appear to be breaking the Open Source releases to push users to the commercial release.

AWS accounts need splitting, so that CI jobs do not run on master.
Options:

  • AWS Control Tower, generally available.
  • Own management module for multiple accounts.
  • Creating a sub-account for data sharing.
    • Separate accounts for different tasks (e.g. data sharing, security logging, master for billing) with granular permissions.
    • Advantage: makes it possible to see where billing is coming from.

Action item: Grant to investigate.

We may consider private GitHub Actions runners in future to speed up CI jobs with additional parallel runners. https://github.com/philips-labs/terraform-aws-github-runner

OTRS

We can't stick with version 6 forever.

OTRS Community Edition forks:

Action item: Paul to create a ticket. # Post-meeting addition: https://github.com/openstreetmap/operations/issues/518 (Find alternative to OTRS 6).

Old tile caches: Viserion and Drogon

https://hardware.openstreetmap.org/servers/viserion.openstreetmap.org/
https://hardware.openstreetmap.org/servers/drogon.openstreetmap.org/

One has a faulty power supply.

Action item: Hrvoje to check power supplies.

Discourse

  • Usually run as a container (has multiple flavours).
  • Rubén Martín's (HOT) suggestion: use Docker (wrapper script that does management). Offered to help.

Grant wants to externalise the database/persistent data storage.

Suggestion: create a non-public testing instance.
Pending: OAuth 2.0

Next meeting

Wednesday 7 April 2021 18:00 London time.