Operations/Minutes/2021-05-19
OpenStreetMap Foundation, Operations Meeting* - Agenda & draft minutes
Wednesday 19 May 2021 18:00 London time
Location: Video room at https://osmvideo.cloud68.co
* Please note that this was not strictly an OWG meeting.
Participants
Present:
- Tom Hughes (OWG)
- Grant Slater (OWG)
- Paul Norman (OWG)
Minutes by Dorothea Kazazi.
Apologies:
Administrative
Previous minutes
Action items
- 2021-05-19 Grant to give Twitter credentials to Paul (was: 2021-05-05 Grant to check/fix GroupTweet for osm_tech Twitter account)
- 2021-05-05 Grant to email Toby from WMF and suggest chatting to MapTiler. [Topic: Wikimedia]
- 2021-05-05 Paul to do a circular for ISP in a couple of days. [Topic: Dublin updates - Network requirements change]
- 2021-05-05 Grant to provide switches model and vendor to Paul. [Topic: Dublin updates - Someone to handle network purchasing]
- 2021-04-21 Paul to work out where we need the new HP DL360 servers. [Topic: New HP DL360 servers] # 2021-05-19 on the agenda.
- 2021-04-21 Paul to tweet asking for recommendation of HP resellers in Ireland. [Topic: New HP DL360 servers] # 2021-05-19 will tweet once he gets the info.
- 2021-04-21 Paul to check how raid1 with hot spare works out with the budget. [Topic: New Rendering server]
- 2021-04-07 Grant to get contact info of ISP to Guillaume [Topic: Improving networking in AMS]. # 2021-04-21 couldn't find it. He'll try to figure out who's available in the new data center in AMS. # 2021-05-19 decision to be removed.
- 2021-03-24 Paul to create ticket related to API PostgreSQL update [Topic: API PostgreSQL update]
- 2021-03-24 Hrvoje to check power supplies on Viserion/Drogon [Topic: Old tile caches: Viserion and Drogon] # 2021-05-19 decision to be removed.
- 2021-03-10 Paul to have a look at TimescaleDB. [Topic: TimescaleDB] # 2021-05-19 decision to be removed as talking to TimescaleDB community, which has more expertise.
- 2021-02-24 Tom to report back on TimescaleDB again at next meeting. [Topic: Reportage] [was: 2021-01-13 Tom to evaluate TimescaleDB] [Topic: Longer term metric retention] # 2021-04-21 SSD Disk Failing in US # 2021-05-19 decision to leave on the agenda.
- 2021-02-24 OWG --> Grant to install a Discourse instance to get us started. [Topic: Discourse] # 2021-04-21 on the agenda. # 2021-05-19 pending.
- 2021-01-13 OWG to send message to the servers we want to keep. [Reportage. Existing CDN servers] # 2021-03-24 Three servers stopped talking to us (shenron, naga and one more) # 2021-05-19 pending.
- 2021-01-13 Grant to wipe thorns and the 3 other machines [AMS] [Topic: Longer term metric retention] # 2021-05-19 pending.
- 2021-01-13 Paul to create ticket with Equinix to scrap the wiped thorns and the other 3 machines [Topic: Longer term metric retention]
- 2021-01-13 Paul to create a ticket related to tile geographical localisation. [Topic: Lack of render capacity] # 2021-05-19 Done.
- 2020-12-02 Grant to develop some thoughts on what is next for us using AWS. [Topic: AWS] # 2021-05-19 pending.
- 2020-11-04 Grant to do heavy integrity checks to katla to test its response to heavy load. # 2021-03-24 Grant has got some disks to replace. Needs to open ticket with Bytemark. # 2021-05-19 Done.
- 2020-11-04 OWG to work out tile log archival and deletion policy at later stage. [Topic: Commercial CDN] # 2021-03-24 & 2021-05-19 deferred to future point.
- 2020-10-21 Paul to write to Discourse ticket and email the board [Topic: Discourse]
- 2020-09-23 Grant to put in touch Guillaume and Toby. [Topic: Wikimedia challenges with Tile CDN delivery] Grant to check up on status. # 2021-05-19 superseded by email to be written.
- 2020-09-23 OWG to pencil out what is needed. [Topic: Wikimedia challenges with Tile CDN delivery] # 2021-05-19 superseded by email to be written.
- 2020-09-23 Toby Negrin (Wikimedia) to ask Wikimedia whether they would be interested in OSMF running a tile service available to Wikipedia and if they would be willing to share hardware resources or expertise. [Topic: Wikimedia challenges with Tile CDN delivery] # 2021-05-19 superseded by email to be written.
- 2020-09-09 Tom to update OAuth ticket https://github.com/openstreetmap/openstreetmap-website/issues/1408 [2020-09-09 Reportage, related to 2020-08-26 action item] # 2021-05-19 Done.
- 2020-09-09 Grant [Topic: AWS] Speak to AWS person about going ahead with open data program with official OSM S3 bucket. # 2021-05-19 pending.
- 2020-09-09 [Not assigned] [Topic: AWS] Decide on services we need to run on AWS. Need clearance. # 2021-05-19 overlap with future AWS usage - decision to have a single ticket.
- 2020-09-09 [Not assigned] [Topic: AWS] Work out rough budget. # 2021-05-19 decision to remove as budget will be worked out once decided what to run.
- 2020-09-09 Grant [Topic: AWS] Talk to OpenAerial Map/HOT. # 2021-05-19 pending.
- 2020-09-09 [Not assigned] [Topic: Federating OSM communities' rooms through OSMF-hosted Matrix servers] Evaluate effort required. Constrain the scope to what we can support and perhaps ask volunteers to step in. # 2021-05-19 decision to remove. Stick with Discourse for the time being.
- 2020-09-09 [Topic: Ironbelly replacement] Paul to work out a proposal for the ironbelly replacement. # 2021-05-19 on agenda.
- 2020-08-26 Tom to look at road ahead for OAuth. [Topic: Merge forums, OSQA, MLs to discourse?] https://github.com/openstreetmap/openstreetmap-website/issues/1408 # 2020-09-09 Did some investigation - branch with some code. Better understanding of OAuth 2 and options. Doable. # 2021-05-19 decision to remove as superseded by more recent action items.
- 2020-08-26 Grant to talk to Ian about migrating old content to Discourse. [Topic: Merge forums, OSQA, MLs to discourse?] # 2020-09-09 pending. # 2021-05-19 Paul has stricken this through.
- 2020-08-26 [Not assigned] Create Github ticket for updated OAuth. [Topic: Merge forums, OSQA, MLs to discourse?]
- 2020-08-12 Michal to try to rekindle excitement about people helping with imagery (on dev channel/imagery channel or Slack). # 2020-08-26 No progress.
- 2020-07-29 Grant to enable background sync to Amazon Web Services (AWS) S3. [Topic: Ironbelly] # 2020-08-12&26 Manually run, automated scripting to be added. # 2021-05-19 Grant to run the script again.
- 2020-07-29 Grant to check with Wiki Admins on hCaptcha (reCaptcha replacement). [Topic: Wiki reCaptcha issue] https://github.com/openstreetmap/operations/issues/454 # 2020-08-12 hCaptcha people reached out and happy to help. Blocker on Mediawiki 1.35 being released in August. # 2021-05-19 blocker removed.
- 2020-07-15 Paul and Grant to quote up a server to replace errol/kessie. [Topic: Replacement of Errol/Kessie]. # 2020-08-12 A new person in OWG asked to do Errol. Need to replace it at some point - at University College London. # 2021-05-19 pending.
- 2020-07-15 Ian to try converting fluxBB DB to go into Discourse. [Topic: OSM Forum (FluxBB) update]. # Evaluating whether moving is an option. Need to see about history, user log-in. # 2021-05-19 decision to leave the action item open.
- 2020-07-01 Paul to create a ticket about solutions to reduce incoming comms. [Topic: Revision of acceptable use policy to reduce incoming comms] # 2021-05-19 decision to leave the action item open.
- 2020-07-01 Grant to work out some of the questions for an online form as a solution to reduce incoming comms. [Topic: Revision of acceptable use policy to reduce incoming comms] # 2020-08-12 need to think about the reply # 2021-05-19 decision to leave the action item open.
- 2020-07-01 Michal to reach out to Amazon Web Services (AWS) (need a story for AWS to show how their help will lead to AWS spending from users). [Topic: Commercial CDN for Bulk Tile Users] https://lists.openstreetmap.org/pipermail/talk/2020-May/084700.html # 2020-08-12 Michal feels blocked, could draft something. We got contacted by AWS, not replied yet. More info at 2020-08-12 reportage. # 2021-05-19 decision to remove.
- 2020-06-04 Paul to update the Github ticket "Adding API key support for tile.osm.org" https://github.com/openstreetmap/operations/issues/342
- 2020-06-04 OPS team: draft an email (regarding a call for proposals), ask for comments. [Topic: Adding API key support for tile.osm.org https://github.com/openstreetmap/operations/issues/342] # 2021-05-19 decision to remove.
- 2020-04-10 OWG to push up tile usage policy (commercial entities, vehicle tracking applications - which are heavy on Nominatim and probably not attributing as well) [Topic: Commercial CDN for Bulk Tile Users] # 2021-05-19 decision to remove.
- 2020-04-10 Grant to work out a table of different data bits, work out how they are backed up and what can be potentially improved. [Topic: High Availability / Redundancy of OpenStreetMap.org (and primary services)] # 2021-05-19 decision to leave the action item open.
- 2020-04-10 [Not assigned]: Potentially move some more of backup data into long term S3 buckets. [Topic: High Availability / Redundancy of OpenStreetMap.org (and primary service)] # 2021-05-19 decision to remove.
Reportage
Request for updated permissions - to be granted.
From action item updates
Replacing reCaptcha with hCaptcha.
- hCaptcha better supported now in Mediawiki.
- Bugs with reCaptcha and the simple editor.
HP DL360 Gen9 servers for Dublin
Paul's estimate of the number of servers required has changed since budgeting: 7 then, 10 now. (https://github.com/openstreetmap/operations/issues/525)
Improved locality of backend tile requests
https://github.com/openstreetmap/operations/issues/527
Europe tile requests are being split based on metatile coordinate into server groups of one powerful server + one weak server. Significant reduction in rendering workload.
Will this cause problems with failover?
Decision: Test failover by stopping Apache for 10 minutes on odin or ysera at 22:00 or 23:00 UK time.
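The idea of splitting requests by metatile coordinate can be illustrated with a minimal sketch. This is not the configuration deployed for issue 527: the group names are hypothetical, the 8x8 metatile size is assumed (it is the usual mod_tile/renderd default), and the modulo rule is only one possible way to map a metatile to a server group. The point is that every tile in the same metatile lands on the same group, so each metatile is rendered and cached in one place.

```python
METATILE = 8  # tiles per metatile side; assumed default, not taken from the minutes

# Hypothetical server groups: one powerful server paired with one weak server.
SERVER_GROUPS = [
    ("strong-a", "weak-a"),
    ("strong-b", "weak-b"),
]


def metatile_origin(x, y, metatile=METATILE):
    """Return the top-left tile coordinate of the metatile containing (x, y)."""
    return (x - x % metatile, y - y % metatile)


def pick_group(z, x, y, groups=SERVER_GROUPS, metatile=METATILE):
    """Pick a server group for tile z/x/y based only on its metatile coordinate.

    The zoom level is accepted for completeness but not used by this
    illustrative rule.
    """
    mx, my = metatile_origin(x, y, metatile)
    index = (mx // metatile + my // metatile) % len(groups)
    return groups[index]


if __name__ == "__main__":
    # Two tiles inside the same metatile map to the same group.
    print(pick_group(10, 512, 336))
    print(pick_group(10, 519, 343))
    # A tile in the neighbouring metatile may map to a different group.
    print(pick_group(10, 520, 336))
```

With a deterministic rule like this, a failover test such as the one decided above amounts to checking what happens to requests for a group's metatiles while one of its servers is not answering.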
Planet servers
We need to decide the plan for the planet servers: whether they will need a large (>30TB) RAID array or whether we will use an object store. What needs to happen?
3 things that need storage:
- planet server
  - has backups, probably a significant portion of that
  - backups should be moved to AWS
  - Grant to provide breakdown of usage of planet server storage space
  - Paul to look at new server with enough storage to replace ironbelly
- dev server
- imagery server
Action: Grant to provide breakdown of planet server files.
Breakdown of planet server files (incomplete run)
3.4 TB Backups
400 GB Log files
300 GB Current run of a planet dump
RAILS storage (old user images and gpx files)
AWS
- Planet serving portion of S3 might be provided for free (potentially replication and services we need to run the S3 bucket). Wouldn't run tile services.
- Concern: development time.
Decision: 2U machine. Suggestion: 32TB + 25%. Will depend on run.
Future: add extra disks to slots. No concern for unmatched disks.
QGIS
Topic added after request of Sarah Hoffmann (Nominatim).
Deferred until we can talk to Sarah.
10Gb Switches
£3500 each minimum
Action: Grant to price up options for review and decision.
RAM for new DB server in Dublin
DB server: 11.04 TB used.
Decision: 0.5 TB RAM
Dublin tickets
https://github.com/openstreetmap/operations/milestone/5
Suggestion: split ticket "cabling and accessories" https://github.com/openstreetmap/operations/issues/529
Open Ops Tickets
Review open tickets: which need a policy decision and which need someone to help.
https://github.com/openstreetmap/operations/issues
Action items from this meeting
- Grant to give Twitter credentials to Paul. [Action item updates]
- Grant to provide breakdown of planet server files. [Topic: Planet servers]
- Grant to price up 10Gb Switches options for review and decision. [Topic: 10Gb Switches]
Next meeting
Wednesday 2 June 2021 18:00 London time
Operations meetings are currently being held every second Wednesday, at 18:00 London time.
Online calendar showing the OPS meetings.