Operations Meeting* - Agenda & draft minutes
Wednesday 2 June 2021 18:00 London time
Location: Video room at https://osmvideo.cloud68.co
* Please note that this was not strictly an OWG meeting.
Minutes by Dorothea Kazazi.
GitHub - OSM Operations: Purchase switches for DB4 (#526)
- RJ45 10G
- at least 4 SFP+ ports.
- 1 for upstream
- 1 for bonding
- 1 for DB
- 1 for render
24 ports preference because:
- we plug render and DB servers
- gives us option to split rendering from database
- we already have switches with 4 ports in Amsterdam
Juniper [model not mentioned]
- 10G and 1G ports
- cost less than Cisco.
- 24-port: ~ 800 without SFP add-on ports, which looked like they had to be ordered from the US.
- bonded across 2 switches? yes.
- Command line better than Cisco.
Suggestion: check the temperature handled.
- Concern: power supply particularly high, as it's a PoE only switch.
- Check: how low a PSU rating can be fitted, as the PSUs are hot-swappable (e.g. 350 watt PSU instead of 1400 watt).
- Check for other switch with more SFP+.
- MP model (24 ports, 10 G) doesn't do back to front.
- Price: ~ 400-500 GBP for 4 ports
- Had to get from US.
Discussion on ambient temperature tolerated by switches
- Juniper MP model (24 ports, 10 G) doesn't do back to front.
- Cisco does left-right cooling.
- The air is cool at front of rack only. Exit temperature at back of the rack - air will be blown back down and around the switch, which is fine.
- AMS: PDUs are at the back but sensors are separate module on front of the rack, on a cable.
- AMS: We have an extra sensor, to put at back of the rack.
- Normally 10 °C rise.
- 24.5 °C at the front at AMS.
- 35 °C is the ambient temperature that switches handle.
- Reverse flow: only accept low temperatures.
- A4 models: up to 45 °C (heat-shock territory)
- A5 models: up to 35 °C.
Suggestion: get blanking plate for front of the rack where switch is.
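The temperature figures above can be combined into a rough headroom check. This is only a sketch of the arithmetic, assuming the quoted 10 °C rise applies to the AMS rack and that a switch mounted with reversed airflow would take in air at the back-of-rack temperature; the function names are illustrative and not part of any tooling discussed in the meeting.

```python
# Figures quoted in the discussion (all in degrees Celsius).
FRONT_TEMP_AMS = 24.5   # measured at the front of the rack at AMS
TYPICAL_RISE = 10.0     # "normally 10 degree rise" from front to back
# Maximum ambient temperature per model class, as minuted.
AMBIENT_LIMITS = {"A4": 45.0, "A5": 35.0}


def exit_temperature(front_temp: float, rise: float = TYPICAL_RISE) -> float:
    """Estimate the air temperature at the back of the rack."""
    return front_temp + rise


def headroom(intake_temp: float, limit: float) -> float:
    """Margin between the intake air temperature and the model's ambient limit."""
    return limit - intake_temp


back_temp = exit_temperature(FRONT_TEMP_AMS)
print(f"Estimated back-of-rack temperature: {back_temp:.1f} C")
for model, limit in AMBIENT_LIMITS.items():
    # Assumption: the switch draws its intake air from the back of the rack.
    print(f"{model}: headroom {headroom(back_temp, limit):.1f} C (limit {limit:.1f} C)")
```

Under these assumptions a model limited to 35 °C would have almost no margin at the back of the rack, which is consistent with the suggestion to check the temperature each candidate switch tolerates.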
- Price: 2nd hand ~ 2500 GBP from German reseller.
- all 48 ports are 10G (rj45 connector)
- SFP+ ports to do fiber upstream.
- does left-right cooling.
- Cisco (1 model up from ours): ~ 2500 /switch from 2nd hand reseller.
- Can do VRP and bonded-link.
- Haven't asked for price.
On Border Gateway Protocol (BGP)
- Cisco: doesn't do full BGP - only tiny BGP (1000 routes max/internal BGP only).
- Juniper: Might handle full BGP.
- We decided not to do BGP.
Action: Grant and Guillaume to finalise the model of switches. Switches to go to the Dublin data center. (was: Grant to price up 10Gb Switches options for review and decision.) [Topic: 10Gb Switches]
Links from this session
- www.uk.insight.com: Juniper Networks QFX Series QFX5100-48T switch - 48 ports - Managed - rack-mountable
- www.fs.com: S5850-48T4Q, 48-Port Ethernet L3 Fully Managed Plus Switch, 48 x 10GBASE-T, with 4 x 40Gb QSFP+ Uplinks
- www.router-switch.com: Cisco S550X switch with 48 x 10 Gigabit Ethernet 10GBase-T copper port, 4 x 10 Gigabit Ethernet SFP+ (dedicated), 1 x Gigabit Ethernet management port
- www.senetic.co.uk: Cisco SX550X-52-K9-EU - Managed - L3 - Gigabit Ethernet (10/100/1000) - Rack mounting
Ordering DB and planet servers
- Paul sent proposed specs.
- Used HP hardware still needs to be ordered.
- Drives won't be HP-branded.
- To get Samsung models.
Action: OWG to look at Paul's proposed specs for ordering DB and planet servers for Dublin in next 24 hrs.
DB4 requires advance provisioning of a patch panel.
- The data center we're in (DB4) has few network providers present.
- Requirement: provision the patch panel for the full capacity of the rack.
- Crazy to front the cost for it.
- Cogent is in DB4.
- 1 time cost (~ 2700 GBP).
Action: Guillaume to email Equinix Dublin about their recent requirement (related to patch panel) and find out about any additional costs (e.g. crossconnect).
Message by Sarah Hoffmann (lonvia): I need to talk to Marco again, how the setup is going on their side. But essentially the question was: does OWG want QGIS to sponsor something of the infrastructure. Nominatim related (nor rendering).
On tile usage
Paul showed a tile traffic analysis that will be presented during his SotM 2021 talk "OpenStreetMap Standard Layer: Who uses it?".
- time required
- operational cost (network, bandwidth, hosting space)
- growth without bounds.
- QGIS sponsoring 2 rendering servers for redundancy (marginal cost 1 server) + CDN every 1.5/2 years. Review after 1 year. ~ 5000 EUR/year?
- Having more options, not only OSM-Carto.
Other point mentioned
- Featuring OSM in QGIS is positive, right community.
On Nominatim usage
Suggestion: Sponsoring ~ 1000 EUR/year for Nominatim & reevaluation, as their users might be doing bulk geocoding.
On current policy
- An app should make at most 1 request/sec in total; the limit is per app, not per user.
- Auto-enforcement is per IP address, while the policy is per app, so it can't be auto-enforced across all of an app's users.
- Would break usage policy, even without bulk geocoding.
- OWG to make clear to QGIS that if they don't build a rate limit into the software, they'll get auto-blocked. Bulk geocoding is strongly discouraged.
- Paul to work out numbers for QGIS sponsoring.
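A client-side rate limit of the kind requested above could look roughly like the following. This is a minimal sketch, not QGIS code: the `RateLimiter` class and its parameters are hypothetical, and it only throttles requests inside a single process, so it does not by itself satisfy a limit that applies to the app as a whole.

```python
import threading
import time


class RateLimiter:
    """Allow at most one call per `min_interval` seconds across all threads."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock          # injectable so the limiter can be tested
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = 0.0     # earliest time the next request may start

    def wait(self):
        """Block until the next request may be sent; return the seconds waited."""
        with self._lock:
            now = self._clock()
            delay = max(0.0, self._next_allowed - now)
            # Reserve the following slot before releasing the lock, so that
            # concurrent callers queue up one interval apart.
            self._next_allowed = max(now, self._next_allowed) + self._min_interval
        if delay > 0:
            self._sleep(delay)
        return delay
```

A caller would create one shared `RateLimiter(1.0)` and call `wait()` immediately before each geocoding request. Because the policy is per app while enforcement is per IP address, a limiter like this reduces the risk of a single installation being auto-blocked but cannot cap the combined traffic of all installations.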
Any other business
Amsterdam data center
- The OSMF treasurer (Guillaume Rischard) has asked Cogent to switch to 6-month invoicing (saving 25 GBP/month) and will switch back to monthly invoicing when the banking situation goes back to normal.
Action items from this meeting
- Grant and Guillaume to finalise the model of switches. Switches to go to the Dublin data center. (was: Grant to price up 10Gb Switches options for review and decision.) [Topic: 10Gb Switches]
- Guillaume to email Equinix Dublin about their recent requirement (related to patch panel) and find out about any additional costs (e.g. crossconnect). [Topic: Dublin Move]
- OWG to look at Paul's proposed specs for ordering DB and planet servers for Dublin in next 24 hrs. [Topic: Dublin Move]
- Paul to work out numbers for QGIS sponsoring. [Topic: QGIS]
- OWG to make clear to QGIS that if they don't build a rate limit into the software, they'll get auto-blocked. Bulk geocoding is strongly discouraged. [Topic: QGIS]
- 2021-05-19 Grant to give Twitter credentials to Paul (was: 2021-05-05 Grant to check/fix GroupTweet for osm_tech Twitter account) # 2021-06-02 pending.
- 2021-05-19 Grant to provide breakdown of planet server files. [Topic: Planet servers] # 2021-06-02 Done.
- 2021-05-19 Grant to price up 10Gb Switches options for review and decision. [Topic: 10GB Switches]
- 2021-05-05 Grant to email Toby from Wikimedia Foundation and suggest chatting to MapTiler. [Topic: Wikimedia] # 2021-06-02 pending.
- 2021-05-05 Grant to provide switches model and vendor to Paul. [Topic: Dublin updates - Someone to handle network purchasing] # 2021-06-02 crossed-out, in order to have 1 action item about switches.
- 2021-04-21 Paul to work out where we need the new HP DL360 servers. [Topic: New HP DL360 servers] # 2021-05-19 on the agenda - decision 7 to go to Dublin.
- 2021-04-21 Paul to tweet asking for recommendation of HP resellers in Ireland. [Topic: New HP DL360 servers] # 2021-05-19 will tweet once he gets the info # 2021-06-02 German reseller selected.
- 2021-04-21 Paul to check how raid1 with hot spare works out with the budget. [Topic: New Rendering server]
- 2021-03-24 Paul to create ticket related to API PostgreSQL update [Topic: API PostgreSQL update] # 2021-06-02 pending.
- 2021-03-24 Tom to report back on TimescaleDB again at next meeting. [Topic: Reportage] [was: 2021-01-13 Tom to evaluate TimescaleDB] [Topic: Longer term metric retention] #2021-04-21 SSD Disk Failing in US # 2021-05-19 decision to leave on the agenda. # 2021-06-02 nothing new.
- 2021-02-24 Grant to install a Discourse instance to get us started. [Topic: Discourse] # 2021-04-21 on the agenda. # 2021-05-19 & 2021-06-02 pending.
- 2021-01-13 OWG to send message to the servers we want to keep. [Reportage. Existing CDN servers] # 2021-03-24 Three servers stopped talking to us (shenron, naga and one more) # 2021-05-19 & 2021-06-02 pending.
- 2021-01-13 Grant to wipe thorns and the 3 other machines [AMS] [Topic: Longer term metric retention] # 2021-05-19 pending # 2021-06-02 Ramoth data drives wiped - decision: Grant to do final wipe of Ramoth and leave it until next site visit. Discussion about 16G DDR3.
- 2021-01-13 Paul to create ticket with Equinix to scrap the wiped thorns and the other 3 machines [Topic: Longer term metric retention].
- 2020-12-02 Grant to develop some thoughts on what is next for us using AWS. [Topic: AWS] # 2021-05-19 & 2021-06-02 pending.
- 2020-11-04 OWG to work out tile log archival and deletion policy at later stage. [Topic: Commercial CDN] # 2021-03-24 & 2021-05-19 deferred to future point.
- 2020-10-21 Paul to write to Discourse ticket and email the board [Topic: Discourse].
- 2020-09-09 Grant [Topic: AWS] Speak to AWS person about going ahead with open data program with official OSM S3 bucket. # 2021-05-19 & 2021-06-02 pending.
- 2020-09-09 Grant [Topic: AWS] Talk to OpenAerial Map/HOT. # 2021-05-19 & 2021-06-02 pending.
- 2020-08-12 Michal to try to rekindle excitement about people helping with imagery (on dev channel/imagery channel or Slack). # 2020-08-26 No progress.
- 2020-07-29 Grant to enable background sync to AWS S3. [Topic: Ironbelly] # 2020-08-12, 2020-08-26 & 2021-06-02 Manually run, automated scripting to be added. # 2021-05-19 Grant to run the script again.
- 2020-07-29 Grant to check with Wiki Admins on hCaptcha (reCaptcha replacement). [Topic: Wiki reCaptcha issue] https://github.com/openstreetmap/operations/issues/454 # 2020-08-12 hCaptcha people reached out and happy to help. Blocker on Mediawiki 1.35 being released in August. # 2021-05-19 blocker removed. # 2021-06-02 pending.
- 2020-07-15 Paul and Grant to quote up a server to replace errol/kessie. [Topic: Replacement of Errol/Kessie]. # 2020-08-12 A new person in OWG asked to do Errol. Need to replace it at some point - at UCL. # 2021-05-19 & 2021-06-02 pending.
- 2020-07-15 Ian to try converting fluxBB DB to go into Discourse. [Topic: OSM Forum (FluxBB) update]. # Evaluating whether moving is an option. Need to see about history, user log-in. # 2021-05-19 decision to leave the action item open. # 2021-06-02 Ian had suggested to start from clean slate, action item to be removed.
- 2020-07-01 Paul to create a ticket about solutions to reduce incoming comms. [Topic: Revision of acceptable use policy to reduce incoming comms] # 2021-05-19 decision to leave the action item open. # 2021-06-02 discussion about priority for account deletion.
- 2020-07-01 Grant to work out some of the questions for an online form as a solution to reduce incoming comms. [Topic: Revision of acceptable use policy to reduce incoming comms] # 2020-08-12 need to think about the reply # 2021-05-19 decision to leave the action item open.
- 2020-06-04 Paul to update the Github ticket "Adding API key support for tile.osm.org" https://github.com/openstreetmap/operations/issues/342
- 2020-04-10 Grant to work out a table of different data bits, work out how they are backed up and what can be potentially improved. [Topic: High Availability / Redundancy of OpenStreetMap.org (and primary services)] # 2021-05-19 decision to leave the action item open. # 2021-06-02 pending.
Operations meetings are currently held every second Wednesday, at 18:00 London time.
Online calendar showing the OPS meetings.