Operations/Minutes/2024-10-17


OpenStreetMap Foundation, Operations Meeting - Draft minutes

These minutes do not go through a formal acceptance process.
This is not strictly an Operations Working Group (OWG) meeting.

Thursday 17 October 2024, 19:00 London time
Location: Video room at https://osmvideo.cloud68.co

Participants

Minutes by Dorothea Kazazi, including notes by Guillaume, marked from G.R: (start) to /G.R. (end).

Absent


Reportage

Buying DL380

Grant has discussed some of the options with Paul. See the "Imagery Server Proposal" topic below.

AArnet servers removal

Related to action item 2024-09-19: Grant to confirm that the AArnet servers will be removed and to ask the Australian community whether there is interest in hosting/providing a render server in Australia or Asia/Pacific [2024-09-19 topic: AArnet Servers going away]

  • A governmental organisation in Australia might be able to offer some servers. Grant to follow up with them.
  • They asked us to wait, as they might be able to offer better options later.

On previous OPS meeting discussion

During one of his recent weekly meetings with Dani Waltersdorfer (board), Grant mentioned that the previous OPS meeting had been heated and that there had been disputes. Dani decided to investigate further. Guillaume asked Grant to talk to him, as the new board is going to wonder what had happened. Grant made clear that he had not asked Dani to follow up.


Imagery Server Proposal

G.R:

Kessie and Ironbelly are very old.

Kessie has 12x 2 TB disks. It's not capable of running modern imagery software; it's just too slow. The CPU is comparable to a low-cost 2012 CPU.

Ironbelly is 10 years old. Disks are failing and are being replaced with cheap spares, but failures keep happening. Weird unmonitored RAID card, weird OOB (out-of-band management). Management tools are hard to get, etc.

Grant is proposing an HP DL380 Gen9, like our other servers. A decent one costs about £500, excluding storage disks. Grant suggests six large disks; Paul thinks we should use SSDs/NVMes, but Grant thinks the cost isn't justified.

Grant has a personal machine we can use. See email.

AWS is possible, but has many unknowns.

Grant will ask for a vote in the chat, including Paul.

/G.R.

On imagery storage and access

  • Australian reference imagery: 3 TB, doesn't get accessed much anymore.
  • UK Ordnance Survey OpenMap and Ordnance Survey Street View imagery: 1 TB processed; we keep all the source files, and only a very small percentage of that gets accessed.
  • Namibian Topo: 100 GB, very little of that gets accessed.
  • Luxembourg rendered DEM: does not get accessed much.
  • US imagery: 1 TB, does not get accessed much.
  • Brazil: 200 GB; it was given to us and Grant was asked to help with it.
  • UK small regions: 1 TB

Total storage requirement: At least 50 terabytes.

On replacing Ironbelly and Kessie

Ironbelly

  • Age: ~ 11 years old.
  • Disks: failing disks were replaced with cheap second-hand ones; one has already failed and another is expected to fail.
  • Weird RAID card; not properly monitored since we upgraded, and its management tools are hard to get.
  • Disk annual failure rate of 7% over the life of the machine, compared with an average of 1.7% in Backblaze's published failure rates.
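For context, the sketch below turns an annual failure rate (AFR) into the expected number of disk failures per year and the chance of at least one failure, assuming disks fail independently. The 12-disk array size is a hypothetical value chosen for illustration; the minutes do not state how many disks Ironbelly holds.

```python
def expected_failures(afr: float, disks: int) -> float:
    """Expected number of disk failures per year, assuming independent failures."""
    return afr * disks


def p_at_least_one(afr: float, disks: int) -> float:
    """Probability that at least one disk fails within a year."""
    return 1.0 - (1.0 - afr) ** disks


DISKS = 12  # hypothetical array size, for illustration only

for label, afr in [("Ironbelly (observed)", 0.07), ("Backblaze average", 0.017)]:
    print(f"{label}: {expected_failures(afr, DISKS):.2f} failures/year, "
          f"P(at least one failure) = {p_at_least_one(afr, DISKS):.0%}")
```

At these rates a 12-disk array would see roughly four times as many failures per year at 7% as at the Backblaze average.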

Kessie:

  • Alerting every few minutes due to weird RAID performance issues.
  • We have replaced all disks in it over time.
  • Storage: 20 TB.
  • Not capable of running modern imagery software.
  • Old processor (~2012).
  • Hosted at Exonetric. They are willing to host another server for us, but they wanted something modern and power-efficient.

On not doing the work on AWS

  • We'd either have to run on EC2 or Lambda.
  • Running on Lambda requires sponsorship and a lot of development time to get it working. How severe the latencies to S3 would be is unknown, as are the resource requirements, e.g. how big the machines would need to be and how much data we would push. AWS will need estimates from us before sponsoring, and this might take a lot of time to set up.

On using COGs

Concerns:

  • CPU requirements unknown
  • Potential latency issues (see below)
  • Storage (see below)
  • Time invested
  • Updates will take a long time to process, as reprocessing would involve starting from scratch.

On potential latency issues

If Cloud Optimized GeoTIFFs (COGs) are not aligned with the intended tile boundaries, multiple layers of COGs have to be composited, which is resource-intensive. At tile boundary edges, four range requests are often needed, and compositing these layers with masking requires substantial CPU power. High latency in these requests can lead to slow tile loading. For instance, the Texas imagery that Grant worked on required hundreds of COG files to be composited, which made tile loading considerably slower.
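A rough way to see why misaligned COGs need several range requests: the sketch below counts how many COG internal tiles a single output tile overlaps. It assumes both grids share the same projection and resolution, and that each internal tile costs one HTTP range request (real readers may merge adjacent ranges). The function name is hypothetical.

```python
def overlapping_tiles(x0: int, y0: int, size: int, cog_tile: int) -> int:
    """Count the COG internal tiles touched by one size x size output tile.

    (x0, y0) is the output tile's top-left pixel in the COG's pixel grid.
    Assumes the same projection and resolution for both grids.
    """
    # First and last internal tile index covered along each axis.
    cols = (x0 + size - 1) // cog_tile - x0 // cog_tile + 1
    rows = (y0 + size - 1) // cog_tile - y0 // cog_tile + 1
    return cols * rows


print(overlapping_tiles(512, 512, 256, 256))  # aligned grid: 1
print(overlapping_tiles(100, 100, 256, 256))  # offset in x and y: 4
```

When the grids line up, each output tile maps to exactly one internal tile. Any offset in both axes turns that into four reads plus compositing, which is what the community code that re-tiles COGs onto the target grid avoids.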

On storage

Processing will produce output in a more storage-efficient format, such as WebP, typically reducing the volume to about half or a quarter of the original. However, even with this reduction, the total storage required is larger than the initial dataset, as both the initial files and the WebP files have to be stored.
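The storage arithmetic above can be sketched as follows, using the half-to-quarter WebP ratios from the discussion. The 10 TB source size is a hypothetical figure for illustration.

```python
def total_storage_tb(source_tb: float, webp_ratio: float) -> float:
    """Storage needed when source files and processed WebP output are both kept.

    webp_ratio is the processed size as a fraction of the source size.
    """
    return source_tb * (1.0 + webp_ratio)


SOURCE_TB = 10.0  # hypothetical dataset size, for illustration only

for ratio in (0.25, 0.5):  # "about half or a quarter of the original"
    print(f"ratio {ratio}: {total_storage_tb(SOURCE_TB, ratio)} TB")
```

So a 10 TB source would need roughly 12.5 to 15 TB once the processed output is added, always more than the source alone.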

Other points mentioned during discussion

  • Community members in New Zealand have developed code that processes COG files, generating new, perfectly tiled COG files.
  • Let’s not let perfection be the enemy of completion. We have done this work before, it's not a huge amount of money and can be done relatively quickly.

Suggestion: Post on the forum that we're getting a new imagery server and doing it the old way because this is what we know, but that anyone who wants to can play with this project and this project and build a proof of concept.

  • Grant has tried on the OSM US channel, but there were no takers.

On options and cost

1. DL380 Gen 10, 12 bays: GBP 19K.

2. 6x new 18-20 TB enterprise SAS disks for £1,650.

  • Advantage: space to grow.
  • New disks mean we avoid the failure-rate issues. Expected annual failure rate: ~1%; advertised: 0.35%.
  • The cost is within the budget left for the year.

Grant purchased the machine with his own money, as he was planning to go to Amsterdam, and will take it back if OPS decide not to use it. It has seven 6 TB disks, which is enough storage to get started and run some imagery. It needs more storage, which would have to be added via remote hands.
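The minutes do not say which RAID level would be used, so the sketch below compares common levels as an assumption. It takes the lower 18 TB bound for the new disks and ignores filesystem overhead and TB/TiB differences. The results can be compared with the "at least 50 terabytes" requirement stated earlier. The function name is hypothetical.

```python
def usable_tb(disks: int, disk_tb: float, level: str) -> float:
    """Approximate usable capacity for a few common RAID levels."""
    if level == "raid5":
        data_disks = disks - 1   # one disk's worth of parity
    elif level == "raid6":
        data_disks = disks - 2   # two disks' worth of parity
    elif level == "raid10":
        if disks % 2:
            raise ValueError("RAID 10 needs an even number of disks")
        data_disks = disks // 2  # every disk is mirrored
    else:
        raise ValueError(f"unsupported level: {level}")
    return data_disks * disk_tb


options = [("6x 18 TB new disks", 6, 18.0), ("7x 6 TB in Grant's machine", 7, 6.0)]
for name, n, size in options:
    for level in ("raid5", "raid6", "raid10"):
        try:
            print(f"{name}, {level}: {usable_tb(n, size, level):.0f} TB")
        except ValueError as err:
            print(f"{name}, {level}: {err}")
```

Under these assumptions the six new disks clear 50 TB with RAID 6 or RAID 10. The seven 6 TB disks fall short on their own, which matches the note that more storage would be needed.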


OpenMapTiles application

G.R: Short discussion, and decision to discuss this with Paul in two weeks.
/G.R.

  • We need to assess the application against the criteria.
  • The application was previously blocked due to use of Wikipedia data and other issues, which now might be resolved (see email: 29 April).

Other points mentioned

  • Policy question: is it novel enough to qualify? Its novelty: vector tiles.
  • We have only removed styles for technical issues.

On Paul's Vector Tiles project

  • The infrastructure is running and is in a production-suitable state.
  • There is a working demo, using the CDN.
  • OSMF seems to have scaled down the project and it currently does not have a style.

Suggestions:

  • Use a different style.
  • Discuss with Paul during the next meeting.

Other point mentioned: Paul's tiles are live updated.


Editor inclusion policy

There seemed to be a desire in the community discussion https://community.openstreetmap.org/t/updated-proposal-osm-org-editor-inclusion-policy-draft-2/116547 not to require explicit membership of the panel.

Suggestion: The board to decide. Decision: Examine feedback on community forum and re-discuss in two weeks.

Part of the discussion was not minuted, on request.


Any other business

Staging of blog.osm.org on Tabaluga

  • Grant will host the staging of blog.osm.org on Tabaluga (HP ProLiant DL360 Gen9). The CWG has a contractor who will work on improving the blog and is in contact with Mikel Maron (CWG).
  • Grant agreed to rework blog.osm.org into Hugo and create a demo, as it might be better to move away from WordPress. Many templates and commercial themes are available for Hugo.

Action items reviewed at the beginning of the meeting

  • 2024-09-19 Grant to create a ticket for action item [2024-08-08](https://hackmd.io/su12wMb9TR2kd1I5lLJ8vw) OPS to evaluate Fastly Security (DDOS) Protections we could use. [Topic: Cloudflare / Fastly]
  • 2024-09-19 Grant to create an IP blocklist. [Topic: Cloudflare keep enabled?][2024-09-19 Reportage] - Discussed during [2024-07-25](https://hackmd.io/iyFjUWl1RY6D_pevem8ciA) OPS to make a reasonable evaluation whether to go with Cloudflare, Fastly or none.
  • 2024-09-19 Paul to add the OpenMapTiles application to the next agenda, together with the editor inclusion policy. [2024-09-19 Reportage] # On the 2024-10-17 agenda
  • 2024-09-19 Grant to come up with estimates for space needs in next 5 years and cost for Ironbelly replacement* [2024-09-19 topic: Ironbelly Replacement? ] Done
  • 2024-09-19 Grant to confirm that the AArnet servers will be removed and to ask the Australian community whether there is interest in hosting/providing a render server in Australia or Asia/Pacific [2024-09-19 topic: AArnet Servers going away]
  • 2024-08-22 Guillaume to make a limited-scale experiment to assess impact and practicality, e.g. check whether there are clients that claim to support WebP but don't. [Topic: Fastly image recoding]
  • 2024-08-22 Guillaume to keep OPS in the loop about what Fastly says. [Topic: Fastly image recoding]
  • 2024-08-22 Grant to talk to Guillaume on setting up the testing about image recoding and shielding. [Topic: Fastly image recoding]
  • 2024-08-08 OPS to evaluate Fastly Security (DDOS) Protections we could use. [Topic: Cloudflare / Fastly]
  • 2024-07-25 Grant to determine the Cloudflare API call to block IPs, in order to deal with scrapers [Topic: Cloudflare keep enabled?]
  • 2024-07-25 OPS to make a reasonable evaluation whether to go with Cloudflare, Fastly or none. [Topic: Cloudflare keep enabled?]
  • 2024-06-27 OPS to do capacity planning for tile.openstreetmap.org [Topic: rhaegel usage?]
  • 2024-05-02 OPS to revisit the OpenMapTiles application. # 2024-06-13 They haven't responded to the questions. Paul to email them again.
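For the 2024-08-22 image-recoding experiment, one building block is deciding whether a client advertises WebP support. The sketch below assumes that decision is made from the HTTP Accept header alone, which is where browsers list image/webp. The function name is hypothetical, and this is not how Fastly implements image optimisation. Wildcards such as image/* are deliberately not treated as a WebP claim.

```python
def accepts_webp(accept_header: str) -> bool:
    """True if the Accept header explicitly lists image/webp with q > 0."""
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if fields[0].lower() != "image/webp":
            continue
        q = 1.0  # default quality value when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        return q > 0
    return False


print(accepts_webp("image/avif,image/webp,*/*;q=0.8"))  # True
print(accepts_webp("image/png,*/*;q=0.8"))              # False
```

Logging this result next to whether the client then renders the WebP tile would give a way to spot clients whose claim does not match reality.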

Action items that have been stricken-through are completed, removed, or have been moved to GitHub tickets.