Working Group Minutes/EWG 2014-06-23

Attendees

IRC nick	Real name
pnorman	Paul Norman
shaunmcdonald	Shaun McDonald
zere	Matt Amos
Zverik_h	Ilya Zverev
Summary

Osm2pgsql "multi backend" branch
- MapQuest have been porting osm2pgsql to C++ and adding a new backend.
- pnorman threw the whole planet at it, and it passed the statistics regression tests, which would probably have caught any regressions.
- The main blocker for merging is that we need to restore multiple core usage to the pending ways stage, as it was taken out during porting.
Osm-carto & CI
- It would be nice to have some CI going to check that any pull requests for osm-carto compile to mapnik XML as a minimum, and even better, don't throw SQL errors.
- zere suggests using jenkins, shaunmcdonald is happy to put it on the ci.osm.org jenkins instance.
- pnorman puts it on his to-do board.
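
A minimal sketch of the first half of such a check, in Python. It assumes a `carto` executable is on the PATH and that it prints the compiled Mapnik XML to stdout when given a project file. The function names `compile_style` and `layer_queries` are hypothetical, not part of any existing CI job. The second function pulls out each layer's "table" parameter, which is where the SQL lives in a Mapnik PostGIS layer, so that a later step could run those queries against an empty database that has the right schema.

```python
import subprocess
import xml.etree.ElementTree as ET


def compile_style(mml_path):
    """Run carto on a project file and return the Mapnik XML from stdout.

    Assumption: `carto` is installed and on PATH. A non-zero exit status
    raises CalledProcessError, which a CI job can treat as a failed build.
    """
    result = subprocess.run(
        ["carto", mml_path], check=True, capture_output=True, text=True
    )
    return result.stdout


def layer_queries(mapnik_xml):
    """Return {layer name: SQL} for every layer that has a "table" parameter."""
    root = ET.fromstring(mapnik_xml)  # raises ParseError on malformed XML
    queries = {}
    for layer in root.iter("Layer"):
        for param in layer.iter("Parameter"):
            if param.get("name") == "table" and param.text:
                queries[layer.get("name")] = param.text.strip()
    return queries
```

Layers without a "table" parameter (shapefiles, for example) are skipped, since there is no SQL to check for them.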
CGImap and ordered / unordered results
- An implementation detail in the "read-only" instances of cgimap means they return OSM XML with elements sorted by their ID as well as their type.
- Although the docs say "You can not be sure that ... blocks are sorted", it is still confusing, especially when many tools have exactly that (usually undocumented) assumption.
- Changing the behaviour will "fix" this, but we would need to change many bits of software before we could provide this as an API guarantee. Not sure what the performance impact would be - probably small, but not 100% sure.
- Sounds like the best option is to put it to the side unless we get some benchmark results.
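
To make the ordering under discussion concrete, here is a small Python sketch that checks whether an OSM XML document is sorted by type (nodes, then ways, then relations) and by ID within each type, and that re-sorts it if not. This is only an illustration of the ordering, not how cgimap or osmosis do it. The names `is_sorted` and `sort_osm` are hypothetical, and it loads the whole document into memory, which is fine for API-sized responses but not for planet files.

```python
import xml.etree.ElementTree as ET

# Conventional element order in OSM XML: nodes, then ways, then relations.
TYPE_ORDER = {"node": 0, "way": 1, "relation": 2}


def sort_key(elem):
    """Order by element type first, then by numeric ID."""
    return (TYPE_ORDER[elem.tag], int(elem.get("id")))


def is_sorted(osm_xml):
    """True if nodes/ways/relations appear in type order and ID order."""
    root = ET.fromstring(osm_xml)
    keys = [sort_key(e) for e in root if e.tag in TYPE_ORDER]
    return all(a <= b for a, b in zip(keys, keys[1:]))


def sort_osm(osm_xml):
    """Return the document with its elements sorted by type, then ID."""
    root = ET.fromstring(osm_xml)
    others = [e for e in root if e.tag not in TYPE_ORDER]  # e.g. <bounds>
    elems = sorted((e for e in root if e.tag in TYPE_ORDER), key=sort_key)
    for e in list(root):
        root.remove(e)
    root.extend(others + elems)
    return ET.tostring(root, encoding="unicode")
```

A tool that needs sorted input could call something like `is_sorted` up front and refuse to continue, rather than silently dropping data.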
References to parent objects in OSM XML
- Zverik_h raised [1] (adding references to parent objects to objects returned by API calls)
- There are already separate API calls to get parent relationships (e.g: node/#{id}/ways), but this proposed change would include those in all OSM XML responses.
- zere saw a couple of potential problems: 1) it changes the OSM XML format and would require a version bump to 0.7, 2) it will impact the performance of the queries, but probably in a manageable way. Potentially it could be a soft version bump, since we could continue to serve 0.6 at the same time as 0.7.
- Zverik_h has been collecting together a proposal for 0.7 which leaves out a lot of the "bigger" and more risky / time-consuming features (e.g: areas).
- Might be more productive to get some of the "smaller" API features out of the way and leave the more controversial ones for 0.8.
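
A rough Python sketch of what the proposal could look like for nodes. The meeting did not settle an element format, so the `<parent type="way" ref="..."/>` shape, the function name `add_missing_parents` and the `node_ways` lookup (standing in for what node/#{id}/ways returns) are all assumptions for illustration. Following the variant discussed in the log, a parent reference is only added when the parent way is not already in the same document. Relation parents are left out to keep it short.

```python
import xml.etree.ElementTree as ET


def add_missing_parents(osm_xml, node_ways):
    """Add <parent/> children to nodes whose parent ways are not in the document.

    node_ways: {node id: [way ids]}, a stand-in for node/#{id}/ways.
    The <parent type=... ref=.../> format is hypothetical.
    """
    root = ET.fromstring(osm_xml)
    ways_in_doc = {int(w.get("id")) for w in root.iter("way")}
    for node in root.iter("node"):
        for way_id in node_ways.get(int(node.get("id")), []):
            if way_id not in ways_in_doc:
                ET.SubElement(node, "parent", type="way", ref=str(way_id))
    return ET.tostring(root, encoding="unicode")
```

Because parents that are already present get no extra element, a document that contains every parent (a planet file, for instance) would come out unchanged.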
IRC Log

17:31:41 <shaunmcdonald> hi
17:31:59 <zere> minutes of the last meeting: http://www.osmfoundation.org/wiki/Working_Group_Minutes/EWG_2014-06-09 please let me know if anything needs changing
17:32:06 <zere> on the agenda for today:
17:32:32 <zere> 1) merging the osm2pgsql "multi backend" branch
17:32:45 <zere> 2) osm-carto & CI
17:33:00 <zere> 3) cgimap & sorted vs unsorted results
17:33:12 <zere> if anyone has anything else, we'll do an AoB at the end.
17:33:21 <zere> #topic osm2pgsql "multi backend" branch
17:33:33 <zere> pnorman: want to give a quick description here?
17:34:22 <pnorman> So, we (mapquest) have been porting osm2pgsql to C++ and adding a new backend. The new backend is of limited interest, it really only is useful if running VMs
17:35:37 <pnorman> I threw the whole planet at it, and it passed the statistics regression tests, which would probably have gotten regressions. There's also a more extensive test suite
17:36:34 <pnorman> The main blocker for merging from my perspective is that we need to restore multiple core usage to the pending ways stage. It was simpler to take it out during porting
17:36:35 <Zverik_h> As I'm not a real programmer, can you explain what is osm2pgsql backend? I always considered it a single, monolith program
17:37:01 <pnorman> Current backends are pgsql, gazetteer, and null
17:37:52 <Zverik_h> and what's the benefit of porting to C++?
17:38:25 <pnorman> Modern data structures. We caught some existing bugs when converting to them.
17:39:30 <zere> each backend supports a different schema for the tables, basically. the pgsql one used to be the only one (a long time ago), then we added gazetteer (for nominatim) and null (for testing). the new "multi" backend is for a slightly friendlier schema than the "four huge tables" standard backend :-)
17:40:04 <shaunmcdonald> What would the use case be for the new backend?
17:40:14 <shaunmcdonald> Would it make things like analysis easier?
17:40:24 <Zverik_h> so, the new backend is like in imposm?
17:40:58 <pnorman> matt, I thought changing the rendering schema was a post-merge thing and would impact the psql backend too?
17:41:37 <zere> none of this should impact the pgsql backend - that's being maintained for compatibility
17:42:01 <zere> until such time as it becomes like the primitive input, and can be thrown away without really affecting anyone :-)
17:42:32 <pnorman> There have been plans to bring partitioning to the pgsql backend
17:42:56 <zere> and yes, it's close to what imposm supports in terms of being able to set up your own schema for tags that you're interested in.
17:43:44 <zere> partitioning in the pgsql backend is pretty much orthogonal to what we're doing in the multi backend. although they'll bring some of the same benefits.
17:44:30 <pnorman> In any case, that's a later patch. I think at this point all the multi backend gains you is the ability to have the middle slim tables on a different server than the render tables, and to have multiple render tables?
17:48:08 <zere> i don't think we have the settings to put the middle somewhere else, but it's a pretty trivial patch since the refactor.
17:48:13 <pnorman> Anyways, performance looks positive with --number-processes 1, so just need to get it working with non-1
17:48:18 <zere> yup
17:49:01 <pnorman> I think that's about it for that topic
17:50:33 <zere> cool, ok.
17:50:46 <zere> #topic osm-carto & CI
17:51:03 <pnorman> Oh, if I haven't disclosed my employment yet, I'm working for MapQuest alongside Matt.
17:51:37 <zere> :-)
17:51:49 <pnorman> It'd be nice to have some CI going to check that any pull requests for osm-carto compile to mapnik XML as a minimum, and even better, don't throw SQL errors. Anyone seen this done or know any way to do it?
17:52:12 <zere> you want this to check PRs before they're merged, right?
17:52:16 <pnorman> yes
17:52:23 <pnorman> Well, more to check PRs before looking at them
17:52:52 <zere> i've no doubt it can be done with something like jenkins - it's so extensible one could use it for just about anything.
17:53:35 <pnorman> k. sticky stuck on to-do board
17:53:39 <zere> it would need a shell script which would call carto to do the compilation, then (presumably) parse the queries to execute against an empty database just to check if they errored?
17:54:17 <shaunmcdonald> I’m happy to provide access to the ci.osm.org jenkins instance if you want to run it there.
17:54:17 <pnorman> I think you can render against an empty DB and see if you get an error
17:54:21 <zere> although i suppose it would be nice to use nik2img or something to render some actual tiles...
17:54:36 <zere> shaunmcdonald: great, thanks :-)
17:54:43 <pnorman> i.e. render an actual tile, just with no data
17:55:29 <shaunmcdonald> I’d be tempted to use a small city extract and render that with data that doesn’t change.
17:55:59 <pnorman> naw, don't need data, just a schema
17:56:58 <pnorman> k, I'll talk to shaun later about details
17:57:35 <shaunmcdonald> :-)
18:00:30 <zere> okay, next topic
18:00:39 <zere> #topic cgimap and ordered / unordered results
18:00:44 <pnorman> when run against a read-only slave, cgimap produces XML where nodes/ways/relations are ordered by ID. With the writable master it doesn't. This difference is confusing, as it means that the results of an API call might or might not work in osm2pgsql, osmctools, or anything else that needs ordered.
18:00:50 <pnorman> https://github.com/zerebubuth/openstreetmap-cgimap/issues/88 as a reference
18:01:58 <Zverik_h> how do I request "writable master" from cgimap?
18:02:12 <pnorman> runtime options
18:03:04 <pnorman> the difference is that on the master postgres server you can use temp tables, while on read-only slaves you can't so it has to store the data locally, which ends up storing it in a structure that is inherently ordered
18:03:28 <zere> the background is that the API hasn't previously (and still doesn't) guarantee any ordering other than by type.
18:03:57 <Zverik_h> so basically adding ORDER BY will slow the performance, but since we use "read-only slave" on osm.org, this would go unnoticed, and those who will use it, will have much less queries, so they won't be affected?
18:04:00 <zere> it's an implementation detail that it's ordered for the read-only cgimap backend, which has only existed since we started replicating the postgres database.
18:04:36 <zere> we basically have 3 options:
18:04:40 <pnorman> some queries go to the slaves, some to the master
18:04:59 <zere> 1) leave as-is on the API. file bugs against anything unable to take non-sorted input.
18:05:38 <zere> 2) alter writeable cgimap to return ordered results, but don't include it as part of the API contract
18:05:58 <zere> 3) alter writeable cgimap to return ordered results, and make it part of the API contract
18:06:05 <Zverik_h> there is a phrase somewhere on wiki on API that says programmers can rely on sorted order (not sure if that affects API or just dumps), so I think we should add ordering
18:06:19 <zere> well, that's just for planet files
18:06:37 <Zverik_h> ah, yes, sorry
18:06:51 <Zverik_h> "You can not be sure that: ... blocks are sorted "
18:07:22 <Zverik_h> http://wiki.openstreetmap.org/wiki/OSM_XML#Assumptions
18:07:45 <Zverik_h> what will be affected by 2) ?
18:08:31 <pnorman> Only cgimap will need altering. It will mean that osm2pgsql non-slim will work with the output of relation/full. Also a bunch of other software
18:10:17 <zere> the other possibility is just to tell people to use `osmosis --rx --sort --wx`
18:10:25 <pnorman> And install osmosis?
18:10:27 <Zverik_h> No, I mean, which servers. That is, is there a cgimap on any of osm servers with writable mode enabled?
18:10:35 <zere> yes
18:10:48 <zere> but that's not a big deal
18:10:51 <Zverik_h> the feature is nice to have, but if it slows down anything vital, we better off without it
18:11:24 <pnorman> Yes - since I actually hit this bug, at least one backend is coming off of ramoth
18:11:26 <Zverik_h> because the wiki tells one cannot rely on ordering
18:12:27 <zere> so, on the one hand, we have a change to software to provide a behaviour which hasn't ever been promised weighed against the epic horror of `sudo apt-get install osmosis` ;-)
18:12:50 <Zverik_h> pnorman: so the bug is that osm2pgsql incorrectly processed unsorted input?
18:12:56 <pnorman> zere: are there any distributions where that actually installs a sane version of osmosis
18:13:04 <pnorman> Zverik_h: that's the bug in osm2pgsql
18:13:07 <zere> sane enough to sort an xml file, sure :-)
18:13:22 <pnorman> sane enough for 64-bit IDs?
18:13:46 <zere> what version added support for that?
18:14:49 <Zverik_h> zere: 0.42
18:15:08 <Zverik_h> (out year and a half ago)
18:15:11 <zere> ah, then yet - it's 0.40.1+ds1-7 on 14.04 :-(
18:15:22 <Zverik_h> :/
18:15:32 <zere> why are they shipping such an ancient version?
18:15:51 <pnorman> because osmosis is a royal pain to build, particularly within the context of a packaging system
18:16:25 <Zverik_h> well, I'm not against enabling sorting in cgimap. Performance tests would be nice, though
18:16:44 <Zverik_h> zere: I guess that's debian upstream
18:17:13 <zere> Zverik_h: yeah, the "debian openstreetmap team".
18:18:07 <zere> i don't see a problem enabling sorting either, i just feel like it's making a change in the wrong place.
18:18:31 <zere> i'm not even sure why osm2pgsql depends on sorted input (other than by type).
18:19:42 <Zverik_h> maybe the bug is not in sorting, but in that ways should come before relations or something
18:20:13 <pnorman> No, the bug is in id-sorting
18:20:38 <pnorman> well, in expecting id-sorting and not throwing an error when it doesn't get it
18:21:42 <shaunmcdonald> I think there’s some issue with the Osmosis build system in that it requires the downloading of stuff to do the build, hence not liked by packagers.
18:22:51 <pnorman> Sounds like the best option is to put it to the side unless we get some benchmark results
18:23:34 <Zverik_h> I agree
18:24:51 <Zverik_h> and I still haven't got which osm servers would be affected by this change
18:24:54 <shaunmcdonald> With people expecting planet dumps and extracts to be id sorted, and people downloading a small area from the api to test things with those tools, it’ll be an expectation that the sorting would be there. Thus either recommending that all software that requires the id sorting throw an error when it is missing, or implementing the sorting in the API, is needed.
18:25:18 <zere> i'm wondering why the sparse node cache expects sorted ids. it seems like that's something that could be fixed.
18:26:00 <Zverik_h> it seems it won't load cache once stored to disk
18:26:34 <pnorman> zere: this is in non-slim mode only, remember. but even if it was fixed, it's not hard to find another example of a program expecting sorted output, e.g. osmconvert does
18:27:09 <zere> but at least osmconvert has warnings saying that :-)
18:27:49 * pnorman spent quite some time debugging data randomly missing from his database with osm2pgsql
18:29:30 <zere> i don't think i've anything really to add to what pnorman said: put it to the side unless we get some benchmark results.
18:29:30 <Zverik_h> so... the decision is to wait until pnorman benchmarks it?
18:30:06 <zere> i suspect it won't be so bad... these files are only small. but better to make a decision based on real data.
18:30:44 <zere> we can continue this next week, if anyone's up for that?
18:30:49 <pnorman> AOB?
18:30:49 <zere> #topic AoB
18:31:04 <zere> in the meantime, does anyone else have anything they'd like to discuss?
18:31:05 <Zverik_h> I'd like to discuss https://github.com/openstreetmap/openstreetmap-website/issues/768
18:31:22 <Zverik_h> (adding references to parent objects to objects returned by API calls)
18:32:09 <Zverik_h> I think this would prevent a lot of novice users from breaking relations
18:32:50 <pnorman> I don't see how. iD will load up the area if you pan to it anyways
18:33:10 <Zverik_h> yes, but some other editors (e.g. JOSM) can download single objects
18:33:54 <pnorman> novice editors shouldn't be downloading single objects in JOSM
18:34:04 <Zverik_h> do I understand correctly that cgimap always returns all containing objects for objects inside requested? That is, all ways and their nodes for nodes, all relations for nodes and ways?
18:34:22 <zere> yes
18:34:51 <zere> any way or relation which uses any node in the bbox. similar for ways.
18:35:13 <Zverik_h> pnorman: still, it's possible. There is a common case, but sadly it deals with map calls: splitting ways / deleting nodes in ways outside downloaded area and finding out that node was referenced by another way
18:35:17 <zere> but the converse is not true - a node which is outside the bbox might be used by ways or relations not in the result.
18:35:23 <pnorman> cgimap works on a map? call by finding all nodes in the bbox, finding all way parents of those nodes, finding all node children of those ways (backfilling), finding all relation parents of either, and finding all relation parents of relations
18:35:35 <pnorman> Zverik_h: deleting nodes isn't an issue, use the if-unused delete mode
18:35:46 <zere> pnorman: GAH NO!
18:35:55 <zere> if-unused is a mighty hack
18:36:20 <zere> don't delete nodes outside of the bbox you downloaded with the map call :-)
18:37:21 <Zverik_h> zere: it doesn't happen on purpose :)
18:37:49 <pnorman> zere: pretty sure it's a safe operation for an editor to delete untagged nodes outside bbox when deleting a way and upload those nodes with if-unused
18:38:21 <zere> indeed, it's a hack so that people's whole changeset won't fail because someone else used a node. but it's a hack for concurrency, not correct editor behaviour.
18:38:59 <zere> and editor really shouldn't allow you to edit something outside the bbox, or if it hasn't done a node/#{id}/(ways|relations) call first
18:39:17 <zere> or way/#{id}/relations or relation/#{id}/relations of course
18:39:40 <Zverik_h> so there's the reason I proposed <parent> tags: so that tracking bounds of a downloaded area is no longer mandatory
18:39:56 <Zverik_h> also, for sparse editing
18:40:02 <Zverik_h> (e.g. admin bounds)
18:40:33 <zere> tracking the bounds will still be mandatory for general editing
18:40:51 <Zverik_h> until I developed a handy habit of pressing "download parent relations" every time I do anything, I often broke some
18:40:56 <zere> otherwise you might pan, and not see an object (because the editor hasn't downloaded it) and add it again
18:41:34 <zere> okay, so benefits of this aside, there's a couple of potential problems:
18:41:43 <zere> 1) it changes the OSM XML format
18:41:54 <zere> 2) it will impact performance of the queries
18:42:21 <zere> i think (2) is probably manageable
18:42:32 <zere> but (1) would require a version bump
18:42:34 * pnorman isn't sure
18:43:03 <zere> potentially a soft version bump, since we could continue to serve 0.6 at the same time as 0.7
18:43:07 <Zverik_h> I think we can resolve this for major data processors and editors. It only adds a tag, not changes anything
18:43:10 <zere> and allow editors time to change over
18:43:28 <zere> adding a tag is changing something, unfortunately
18:43:40 <Zverik_h> else, we can start the process of introducing api 0.7, making a development server
18:43:56 <zere> i guarantee there will be many bits of software which choke on getting a <parent/> element.
18:44:08 <zere> yes, i think we should start 0.7
18:44:08 <shaunmcdonald> Assuming they are parsing XML correctly, they would just ignore that extra XML tag.
18:44:25 <zere> shaunmcdonald: no, that's not true.
18:44:51 <zere> in HTML, yes. the HTML spec basically says "ignore anything you don't understand".
18:45:16 <zere> but i'm not aware of any OSM documentation saying anything similar.
18:45:41 <pnorman> There *is* software that will choke on <parent/>
18:45:54 <Zverik_h> level0 would ignore it. But since we have quite limited number of tags, it is possible that some software (e.g. osmosis) would throw an error
18:46:28 <pnorman> JOSM is fairly strict about extra attributes, not sure about extra elements
18:47:51 <zere> right, it entirely depends on how the parser has been written.
18:47:58 <Zverik_h> pity. So, the solution is to create 0.7 branch and introduce all kinds of new things there
18:48:16 <Zverik_h> e.g. area datatype :)
18:48:28 <zere> so, i'm looking at http://wiki.openstreetmap.org/wiki/User:Zverik/API_0.7_Proposal
18:49:00 <zere> and we might have trouble with "A call for deleted objects in bbox" - because doing this properly is hard
18:49:18 <Zverik_h> it's just headers for now, I intended to write some text for sections.
18:49:35 <zere> we can do it the same way that amf_controller does (i.e: not properly), if the only reason is to kill amf_controller. but it's not a great solution.
18:49:36 <Zverik_h> as for deleted objects, there is a call in amf controller — can't we just port it?
18:50:42 <zere> porting it directly is slightly difficult. if i remember correctly it builds the geometry for intermediate versions of the way, which has no equivalent in xml.
18:51:28 <zere> but there's no reason we couldn't port the core of it. it's just that it doesn't really solve the problem that it purports to solve, and it gets in the way of some improvements we wanted to make.
18:52:13 <Zverik_h> I think if we are to deploy OWL, this won't be as much an issue
18:52:18 <zere> basically, at the moment current_nodes includes deleted nodes. but there's no reason for it to do so, anything that queries current_nodes (except that one amf_controller call) excludes deleted nodes
18:52:55 <zere> removing deleted nodes from current_nodes would allow us to put a constraint on current_way_nodes to ensure that ways do not contain references to deleted nodes.
18:53:14 <zere> this hasn't (as far as i know) been a problem, but just gives extra safety
18:53:40 <zere> and overpass seems to be going in that direction too
18:54:31 <Zverik_h> overpass afaik cannot give deleted objects in an area, but can give a snapshot of an area at any given time
18:54:52 <zere> it can give a diff of an area between two points in time as well, can't it?
18:54:56 <Zverik_h> yes
18:58:47 <zere> i think we can work to a less ambitious 0.7 and push some stuff (e.g: areas) to 0.8.
18:59:29 <zere> have you seen the work-in-progress here? https://github.com/zerebubuth/openstreetmap-website/tree/json
19:00:22 <Zverik_h> zere: yes, but I didn't find any deployed versions, and osm api, I think, didn't return json
19:00:32 <Zverik_h> when I tried requesting it
19:00:47 <pnorman> well, it's a work in progress, not finished
19:00:55 <zere> the default is still XML, what was your Accept: headers?
19:01:59 <zere> i mean, i have tests for API methods which return JSON, but not all of them (e.g: https://github.com/zerebubuth/openstreetmap-website/blob/json/test/controllers/node_controller_test.rb#L519 )
19:04:12 <Zverik_h> well, I added section on JSON knowing you were working on it, so it won't be a major issue finishing it for 0.7
19:05:00 <pnorman> for 0.7 I see JSON for sure, and questionable for area
19:05:12 <pnorman> mind you, you could do json under 0.6
19:05:48 <zere> that was the plan. but if we're going to do 0.7, then it's a nice carrot to get people to upgrade ;-)
19:06:18 <Zverik_h> the reason it's for 0.7 is error return codes
19:06:38 <Zverik_h> it's proposed to provide human-readable error messages, not just http codes
19:06:47 <Zverik_h> human- and software-readable
19:07:04 <pnorman> There's not really a proposed 0.7 list. There's the wiki page, which includes unicorns.
19:07:33 <Zverik_h> yes, I remember :) that's why I made my own, much shorter and without ponies
19:07:56 <zere> Zverik_h is talking about his cut-down list http://wiki.openstreetmap.org/wiki/User:Zverik/API_0.7_Proposal
19:07:59 <Zverik_h> I still read that page through and borrowed anything that won't require ton of programming
19:08:11 <zere> which, notably, doesn't include <parent/> ;-)
19:08:27 <Zverik_h> hmm, I forgot it :)
19:08:58 <Zverik_h> mind you, I've started my own page on area datatype :)
19:09:38 <zere> the most coding on that list is json output. the most coding still left TODO on that list is deleted objects. but perhaps when you add <parent/> it might be that ;-)
19:09:46 <Zverik_h> I think this could be done, but it is not as important as other small things
19:10:13 <zere> i think area is hugely important, we're just a long way from getting any sort of agreement on it.
19:10:14 <Zverik_h> zere: I think <parent/> is just a couple additional words to existing queries
19:10:36 <Zverik_h> zere: I'll try to make some advancements with area soon
19:11:03 <Zverik_h> or at least revive the discussion and filter some existing proposals
19:11:18 <Zverik_h> (and try converting a region myself)
19:11:23 <zere> might be better to focus on this, and leave area until later...
19:11:28 <Zverik_h> yes, obviously
19:11:41 <zere> also, i think <parent/> is more complex than you think. but i'd love to be proved wrong! :-)
19:11:42 <pnorman> Having spent a lot of time on osm2pgsql area handling, I'd love an area datatype
19:12:10 <Zverik_h> zere: <parent/> should not include parents of parents, so it's just some queries to way_nodes and relation_members?
19:12:29 <zere> i think everyone would... the main point of disagreement seems to be validity checking.
19:12:34 <pnorman> Zverik_h: it's an extra join on output to join against current_way_nodes and current_relation_members.
19:13:01 <Zverik_h> pnorman: everybody would :) But introducing it would mean writing code for updating osm db, extracts, updating josm and id, and probably rails port
19:13:16 <pnorman> The problem is way_nodes is this massive table, probably with bloated indexes
19:13:23 <zere> Zverik_h: sure, but it's more than additional words to existing queries - it's making more queries.
19:13:42 <Zverik_h> pnorman: can this be solved somehow? Maybe split way_nodes, or invent some other way to store way-node relations?
19:13:56 <Zverik_h> since we are pushing it to 0.7
19:14:00 <pnorman> hrm ya, you'd have to do it like tag queries, a separate query and iterate through results
19:14:01 <zere> i'm assuming it's bi-directional too?
19:14:05 <pnorman> Zverik_h: no.
19:14:17 <Zverik_h> zere: in which way? I think it isn't
19:14:58 <zere> like <node id=1><tags...><parent way=1/></node><way id=1><nd key=1/></way>
19:15:03 <pnorman> Zverik_h: I mean, you could partition on way id or some other arbitrary criteria, which isn't absolutely a bad idea because it makes db maint. easier when you have smaller tables
19:15:25 <Zverik_h> can we move db to simple features + tags on them + relations for api 0.8? :)
19:15:36 <pnorman> ugh no
19:15:49 <zere> i mean that <parent/> is inserted for all used nodes, ways and relations. (and of course, the ways and relations reference the things they use)
19:16:13 <Zverik_h> zere: no, parent objects are not included, only their types and ids in <parent/>
19:16:22 <pnorman> oh yikes, that's a query against way_nodes for each node in the output
19:16:37 <zere> not for map calls
19:16:38 <Zverik_h> else there would be a long chain of objects and very slow queries https://github.com/openstreetmap/openstreetmap-website/issues/768#issuecomment-46882212
19:16:43 <zere> well
19:17:03 <zere> we're *already* doing a query against way_nodes for all nodes within the original bbox
19:17:06 <pnorman> zere: then you have to store the results from the previous join to get the ways. not impossible, but more restructuring
19:17:16 <zere> so the additional query is probably not that large
19:17:55 <Zverik_h> for map calls, it would require adding another query against way_nodes for nodes outside bbox (which belong to ways)
19:18:01 <pnorman> I think I got estimates of 10-25% of the nodes are outside the bbox for larger map calls. of course, remember that the "average" map call is actually *tiny*
19:18:15 <pnorman> I mean what, 33% of them were map calls that  returned no data?
19:18:32 <Zverik_h> pnorman: *larger*? So it's much more for regular, tiny bboxes?
19:18:49 <Zverik_h> I think it's not entirely true :/
19:19:10 <Zverik_h> wait, I can test it right away with josm
19:19:25 <zere> Zverik_h: sorry, i didn't explain very well... i can see 2 ways of doing it - inserting <parent type=ID/> like a tag for every used element, *or* just inserting it where the parent element isn't in the same document.
19:19:28 <pnorman> My numbers are before iD had the usage it does, so they've probably changed a bit, but the distribution was not what I expected
19:19:37 <zere> the latter has the advantage that planet files would not change :-)
19:20:05 <pnorman> Zverik_h: on average, as bbox size shrinks relative to way length, percentage of nodes outside the bbox on a map call will increase
19:20:20 <Zverik_h> zere: the second case, only when parents are not in the dataset
19:20:21 <pnorman> zere: if we had <parents/> we could skip marking ways as pending when they're not used in a relation
19:22:28 <zere> yeah... i hadn't even considered changing the diffs... that's a very interesting question.
19:22:44 <pnorman> Oh, I was thinking of the planet not diffs
19:23:08 <Zverik_h> I've got 10.6% for 45k nodes outside St. Petersburg
19:23:20 <pnorman> Zverik_h: most calls aren't 45k nodes though
19:23:53 <zere> okay, this is very interesting, and i think we should come back to it next meeting. otherwise we might be here all day :-)
19:24:01 <Zverik_h> yes. So this area (1.2MB download) would require querying way_nodes for 4.2K nodes more
19:24:02 <zere> or all night, depending where you are...
19:24:14 * pnorman is just watching an xapi db load...
19:24:19 <Zverik_h> yes, I think we should end this discussion :)
19:24:29 <pnorman> Zverik_h: the thing is, the load from map calls isn't 50k node calls, it's lots of little 100 node calls
19:24:33 <Zverik_h> so, there should be som
19:24:35 <Zverik_h> oops
19:25:00 <pnorman> At this point I think we should just think it over and come back to it
19:25:05 <Zverik_h> yes
19:25:21 <zere> okay, we'll pick it up next time. thanks to everyone for coming, and hope to see you at the next meeting :-)
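
The map call steps pnorman lists at 18:35:23 can be sketched in Python over plain dictionaries. This is an illustration only, not cgimap's code: the `map_call` name and the in-memory data layout are made up for the example, and two details are assumptions, namely that backfilled nodes count when looking up relation parents, and that relation parents of relations are followed one level only.

```python
def map_call(bbox, nodes, ways, relations):
    """Return (node ids, way ids, relation ids) for a bbox.

    nodes: {id: (lon, lat)}
    ways: {id: [node ids]}
    relations: {id: [(member type, member id), ...]}
    """
    min_lon, min_lat, max_lon, max_lat = bbox

    # 1. all nodes inside the bbox
    in_bbox = {
        n for n, (lon, lat) in nodes.items()
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
    }
    # 2. all way parents of those nodes
    way_ids = {w for w, nds in ways.items() if in_bbox.intersection(nds)}
    # 3. backfill: all node children of those ways, even outside the bbox
    node_ids = in_bbox | {n for w in way_ids for n in ways[w]}
    # 4. all relation parents of the selected nodes or ways
    rel_ids = {
        r for r, members in relations.items()
        if any(
            (t == "node" and ref in node_ids) or (t == "way" and ref in way_ids)
            for t, ref in members
        )
    }
    # 5. relation parents of those relations (one level, by assumption)
    rel_ids |= {
        r for r, members in relations.items()
        if any(t == "relation" and ref in rel_ids for t, ref in members)
    }
    return node_ids, way_ids, rel_ids
```

Nodes added in step 3 can lie outside the bbox, and nothing looks up their other parents, which is the case zere points out at 18:35:17: such a node might be used by ways or relations that are not in the result.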