Working Group Minutes/EWG 2013-12-16
Attendees
| IRC nick | Real name |
|---|---|
| apmon | Kai Krueger |
| gravitystorm | Andy Allan |
| pnorman | Paul Norman |
| shaunmcdonald | Shaun McDonald |
| zere | Matt Amos |
Summary
- osm2pgsql threading
  - pnorman gave an update on the threading branch: apmon has made changes that should fix the tag corruption issue, but they have not yet been run through any tests. The fix is not thought to affect the benchmark results.
  - pnorman suggested that anyone running a rendering server with updates reindex planet_osm_ways: one index on that table comes out of the import at 3.5GB and shrinks to 8kB after reindexing.
  - There was a discussion of imposm3 and interest in benchmarking it against osm2pgsql to understand the performance / feature trade-offs. pnorman plans to run an imposm3 import on the same hardware.
- xmas holidays
  - As many people will be busy over the holiday period, the next meeting will be on 6th January 2014.
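
For anyone acting on the reindexing suggestion above, the following is a minimal sketch rather than a procedure agreed at the meeting. It only builds the SQL text, so it can be pasted into psql or passed to whichever driver is in use. The function names are hypothetical, and the table name assumes the default `planet_osm` prefix. `REINDEX TABLE` rebuilds every index on the table, so there is no need to know which index is the bloated one.

```python
# Sketch only: builds SQL strings; executing them is left to psql or a driver.
# Assumption: the default osm2pgsql prefix, so the table is planet_osm_ways.
TABLE = "planet_osm_ways"


def _check_identifier(name: str) -> str:
    """Reject anything that is not a plain identifier before it goes into SQL."""
    if not name.isidentifier():
        raise ValueError(f"not a plain identifier: {name!r}")
    return name


def index_size_query(table: str = TABLE) -> str:
    """Return SQL listing each index on `table` with its on-disk size."""
    table = _check_identifier(table)
    return (
        "SELECT indexrelname, "
        "pg_size_pretty(pg_relation_size(indexrelid)) AS size "
        "FROM pg_stat_user_indexes "
        f"WHERE relname = '{table}' "
        "ORDER BY pg_relation_size(indexrelid) DESC;"
    )


def reindex_statement(table: str = TABLE) -> str:
    """Return SQL that rebuilds every index on `table`."""
    return f"REINDEX TABLE {_check_identifier(table)};"


if __name__ == "__main__":
    # Run the size query before and after the REINDEX to compare.
    print(index_size_query())
    print(reindex_statement())
    print(index_size_query())
```

REINDEX blocks writes to the table while it runs, so it is probably best done while diff updates are paused.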
IRC Log
17:31:39 <zere> hello everyone, last meeting minutes at http://www.osmfoundation.org/wiki/Working_Group_Minutes/EWG_2013-12-09 and please let me know if there's anything in them which needs to be changed.
17:32:22 <zere> actions are probably the same as previous meetings. once again, i have to apologise for not getting the blog post done - it's still on my TODO. gravitystorm?
17:32:35 <gravitystorm> nothing :-(
17:33:09 <zere> i will note that, if anyone wants to help, then i'm sure both gravitystorm and i would be very happy to receive it.
17:34:35 <zere> pnorman: worth talking about that osm2pgsql-thread-tagging issue? or is it all under control?
17:34:51 <pnorman> worth updating
17:35:04 <zere> #topic osm2pgsql threading
17:35:15 <pnorman> apmon has made changes that should fix it, but I haven't run it through any tests.
17:35:42 <pnorman> I'm running a test to see how long CLUSTER stays effective, so that takes some time to run
17:35:43 <zere> do you know if the changes are likely to alter the benchmark results?
17:36:56 <pnorman> apmon would know better than I. my guess is that much of the time is spent getting data to/from postgres and it won't change that
17:40:33 <pnorman> I'm not sure what numbers best indicate what is happening as performance degrades over time. I'm collecting pgstattuple info (http://www.postgresql.org/docs/9.1/static/pgstattuple.html#PGSTATTUPLE-COLUMNS) as well as correlation data and performance data, but it's really a case of information overload, with about 150 numbers collected
17:44:27 <pnorman> I'm 09:35 < zere> do you know if the changes are likely to alter the benchmark results?
17:47:26 <pnorman> apmon: ^
17:47:42 <apmon> which changes?
17:47:54 <pnorman> tag processing threading changes
17:48:20 <apmon> probably not. Altough if some of the times you were using de-duplication, it might
17:49:05 <apmon> My guess would be due to it being uninitialised, de-duplication was by default off, which is the correct thing
17:49:42 <pnorman> well, something has changed - or else it's still got the thread unsafe part :)
17:54:17 <pnorman> as an FYI to anyone running a rendering server with updates, I suggest reindexing planet_osm_ways, we found out that there's an index that comes out of the import at 3.5GB, and after reindexing goes to 8kb
17:56:07 <zere> intuitively, i'm thinking that cluster should mean that when postgres lifts a page off disk, more of it should be relevant. i don't see anything about the pages loaded to tuples loaded ratio on that pgstattuple page, though
17:56:22 <shaunmcdonald> pnorman: could the osm2pgsql import auto redo that index as part of the import? Thus saving anyone who is setting up a new DB to have to remember to do that step.
17:57:06 <pnorman> shaunmcdonald: it's been patched to not build that index until the end, but that doesn't change it for anyone who's already imported
17:57:20 <apmon> shaunmcdonald: On my todo list
17:57:27 <shaunmcdonald> :-)
17:57:33 <pnorman> zere: there's three things going on: degredation of cluster, table bloat, and index bloat
17:57:42 <apmon> I intend to move the index creation to after the going over pending ways stage
17:58:03 <apmon> at which point there will be 0 pending ways instead of tens of millions of ways
17:58:15 <apmon> i.e. an index size of 8kb instead of 4GB...
17:59:37 <pnorman> I'm also going to run imposm3 through an import to see how it performs on the same hardware
18:00:09 <apmon> should be interesting
18:00:19 <zere> how easy is it to get an apples-to-apples comparison, though?
18:01:12 <pnorman> you need a .style and an imposm3 mapping file that are equivalent.
18:01:14 <apmon> Even an apples to oranges comparison will give some indication of how well either works
18:02:03 <pnorman> yes - we have no direct compairson of any sort right now
18:02:08 <apmon> I.e. it gives some bounds on what is possible and if one is vastly more efficient than the other
18:06:48 <zere> right, so imposm3 is a strict feature superset of osm2pgsql, then?
18:07:29 <zere> if one can write an imposm3 mapping file equivalent to a .style file
18:08:17 <apmon> it doesn't have diff imports? Or did they implement that by now
18:08:27 <zere> because i know that imposm supports a bunch of generalisation features. just not sure if it has features which map 1:1 with what osm2pgsql has.
18:08:48 <pnorman> imposm3 has diff imports
18:10:40 <zere> well, it says "Imposm 3 is much faster than Imposm 2 and osm2pgsql" -- so it would be interesting to see under what conditions that's true.
18:12:37 <zere> "Other missing features: ... Updating generalized tables in diff-mode ... Diff import into custom PG schemas". so, looks like there's some (imho, pretty major) short-comings with imposm3's diff support.
18:13:31 <pnorman> don't think osm2pgsql supports PG schemas either
18:14:31 <zere> you think that means PG schemas? i thought it meant table schemas.
18:14:52 <zere> well, i should say s/PG schemas/namespaces/.
18:15:02 <pnorman> well it does say pg schemas
18:16:28 <pnorman> when the imposm3 docs talk about schemas, they seem to mean pg schemas consistently
18:17:01 <zere> apart from "Custom database schemas: Creates tables for different data types. This allows easier styling and better performance for rendering in WMS or tile services." where it clearly means table schemas.
18:17:51 <zere> i guess you'll find out if you try ;-)
18:21:54 <zere> was there anything else anyone wanted to discuss?
18:24:56 <gravitystorm> not from me
18:29:57 <zere> ok.
18:30:01 <zere> #topic xmas
18:30:26 <zere> the next meeting would be the 23rd... and i'm guessing a lot of us will be busy
18:30:45 <zere> the one after would be the 30th - probably another busy date.
18:31:12 <zere> would anyone like to have meetings on these days, or shall we just say the next one is the 6th January 2014?
18:31:29 <pnorman> maybe we can let meetbot run by itself, it seems to generate plenty :)
18:34:31 <gravitystorm> 6th works for me
18:35:10 <zere> well, meetbot will be here... so even if i'm not and you want to have a meeting then please use it. it shouldn't need any special setup.
18:38:12 <zere> thanks everyone! and happy holidays :-)