Working Group Minutes/EWG 2013-04-29
Attendees
| IRC nick | Real name |
|---|---|
| apmon | Kai Krueger |
| Firefishy | Grant Slater |
| gravitystorm | Andy Allan |
| pnorman | Paul Norman |
| TomH | Tom Hughes |
| zere | Matt Amos |
Summary
- Carto style
  - pnorman has been trying some benchmarking on the OSM dev server (errol), but there's too much variation in the results to get a meaningful baseline.
  - apmon cited an earlier benchmark showing a 30% slowdown with the carto style, but it dates from 3 February, so it may not reflect improvements made to the style in March.
  - Attempts were made to find a suitable benchmarking machine, but OSMF does not have a spare one in its inventory. pnorman is looking into alternatives.
- READMEs
  - ACTION: gravitystorm to have a go at making the rails_port README better.
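As a rough illustration of the baseline problem noted under Carto style, the sketch below computes the coefficient of variation (sample standard deviation divided by the mean) for a set of repeated benchmark runs and checks whether a slowdown of a given size would stand out from that noise. The run times are made-up placeholders, not measurements from errol or any other OSMF machine; the function names and the factor-of-two threshold are assumptions chosen purely for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(durations_s):
    """Sample standard deviation of the run times divided by their mean."""
    return stdev(durations_s) / mean(durations_s)


def is_detectable(baseline_runs, expected_slowdown, noise_factor=2.0):
    """Crude check: is the expected slowdown (e.g. 0.30 for 30%) larger
    than noise_factor times the run-to-run variation of the baseline?

    noise_factor is an arbitrary illustrative threshold, not a
    substitute for a proper significance test.
    """
    return expected_slowdown > noise_factor * coefficient_of_variation(baseline_runs)


# Hypothetical wall-clock times in seconds for the same render job.
noisy_runs = [1800.0, 3600.0, 7200.0, 900.0]     # heavily loaded shared machine
stable_runs = [3550.0, 3600.0, 3650.0, 3600.0]   # dedicated machine

for label, runs in (("noisy", noisy_runs), ("stable", stable_runs)):
    cv = coefficient_of_variation(runs)
    print(f"{label}: CV = {cv:.2f}, 30% slowdown detectable: {is_detectable(runs, 0.30)}")
```

With variation on the order of the run time itself, as reported for the 1 hour test in the log, a 30% difference between styles cannot be separated from noise, which is why a quieter machine is being sought.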
IRC Log
17:01:07 <zere> welcome. minutes of the last meeting: http://www.osmfoundation.org/wiki/Working_Group_Minutes/EWG_2013-04-22
17:01:29 <zere> let me know if there's anything which needs correcting.
17:01:43 <zere> on the agenda today: carto style & documentation.
17:01:51 <zere> #topic carto style
17:02:42 <zere> pnorman__ sent me an email earlier about some experiments he's been doing with renderd on errol.
17:03:01 <zere> apparently the standard deviation on a 1 hour test is 1 hour.
17:03:45 <zere> so it does appear that errol's variation in disk use makes benchmarking on there particularly difficult.
17:03:50 <apmon> the load on errol seems very variable
17:03:53 * gravitystorm digs through my email accounts
17:04:53 <zere> pnorman__ says he's going to try on EC2, which should give more stable results - even if the disk latency on EC2 tends to be rather terrible
17:06:26 <apmon> Are there no other test systems available?
17:06:51 <gravitystorm> zere: ah, found the email, on the tileserving list. Can I backtrack and ask what the context is - is it just a comparison of xml vs carto styles, or something more precise?
17:07:19 <apmon> afaik just a simple comparison for now
17:07:34 <apmon> but on a typical work load derived from yevauds logs
17:08:00 <gravitystorm> If the former, then I'd suggest just doing benchmarks on a laptop or regular PC - it has a more typical CPU/disk/memory balance than EC2, albeit much less than on a tileserver
17:08:44 * pnorman checks in
17:08:52 <apmon> If you run it on a typical workload, you do need a full planet loaded though
17:08:54 <gravitystorm> EC2 is somewhat skewed towards memory/cpu
17:09:56 <zere> indeed. but i think that would tend to exaggerate any difference rather than hide it?
17:10:18 <pnorman> I was attempting to do a compairson, but haven't run carto yet as I doubt I'll get anything statistically significant on errol
17:10:19 <zere> not that i really know - kind of the point of benchmarking is to figure this stuff out.
17:10:22 <gravitystorm> apmon: true, but I'm sure one could either filter the metatiles to only a particular region, or else take a less rigourous approach (e.g. ignore <z6 since there are so few of them, and only really worry about the differences based on the proportion of tiles served)
17:10:46 <pnorman> zere: not if the difference is in queries (disk seeks)
17:11:13 <pnorman> gravitystorm: the list of metas is from a log excerpt from yevaud, so the balance of low-zoom is the same as it is for yevaud
17:11:23 <apmon> zere: If I am not mistaken, the carto style also needs quite a bit more CPU
17:11:45 <zere> i'm expecting carto to have more queries, more disk seeks and therefore comparatively suck more on EC2... is that not right?
17:11:56 <apmon> I got the 30% difference figure, when rendering just Karlsruhe on a germany extract, which probably fit into memory cache
17:12:10 <gravitystorm> pnorman: I was aiming to ignore lowzoom mainly because they a) take forever b) aren't very many of them and c) screw up testing if you're only using a regional extract :-)
17:12:24 <pnorman> i mean, really, the only way to be sure what resources it needs is to stick on a machine equivalent to yevaud and load it up equivalent to yevaud
17:13:05 <gravitystorm> zere: I'm expecting the same number of queries to be honest, the main difference would be around the filter-first performance versus match-all approach in the xml symoblizers
17:13:19 <gravitystorm> pnorman: sure.
17:13:57 <pnorman> so, does anyone have alternative suggestions to EC2? my benchmarking requires a full planet osm2pgsql db, renderd+mapnik
17:14:00 <apmon> As it still seems that yevaud is more CPU bound the disk bound, even with its single outdated SSD, that might be rather relevant
17:14:16 <zere> i think we're only looking to figure out some rough number. basically whether putting it on yevaud, in its current state, is going to kill it.
17:15:14 <apmon> On the otherhand, if it is CPU bound, there probably are some spare CPUs in some of the not used servers one could use.
17:15:43 <TomH> is that even the plan though, or is the plan to put it on orm (with SSDs)
17:15:48 <TomH> I've kind of lost track....
17:16:04 <zere> well, Firefishy has his plan, but i think his plan is crazy ;-)
17:16:12 <pnorman> is there a spare machine which could be used to benchmark which has a better cpu/memory/disk balance for testing?
17:16:36 * Firefishy catches up.
17:16:58 <gravitystorm> and if we upgrade mapnik at the same time, the performance increase (or decrease) might swamp any stylesheet-related ones :-)
17:17:53 <apmon> What is Firefishy's crazy plan?... ;_0
17:18:40 <pnorman> gravitystorm: true - doing a DB reload, OS upgrade, postgresql upgrade, postgis upgrade, mapnik upgrade and stylesheet change all at the same time could drastically change the requirements
17:18:58 <apmon> gravitystorm: I haven't redone the benchmarks recently, but a 30% increase in rendering time (if it is still the case) would likely be significant
17:19:05 <apmon> even when moving over to orm
17:20:48 <gravitystorm> apmon: Do you have a version number or git commit from that 30% figure?
17:22:20 <zere> pnorman: i just had a quick look, and i don't see anything spare with enough disk to handle a rendering database.
17:22:23 <apmon> let me check what date I posted that comment on github
17:22:34 <apmon> as it was a fresh checkout from that day
17:22:37 <pnorman> zere: even with slim tables dropped?
17:23:14 <gravitystorm> if it was prior to v2.2.0 (March 13) then things might have changed - that was moving away from attachments for the road layers
17:23:14 <zere> pnorman: good point - how big is it with slim tables dropped?
17:23:18 <pnorman> zere: although I guess you need the space for slim tables before you drop them...
17:23:20 <apmon> https://github.com/gravitystorm/openstreetmap-carto/issues/20#issuecomment-13058442 so it was three months ago
17:23:50 <apmon> 2013-02-03, so yes it was
17:25:18 <apmon> I should probably try another equivalent small scale benchmark on my laptop
17:25:48 <apmon> but I am having trouble getting carto installed, as node.js dependencies are preventing me from getting carto running
17:25:50 <Firefishy> "my plan" is to add SSD to orm, make it primary renderer... then reinstall yevaud to make it additional renderer. Both "full master". Idris was used by TomH to test/create chef scripts for setting up openstreetmap-carto
17:26:15 <gravitystorm> apmon: awesome. Another benchmark would be really useful - if you do so, then per-zoom breakdowns are really useful (it's always z12-16 that are the ones to focus on)
17:26:43 <apmon> render_list now does spit out per zoom level info, so that should be easy
17:26:54 <pnorman> zere: 71GB for a full planet from a couple weeks ago
17:26:54 <gravitystorm> apmon: also, side-by-side comparisons from the output of the mapnik debug logs gives an idea of which layers/queries are worth focussing on.
17:27:05 <zere> yeah, i was looking at idris, but it only has 64GB disk.
17:27:07 <apmon> although for the simple benchmarks, one can simply render each zoom separately.
17:27:14 <zere> Firefishy: what state is azure in?
17:27:50 <apmon> It simply reflects the state of that everything is likely in cache if you render a bbox systematically
17:28:04 <Firefishy> zere: If could lift it... it would be out the window. It is x86 and sh*t.
17:28:12 <zere> and, in the third conversation thread, why i think Firefishy's plan is crazy is the "full master" bit with no shared cache.
17:28:41 <pnorman> if azure is racked and working it should easily handle it, osm2pgsql is less demanding than pgsnapshot
17:28:55 * apmon agrees and would like to see a more distributed approach
17:29:02 <gravitystorm> apmon: I wouldn't worry too much about caching, since we're looking more at a CPU-bound setup on the tileservers anyway
17:29:06 <zere> i know it's shit, but it has 24GB RAM and decent sized disks. if the 32-bitness isn't going to massively impact the benchmarks, it might be appropriate.
17:29:32 <zere> and then i'll help you "lose" it in a nearby rubbish bin ;-)
17:29:33 <pnorman> oh, 32 bit? ugh... does that even work anymore with nodes over 2^31?
17:29:56 <apmon> It might make it difficult to import the full planet, as osm2pgsql doesn't like 32 bit very much
17:30:06 <Firefishy> zere: not going to happen. azure is missing disks (I've cannibalised it already). I can get disks for idris rather.
17:30:07 <zere> slim mode should? x86 does support uint64_t, just rather slowly
17:30:08 <apmon> pnorman: It should still work, as the node ids will still be in 64 bit
17:30:29 <apmon> but you will pretty much have no node cache during initial import
17:30:35 <zere> Firefishy: cool. maybe that's the way forward then.
17:31:53 <pnorman> looking at http://www.ec2instances.info/ (ec2 list with sane formatting) might a high-memory double extra large with provisioned EBS storage work?
17:32:12 <Firefishy> I'll try get the disks sorted by later this week.
17:34:03 <pnorman> might it be more effective to devote sysadmin time to getting a two-server render setup going? we know we'll need to move to it eventually, what we're not certain on is if switching to carto will kill yevaud if we switch before then
17:34:10 <apmon> pnorman: As yoy will probably need it for more than 48 hours, you are nearly better off, just getting a server at hetzner or ovh for a month
17:34:32 <zere> pnorman: it might do. i'd still be worried that amazon's idea of "high" i/o performance corresponds to http://upload.wikimedia.org/wikipedia/commons/b/b4/IBM_350_RAMAC.jpg
17:35:40 <pnorman> apmon: hetzner has a setup fee, total is 100 euros for 1 month
17:35:53 <apmon> ovh, I think doesn't
17:36:03 <pnorman> zere: I hear with provisioned EBS you can actually get decent iops performance
17:36:54 <zere> that would be very interesting
17:37:15 <pnorman> apmon: how little disk space do you think you can get away with for a system? ovh has a SSD setup with 2x120GB
17:38:13 <pnorman> zere: a big part of me wants to learn how to use EC2, so if the costs are equivalent I wouldn't mind using EC2
17:38:44 <apmon> My guess would be it is slightly above that. Although with --flatnodes and --drop, you might get it close
17:39:51 <pnorman> I know you're below 120GB *after* import, but during import is my concern
17:39:52 <zere> slightly flippant suggestion, but you could try an older planet?
17:40:25 <zere> i know it won't be the same, but it'll be smaller and we're only interested in comparative results
17:41:26 <pnorman> should work. we'll only get comparative anyways. slightly different data density distributions might impact it, but that shouldn't be too bad. I was looking at http://www.ovh.ie/dedicated_servers/sp_32g_ssd.xml
17:42:10 <zere> e.g: the 05-Jan-2012 planet is 30% smaller than the most recent. but your results would be cc-by-sa :-P
17:44:49 <zere> ok, i reckon there's a few ways forward here, and i look forward to seeing what the results are. i'd also like to talk about documentation / READMEs / etc...
17:44:53 <zere> #topc READMEs
17:44:59 <zere> #topic READMEs
17:45:31 <zere> open season on ideas about what needs to be improved with the current rails_port README
17:45:46 <pnorman> carto install instructions suck. also, I couldn't get carto running on errol
17:45:46 <pnorman> oh, different readmes
17:46:04 <TomH> pnorman: oh I have that all working in my tile server cookbook
17:46:07 <zere> worth making a note of anyway. we'll want to get around to that eventually
17:46:42 <gravitystorm> zere: so I read over the rails port readme earlier. Seems like the second half could be split into a CONTRIBUTING.md or similar
17:47:30 <zere> all the bits about coding style, testing, etc... yup. if that's a reasonably standard place for them
17:47:30 <gravitystorm> Also the rails port readme needs more focus on installation. Unfortunately the installation instructions (on the wiki) are way, way to verbose
17:47:33 <apmon> pnorman: If you have a out-to-date compiled carto style, could you send it to me?
17:47:41 <apmon> up-to-date
17:48:15 <zere> gravitystorm: yeah, we touched on that last time. my opinion is that it needs to have something, but no more than a few lines and a link to "here's (way) more information (than you wanted)"
17:48:40 <apmon> Then I can quickly run the benchmark comparison on the germany extract as an initial indication of if things have improved
17:48:44 <pnorman> apmon: mine is a couple weeks old and the shapefile paths are hard-baked in, so you'd have to edit them unless your username is pnorman
17:48:56 <gravitystorm> for example, it lists bundler as dependencies, then someone else has added a (platform specific) 'how to install bundler' further down the page. There's even stuff there about what error messages appear when rubygems.org is having a hiccup
17:49:52 <zere> so... anyone fancy having a crack at improving the README?
17:50:10 <apmon> pnorman: I had to change db names and everything anyway. find and replace in a text editor is your friend...
17:50:20 <pnorman> apmon: http://pnorman.dev.openstreetmap.org/osm-carto.xml
17:50:34 <apmon> thanks
17:50:38 <gravitystorm> zere: well, another approach would be to copy-paste the wiki into INSTALL, then start viciously removing stuff until it's a bit more sane. But that's only worth doing if there's a scorched-earth approach to the wiki pages
17:51:04 * pnorman always welcomes scorched-earth and the wiki
17:53:03 <zere> gravitystorm: i was thinking more than the README install docs could be "you've got ruby installed, right? if not, see http://www.ruby-lang.org/en/downloads/. now `gem install bundler; bundler install; rake db:migrate; rake test`
17:53:29 <zere> ... if you have any problems, see TROUBLESHOOTING.md"
17:53:40 <zere> which then points to the wiki for the gory details
17:54:05 <zere> if people are like me, then they tend to google the error message anyway, in which case they'll land on the wiki.
17:54:14 <gravitystorm> zere: well, that's where I would like to head with the docs. Unfortunately the rails port is full of stuff (like db functions) that need a bit more explanation.
17:54:16 <apmon> you still need a couple of more commands to set up the postgresql db and applications.yml
17:54:36 <apmon> but otherwise, yes, there really isn't much to setting up rails_port
17:54:51 <pnorman> well, loading the damn database takes a month
17:55:06 <gravitystorm> zere: well, I'd like to move away from using the wiki for technical documentation, since we've got the well-observed emergent behaviour of "shit documentation" when we rely on it.
17:55:45 <pnorman> I don't use the wiki for my projects documentation, and I insist that any pull requests adding features add docs at the same time
17:56:22 <gravitystorm> Do we have a target platform for the installation notes? Is it reasonable to assume Ubuntu is the normal base, and other platforms are exceptions?
17:56:50 <zere> gravitystorm: i think we only rely on bizzaro db functions for the changes stuff, which should be removed anyway. with PG 9.2 extensions, the rest (btree_gist) is pretty easy
17:57:46 <pnorman> ubuntu is the normal base. i found out that freebsd ports actually has far more recent versions of many GIS components than ubuntu's repos
18:00:21 <zere> cool. anyone (gravitystorm?) feel like volunteering to have a go at it?
18:00:33 <gravitystorm> sure, I'm happy to have a go
18:00:56 <zere> #action gravitystorm have a go at making the rails_port README better
18:00:58 <zere> thanks!
18:01:25 <zere> we're at the top of our hour, but does anyone else have anything they'd like to discuss?
18:01:35 <gravitystorm> Is there any consensus about moving all of http://wiki.openstreetmap.org/wiki/Rails_port into the git repo? I'm in favour, but I won't do it if it's a bad idea
18:01:42 <pnorman> +1 to that
18:02:26 <pnorman> could we get carto on errol?
18:02:54 <zere> gravitystorm: only if it's heavily edited?
18:03:31 <gravitystorm> zere: certainly
18:04:23 <apmon> Might be something we could put on the agenda for next week, but having another go at figuring out how to translate help.osm.org would be good
18:05:26 <zere> yup. i'll stick that up for next week.
18:07:22 <pnorman> I'm going to try importing with osm2pgsql and a M3 2xlarge EC2, I'll see how that goes
18:07:32 <zere> awesome :-)
18:07:59 <pnorman> my benchmarks are likely to be carto with latest software vs osm.xml with latest software. I wonder what impact having the latest software is
18:08:43 <zere> yeah, one would hope that it's as fast or faster than the older software, but it isn't always the case.
18:09:06 <zere> i guess that's a wrap. thanks to everyone for coming, and i hope to see you next week! :-)