Exegesis Spatial Data Management

 

Some notes about setting up Tilecache for the Ordnance Survey Open Data

Background

Increasingly we are using Ordnance Survey Open Data as a background map option in web sites, and also in desktop systems where no other mapping is available. We have so far served this as a MapServer WMS. This puts a significant load on the web server, and will not scale beyond a few concurrent users. Making this scale requires converting the datasets into a cache of image tiles that can be accessed without running a GIS engine on the server. The most widely used tool for this is Tilecache from Metacarta http://tilecache.org

OS mapping is usually used in the OSGB coordinate system. However, my first two applications requiring faster mapping are a) the Angling Diary and b) Ramblers Routes, both of which work in the spherical Mercator projection (EPSG:900913). This makes it even more imperative to cache the data, as using the WMS directly involves on-the-fly reprojection. Re-projection can also make the maps look bad, mitigated only by finding the best resampling settings (see below about MapServer resampling).

Information

The Tilecache README gives general instructions (but is misleading about Windows):

http://tilecache.org/docs/README.html

This blog post explains in more detail how to set up Tilecache on IIS (but relates to IIS5 and Server 2000):

http://viswaug.wordpress.com/2008/02/03/setting-up-tilecache-on-iis/

Python

Tilecache requires Python. Check whether the server is 32/64-bit and get the appropriate download from here: http://www.python.org/download/

For our web7 server I am using the x64 version: python-2.7.2.amd64.msi

I accepted all defaults, and installed to here: C:\Python27\

This path needs to be added to the PATH environment variable. I have a feeling this does not take effect until a restart, but I'm not 100% sure.

Python Imaging Library

Then we also need the Python Imaging Library, from here http://www.pythonware.com/products/pil/

Again, select the correct version for the version of Python used above, though there seems to be no 64-bit version. I used file PIL-1.1.7.win32-py2.7.exe

Running this installer failed because it said Python was not installed - missing registry settings. A reboot did not fix this.

A bit of Binging leads to this page http://www.lfd.uci.edu/~gohlke/pythonlibs/ where we can find a 64-bit installer:

PIL-1.1.7.win-amd64-py2.7.exe

This installed fine.

Tilecache

Download Tilecache from here http://tilecache.org/

This gives us tilecache-2.11.tar.gz

I used 7-zip to extract from the gz to a folder that contains the .tar file. Then I extracted the files from .tar to end up with a folder: \tilecache-2.11

I don't think the "PaxHeader" folder parallel to this folder is needed, but I hung onto it anyway, and put both folders under a single Tilecache folder.

This folder has to be located somewhere CGI scripts can run. I put it below a MapServer "scripts" folder that was already operational within a web site. Depending on where you put it, it may be necessary to create a virtual directory for Tilecache.

IIS stuff (the miserable bit)

Now we need to set up IIS to run Python scripts.

Web site > Handler mappings > Add Script Map

Request Path = *.py

Executable = "C:\Python27\python.exe" %s %s

Name = Python27 (or whatever you like)

I did not change anything in Request Restrictions.

 

I'm not yet sure whether this is strictly needed, but at the root level in IIS I also added the same path under "ISAPI and CGI Restrictions" > Add >

ISAPI or CGI path = "C:\Python27\python.exe" %s %s

Description = Python27

Tick to allow the path to execute.

 

Vish's article describes another step...

Open up the command prompt and change directory to ‘C:\Inetpub\AdminScripts’. Execute the following:

adsutil set w3svc/AllowPathInfoForScriptMappings True

adsutil set w3svc/1/AllowPathInfoForScriptMappings True

However, on our server there was no AdminScripts folder in C:\Inetpub. Therefore it was necessary to install the IIS 6 Scripting Tools, like this:

Server manager > Roles > Web Server > Add Role Services > Management Tools > IIS 6 Scripting Tools (which in turn requires adding others which it selects for you automatically).

Once this had been done, I was able to run the two command lines above. While in the role services area, check that CGI is enabled in IIS as well, because none of this will work without CGI; but if MapServer is working, then CGI must already be enabled.

Permissions

I initially gave the Internet Guest Account (IIS_IUSRS) modify permissions on the "Cache" folder. However I took these permissions off again, and it still worked. Vish said this was required, but it cannot be really, as it is not the web site user creating the tiles, it is the python process.

Tilecache itself

Rename tilecache.cgi to tilecache.py

Edit tilecache.py and remove the first line in it that reads '#!/usr/bin/env python'. Also, change the 'Service.load' parameter to point at the correct path to tilecache.cfg (and be sure to use double back-slashes in the path).

Tilecache includes a web page with an OpenLayers map, that serves to check whether things are configured correctly, and also allows you to manually start caching tiles. This is index.html, which by default loads and caches the OpenLayers base map WMS. I copy this file to e.g. indexOS.html then edit as required.

This page requests maps from tilecache.py which in turn uses tilecache.cfg configuration. So to work with a different data source it is necessary to add the relevant configuration to both files.

Tilecache.cfg

Configure the type and location of the cache. Here we are using a local file cache on disk:

[cache]
type=Disk
base=D:\mypath\tilecache\tilecache-2.11\Cache

Configure the layer you want to cache, in this case our OS Open Data WMS. These settings were arrived at after much blood and sweat.

[OSOpenSphMerc]
type=WMS
layers=OSOpenData
url=http://mywebsite/scripts/mapserv.exe?map=D:\Websites\UKBaseMap\map\UKBaseMap.map
extension=png
extent_type=loose
srs=EPSG:900913
# this definitely required when calling in 900913
spherical_mercator=true
bbox=-20037508.34,-20037508.34,20037508.34,20037508.34
resolutions=78271.51695,39135.758475,19567.8792375,9783.93961875,4891.969809375,2445.9849046875,1222.99245234375,611.496226171875,305.7481130859375,152.87405654296876,76.43702827148438,38.21851413574219,19.109257067871095,9.554628533935547,4.777314266967774,2.388657133483887,1.1943285667419434,0.5971642833709717,0.29858214168548586

The "spherical_mercator=true" setting is supposed to remove the need for a resolutions setting, but in practice I got errors if it was not there. The settings above are basically the entire world in spherical mercator.
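The long resolutions list does not need to be typed by hand, since each value is simply half the one before. A minimal sketch, assuming the first value (78271.51695) and the number of levels (19) from the configuration above; the function name is just for illustration:

```javascript
// Build a halving series of resolutions (map units per pixel).
// firstResolution and levelCount are taken from the tilecache.cfg shown above.
function buildResolutions(firstResolution, levelCount) {
    var resolutions = [];
    for (var level = 0; level < levelCount; level++) {
        // Each zoom level has half the resolution value of the previous one.
        resolutions.push(firstResolution / Math.pow(2, level));
    }
    return resolutions;
}

var resolutions = buildResolutions(78271.51695, 19);

// Comma-separated for tilecache.cfg; the same array can be used in the OpenLayers map.
console.log("resolutions=" + resolutions.join(","));
```

The same array should be used in both tilecache.cfg and the OpenLayers map, as the two lists have to agree.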

My matching OpenLayers code was:

        function init(){
            map = new OpenLayers.Map( $('map'), {
                projection: new OpenLayers.Projection("EPSG:900913"),
                units: "m",
                // whole world in spherical mercator, matching bbox in tilecache.cfg
                maxExtent: new OpenLayers.Bounds(-20037508.34, -20037508.34, 20037508.34, 20037508.34),
                // must match the resolutions in tilecache.cfg
                resolutions: [78271.51695, 39135.758475, 19567.8792375, 9783.93961875, 4891.969809375, 2445.9849046875, 1222.99245234375, 611.496226171875, 305.7481130859375, 152.87405654296876, 76.43702827148438, 38.21851413574219, 19.109257067871095, 9.554628533935547, 4.777314266967774, 2.388657133483887, 1.1943285667419434, 0.5971642833709717, 0.29858214168548586],
                // no trailing comma after the last property (older IE rejects it)
                controls: [new OpenLayers.Control.Navigation(),
                           new OpenLayers.Control.PanZoomBar()]
            });
            // WMS-style access through tilecache.py; layer name as in tilecache.cfg
            OSOpenSphMerc = new OpenLayers.Layer.WMS( "OSOpenSphMerc",
                    "tilecache.py?", {layers: 'OSOpenSphMerc', format: 'image/png' },
                    {isBaseLayer: true}
            );
            // (excerpt - the rest of init() is not shown)

However, when browsing the map I kept getting errors on some tiles, along these lines:

"An error occurred: Current y value 7983694.728100 is too far from tile corner y 7944558.969625"

This problem got worse the further I zoomed in, and the further north in the UK. However, it was intermittent in the sense that one band of tiles might draw (north-south or east-west bands) then the next might not - a checkerboard effect. The exact same WMS requests going directly to the OSOpenData WMS worked fine - i.e. the BBOXes in the requests were good.

Seeding the cache

Therefore I tried seeding the cache directly using a command-line like:

D:\Websites\UKBaseMap\scripts\tilecache\tilecache-2.11\tilecache_seed.py -f --bbox=-1060000,6405978,242016,8700250 OSOpenSphMerc 1 11

(this bounding box being a rather unscientific box that includes the OS mapping and is divisible by 256 in both directions - whether the latter is important or not I do not know).

This seeding worked without any tile failures. Therefore, after much searching the web and head-scratching, I have concluded that the errors when used from OpenLayers may be down to Tilecache bugs.

Note about seeding: Level 0 raised an error about a zero length image and would not complete seeding, so I had to skip this and start at 1. Level 0 equates to looking at the earth from a long way away, so no big deal.

Another note about seeding: each run tended to produce a few errors. There were a lot of "Cache miss" entries coming back in the command window, plus some more serious errors occasionally (HTTP 502 Bad Gateway - which causes the seeding operation to bail out). Therefore I tended to run each level twice (or more if it had bailed out). The first time in, I used the "-f" flag to force re-creation of all tiles (in case I had any left over from testing and setting up resolutions). The second time I omitted this flag, so it would only re-create any missing tiles. I don't really know whether this was necessary, except in the few cases where caching a layer totally aborted.

Managing the cache

I'm now part way through seeding the cache - on level 16. The size of the cache grows exponentially with each level, so disk space may become an issue. With a small part of level 16 done (this being StreetView) we have over 3 million files and 20GB of space used.
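To get a feel for the growth before committing disk space, the number of tiles at a level can be estimated from the seeding bounding box and the resolution. This is only a rough sketch: it assumes 256-pixel tiles and ignores how the tile grid is aligned to the layer extent, so real counts will differ a little. The function name is made up for the example.

```javascript
// Rough count of tiles needed to cover a bounding box at one resolution.
// bbox is [minx, miny, maxx, maxy] in map units; resolution is map units per pixel.
function estimateTileCount(bbox, resolution, tileSize) {
    tileSize = tileSize || 256; // assumed tile size in pixels
    var tileSpan = resolution * tileSize; // ground distance along one tile edge
    var columns = Math.ceil((bbox[2] - bbox[0]) / tileSpan);
    var rows = Math.ceil((bbox[3] - bbox[1]) / tileSpan);
    return columns * rows;
}

// The seeding box used earlier, at two successive spherical mercator levels.
var seedBox = [-1060000, 6405978, 242016, 8700250];
var level13 = estimateTileCount(seedBox, 9.554628533935547);
var level14 = estimateTileCount(seedBox, 4.777314266967774);

// Halving the resolution roughly quadruples the number of tiles.
console.log(level13, level14, (level14 / level13).toFixed(2));
```

Each extra level therefore costs roughly four times as many files as the one before it.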

There is a cache cleaning utility in Tilecache, which removes the tiles accessed least recently in order to reduce the cache to a specified size. However, this would only be beneficial where all map requests are going through tilecache.py, so that tiles can be recreated where needed from the data source. Maximum benefit is gained by pre-caching the entire dataset and accessing it as a tile service, in which case no cleaning is possible.

Caching in GB National Grid

Once I was happy that the 900913 cache was building OK, I turned attention to a GB National Grid cache, which should in theory be much simpler.

The bounding box is 0,0,700000,1300000, though out of respect for the many people on the web who have said it should be a multiple of 256, I altered this slightly (in the OpenLayers map and tilecache.cfg) to 0,0,699904,1299968. Later I found this was missing edge tiles at large scales, and I then found the BBOX when seeding had no effect on the tiling behaviour - it simply governed the area of seeding. I therefore changed this to something like 0,0,800000,1400000 at large scales, and back to 0,0,700000,1300000 for lower ones.

But what about the resolutions? One technique I have found is to set the OpenLayers map to

maxResolution: 'auto', numZoomLevels: <some sensible number like 14>

Then try to start caching, and you quickly get errors back associated with the dreaded pink boxes in the map (you can see the errors in Fiddler). These say that the required resolution was not found, and give an array of available resolutions (I don't know the basis for the values given). These can then be used in the cfg and map settings.

This worked pretty well for the OS data, except that after 1:14000 it gave 1:7029, which shows StreetView rather too zoomed out and looking rubbish. Also, scales like 1:1757 are not user friendly, and although on-screen scale is not entirely meaningful, users still prefer a round scale like 1:2500 (in cases where scale is displayed). So perhaps we need to define our own array of scales, and reverse engineer an array of resolutions from them. How?

OpenLayers has a control that can show the map scale (map.addControl(new OpenLayers.Control.Scale());), but does not have one to show the resolution (AFAIK). So we have to do this ourselves, as follows. Add a handler for the moveend event onto the (uncached) WMS layer:

        OSOpenDirect.events.on({
            moveend: function(e) {
                // only refresh the display when the zoom level has changed
                if (e.zoomChanged) {
                    showResolution();
                }
            }
        });

which calls this function:

        function showResolution() {
            document.getElementById("res").innerHTML = map.getResolution();
        }

which requires a div like this on the page:

<div id="res">the resolution will be shown here</div>

Then simply set your map scales array to whatever you fancy, and the corresponding resolution will be shown.

On doing this, however, we quickly find that the VectorMap and StreetView datasets only look decent at a very confined range of scales. In the end I abandoned the quest for user friendly scales, because VectorMap only looks decent when unscaled, i.e. giving a map scale of 1:7087.

I ended up with this resolutions array, which prioritizes nice-looking maps over friendly scales:

resolutions=3000,2000,1000,500,250,150,100,50,25,12.5,5,2.5,1

Of course the maps you get also depend on how the layers are set up in the WMS, i.e. what scale thresholds are set for each layer. I had to tweak ours a bit.
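On the question raised above of reverse engineering resolutions from scales: the conversion can also be done with simple arithmetic. This sketch assumes a screen of 72 dots per inch and 39.3701 inches per metre, which as far as I know are the OpenLayers 2 defaults; treat both constants as assumptions. With them, the resolutions 2.5 and 1 come out at the 1:7087 and 1:2835 scales quoted in these notes.

```javascript
// Assumed constants (believed to be the OpenLayers 2 defaults).
var DOTS_PER_INCH = 72;
var INCHES_PER_METRE = 39.3701;

// Map scale denominator (e.g. 2500 for 1:2500) to resolution in metres per pixel.
function resolutionFromScale(scaleDenominator) {
    return scaleDenominator / (INCHES_PER_METRE * DOTS_PER_INCH);
}

// Resolution in metres per pixel to map scale denominator.
function scaleFromResolution(resolution) {
    return resolution * INCHES_PER_METRE * DOTS_PER_INCH;
}

// A "friendly" scale of 1:2500 needs a resolution a little under 0.9 m per pixel.
console.log(resolutionFromScale(2500));

// The unscaled VectorMap resolution of 2.5 m per pixel gives about 1:7087.
console.log(Math.round(scaleFromResolution(2.5)));
```

This only applies to layers measured in metres, such as 27700 and 900913.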

Aside - re-sampling in MapServer

The smaller scale OS maps do not mind being scaled if there is good resampling being done at the MapServer end - in fact this is crucial for making the maps look decent, and the performance hit doesn't matter once the data is cached. I've achieved this with this directive on each layer:

PROCESSING "RESAMPLE=BILINEAR"

along with this output format:

OUTPUTFORMAT
NAME png
DRIVER "AGG/PNG"
MIMETYPE "image/png"
IMAGEMODE RGBA
EXTENSION "png"
TRANSPARENT ON
# these settings greatly reduce the size of the PNG image
FORMATOPTION "QUANTIZE_FORCE=on"
FORMATOPTION "QUANTIZE_COLORS=256"
END

An example of how much worse the maps look without this can be seen here:

http://andrewl.net/map/ordnance-survey-rasters-mapserver-tilecache

(which is otherwise another helpful resource for anyone using tilecache and OS OpenData on a Unix platform).

How long does it take, and how much disk space is required

Well of course this depends on the precise resolutions chosen. I have no exact figures on time, because several runs bombed out and needed re-starting. Essentially, levels 0 to 5 take only seconds to build. Levels 6 to 8 take minutes (in 27700 level 9 took around 20 minutes, with level 10 taking a few hours). Level 13 in 900913 took less than 5 hours, while level 14 took something like 20 hours. Levels 16 (900913) and 12 (27700) are looking like they will take days. The 27700 cache rate was higher than the 900913, presumably because MapServer was not having to reproject the maps.

Some directory sizes (along with scale and resolution) for my 900913 and 27700 caches:

 

 
Spherical Mercator (EPSG:900913) - Early layers omitted

Level  Scale (approx)  Resolution           Size on disk  Files
05     1:7M            2445.9849046875      1.34 MB       300
06     1:3M            1222.99245234375     1.33 MB       175
07     1:2M            611.496226171875     4.19 MB       400
08     1:867K          305.7481130859375    14.3 MB       975
09     1:433K          152.87405654296876   47.7 MB       2,700
10     1:217K          76.43702827148438    151 MB        9,600
11     1:108K          38.21851413574219    439 MB        33,725
12     1:54K           19.109257067871095   1.54 GB       130,700
13     1:27K           9.554628533935547    2.88 GB       510,300
14     1:14K           4.777314266967774    10.3 GB       2,017,100
15     1:6771          2.388657133483887    53.9 GB       8,027,650
16     1:3386          1.1943285667419434   211 GB        32,051,575
TOTAL                                       280 GB        42,785,884

GB National Grid (EPSG:27700) - sizes are plain numbers, presumably bytes

Level  Scale (approx)  Resolution  Size                               Files
00     1:9M            3000        61,136                             2
01     1:6M            2000        129,385                            6
02     1:3M            1000        264,569                            15
03     1:1M            500         1,220,288                          150
04     1:709K          250         6,144,271                          600
05     1:425K          150         16,132,572                         1,125
06     1:283K          100         43,106,957                         2,100
07     1:142K          50          67,469,260                         7,475
08     1:71K           25          115,948,919                        28,125
09     1:35K           12.5        494,478,612                        106,800
10     1:14K           5           1,958,013,298                      649,000
11     1:7087          2.5         6,023,812,744                      2,244,000
12     1:2835          1           23,706,682,416 (58.9 GB on disk)   13,919,200 (21,924 folders)
TOTAL                              32,433,464,427 (75.1 GB on disk)   16,958,598 (31,400 folders)

 

Some time later...

I've stopped caching level 16 in 900913 as the lowest priority, and I'm attacking level 15 as well as level 12 in 27700 with multiple processes (with the min/max Y of the BBOX set to have each process caching a slice of the country). With 7 processes running, our server is working like this:

[Screenshot: server taking everything in its stride]

Which is just fine, leaving plenty of oomph free for running web sites etc. The pressure point is probably physical RAM.
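Working out the Y ranges for each process by hand is tedious, so here is a small sketch that splits a seeding box into horizontal bands and prints one command line per band. The command format follows the tilecache_seed.py example earlier in these notes; the script path, the layer name and the two level numbers are placeholders to be replaced with your own values, and the function names are invented for this example.

```javascript
// Split a bounding box [minx, miny, maxx, maxy] into horizontal bands.
function sliceBBox(bbox, sliceCount) {
    var minx = bbox[0], miny = bbox[1], maxx = bbox[2], maxy = bbox[3];
    var bandHeight = (maxy - miny) / sliceCount;
    var slices = [];
    for (var i = 0; i < sliceCount; i++) {
        var bottom = miny + i * bandHeight;
        // Use maxy exactly for the last band to avoid rounding gaps at the top.
        var top = (i === sliceCount - 1) ? maxy : miny + (i + 1) * bandHeight;
        slices.push([minx, bottom, maxx, top]);
    }
    return slices;
}

// Build one seeding command per band. fromLevel and toLevel are passed through
// unchanged, in the same positions as in the seeding command shown earlier.
function seedCommands(script, layer, fromLevel, toLevel, bbox, sliceCount) {
    return sliceBBox(bbox, sliceCount).map(function (slice) {
        return script + " --bbox=" + slice.join(",") + " " + layer + " " + fromLevel + " " + toLevel;
    });
}

// Example: four bands across the GB National Grid extent (placeholder values).
var commands = seedCommands("tilecache_seed.py", "OSOpen27700", 12, 13,
                            [0, 0, 700000, 1300000], 4);
console.log(commands.join("\n"));
```

Each printed line can then be run in its own command window.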

The server spec is:

CPU 2 x E5520 Xeon (quad core) + hyper-threading, meaning in effect 16 processors

Memory: 8 GB

OS: Windows Server 2008 x64 SP2

Disks: 1.2TB as 2 x SAS drives in RAID 5 I think.

Accessing the cache in OpenLayers

There are two ways of accessing the cache. The first is as a WMS with tilecache.py as the address. In this case, python will check whether the image tiles exist in the cache, and build any missing ones from the original data source. As time goes on and more tiles are cached, it gets faster. However, in 900913 I was getting frequent tile failures, and I'm not sure this method gives any real benefit over direct calls to MapServer.

The second is as a "tilecache", where the client (OpenLayers) requests the tiles individually as images, with no checks as to whether or not they exist. This is faster, as it avoids the IIS/python overhead, but obviously requires a pre-built and complete cache.

I therefore included in my OpenLayers map a layer that pointed at the resulting cache as a tile service, to check the results:

            OSOpenSphMercCached = new OpenLayers.Layer.TileCache( "OSOpenSphMercCached",
                    ["http://mywebsite/scripts/tilecache/tilecache-2.11/Cache"], "OSOpenSphMerc", {'format': 'image/png'},
                    {isBaseLayer: true}
            );

On panning around the map, this causes tile requests like this:

http://mywebsite/scripts/tilecache/tilecache-2.11/Cache/OSOpenSphMerc/15/000/032/032/000/043/439.png

This will only work if the map is configured with resolutions exactly matching those in the table above for either 27700 or 900913. But it is not necessary to use every resolution; for example we may have a map where the user cannot zoom out beyond level 4 (in 27700).
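The folder structure in the tile request above appears to be the zoom level as two digits, followed by the tile column and then the tile row, each zero-padded to nine digits and split into three folders of three digits. A sketch of that layout, inferred from the request shown rather than taken from the Tilecache source, with hypothetical function names:

```javascript
// Left-pad a number with zeros to the given width.
function zeroPad(value, width) {
    var text = String(value);
    while (text.length < width) {
        text = "0" + text;
    }
    return text;
}

// Build the relative path of a tile in the disk cache (layout inferred, see above).
function tileCachePath(layerName, z, x, y, extension) {
    return [
        layerName,
        zeroPad(z, 2),
        zeroPad(Math.floor(x / 1000000), 3),      // millions of the column number
        zeroPad(Math.floor(x / 1000) % 1000, 3),  // thousands of the column number
        zeroPad(x % 1000, 3),                     // last three digits of the column
        zeroPad(Math.floor(y / 1000000), 3),      // same three-way split for the row
        zeroPad(Math.floor(y / 1000) % 1000, 3),
        zeroPad(y % 1000, 3)
    ].join("/") + "." + extension;
}

// Level 15, column 32032, row 43439 reproduces the request shown above.
console.log(tileCachePath("OSOpenSphMerc", 15, 32032, 43439, "png"));
```

This is handy for checking by hand whether a particular tile exists in the cache folder.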

Demonstration pages:

http://www.esdmwms.no-ip.co.uk/scripts/tilecache/tilecache-2.11/indexOS27700.html

http://www.esdmwms.no-ip.co.uk/scripts/tilecache/tilecache-2.11/indexOS.html

(both have a direct WMS as default base layer, with the cached layer as another option in the layer switcher).

Next we should put a handler page in front of these services to restrict access to specified domains.

Caching overlays

Having cracked the OS Open Data, I thought it worth trying an overlay, so I chose the Norfolk HER archaeological monuments layers from HBSMR. These are quite slow to draw as WMS from our web4 server, with about 60,000 point, line and polygon features stored in MapInfo tables in WGS84.

I used the same resolutions etc as the OS 27700 data. Caching levels 0 to 11 took a few minutes; level 12 is taking perhaps an hour or two. The cache is about 1GB.

The one small gotcha is how to use a tilecache layer as an overlay in OpenLayers - on my first attempt it refused to budge from the base layers collection.

Working syntax was thus:

            NHECached = new OpenLayers.Layer.TileCache( "NHECached",
                    "http://www.esdmwms.no-ip.co.uk/scripts/tilecache/tilecache-2.11/Cache", "NHE",
                    {'format': 'image/png', reproject: false, isBaseLayer: false}
            );

The results are fantastic. Of course a cache is broken if the data changes, but given the speed of creating this cache, it would be perfectly realistic to have this as a scheduled job to rebuild a cache automatically every night, or whatever is appropriate depending on how the source data is managed.

Preventing pink tiles

Where a dataset does not cover the entire map extent, OpenLayers will still request tiles from a tilecache layer, giving pink tiles where no image is returned. To prevent this, add this JavaScript function:

        OpenLayers.Util.onImageLoadError = function() {
            this.src = "../tilecache-2.11/blank256.png";
            this.style.display = "";
        };

And make sure there is an appropriate 256x256 image in place.

Multiple URLs for the Cache

The bottlenecks with a tilecache prepared and accessed as shown above are a) disk access and b) http throttling. The second of these is likely to be significant before the first: only a small number of concurrent requests to the same domain are allowed (AFAIK this limit is normally applied by the web browser per host name, historically two at a time, rather than by IIS). With any one map operation potentially requesting perhaps 20 tiles, this will be a pinch point.

Fortunately OpenLayers allows a tilecache (or in fact any grid based layer) to be accessed from multiple URLs. It then requests some tiles from each defined address.

            OSOpen27700Cached = new OpenLayers.Layer.TileCache( "OSOpen27700Cached",
                    [
                    "http://www.anglingdiary.org.uk/Data/Sites/3/userfiles/gisdata",
                    "http://www.esdmwms.no-ip.co.uk/scripts/tilecache/tilecache-2.11/Cache"
                    ],
                    "OSOpen27700", {'format': 'image/png'},
                    {isBaseLayer: true}
            );
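To illustrate the idea of sharing tiles between addresses, here is a sketch of one way a client could choose an address for each tile so that the same tile is always requested from the same place (which keeps browser caching effective). This is not the OpenLayers implementation, just a simple hash of the tile path; the function name and the example tile path are made up.

```javascript
// Pick one of several base URLs for a tile, based only on the tile's path,
// so that repeated requests for the same tile go to the same address.
function pickUrl(tilePath, urls) {
    var hash = 0;
    for (var i = 0; i < tilePath.length; i++) {
        // Simple rolling hash kept small enough to stay an exact integer.
        hash = (hash * 31 + tilePath.charCodeAt(i)) % 1000003;
    }
    return urls[hash % urls.length];
}

var urls = [
    "http://www.anglingdiary.org.uk/Data/Sites/3/userfiles/gisdata",
    "http://www.esdmwms.no-ip.co.uk/scripts/tilecache/tilecache-2.11/Cache"
];
var tilePath = "OSOpen27700/11/000/000/123/000/000/456.png"; // hypothetical tile
console.log(pickUrl(tilePath, urls) + "/" + tilePath);
```

Over many tiles the requests should be spread fairly evenly across the listed addresses.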

I don't know whether it is possible to kid IIS into allowing multiple addresses that in fact point to the same web server (i.p.) and file cache. I have demonstrated that OpenLayers does its bit, but it may be that we would have to clone the cache onto two virtual servers with different i.p. addresses to obtain the true benefits. This would only become necessary once the cache is under heavy load.

Accessing a subset of the resolutions in OpenLayers

Sometimes we want a map that has a tilecached layer, but we do not want all of the resolutions. For example we may want the Spherical Mercator OS Open Data, but start viewing at a UK scale rather than global. Unfortunately you cannot just knock off a few resolutions in the OpenLayers layer definition and expect it to work. OpenLayers equates the first resolution with folder "0", the next with "1", etc. So knocking off a resolution has it looking for the wrong image tiles. There is a neat solution: in your web site set up symbolic links to the real cache folders using mklink, mapping a folder called "0" to the real folder "5" (or whatever is appropriate) - this means you now have a new level 0 at the appropriate scale for the OpenLayers layer. This is not a complete write-up of the solution - but we have implemented this in the Lincolnshire Heritage @ Risk web site (October 2011).

Example: in a folder for the new pseudo tilecache, run a command like this for each resolution:

mklink /D 0 D:\PathToMyCache\MyCacheName\5

The same technique allows composite caches to be constructed, for example containing one specific dataset that has been cached at one scale with others for other scales.
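Since one mklink command is needed per resolution, the commands can be generated rather than typed. A sketch, assuming the folder naming used in the command above (plain numbers); if your cache folders are zero-padded (for example "05"), pass 2 as the last argument. The cache path is the placeholder from the example, and the function names are invented here.

```javascript
// Left-pad a level number with zeros to the given width.
function padLevel(level, width) {
    var text = String(level);
    while (text.length < width) {
        text = "0" + text;
    }
    return text;
}

// One mklink command per real level, renumbering so that firstRealLevel becomes 0.
function mklinkCommands(cacheFolder, firstRealLevel, lastRealLevel, padWidth) {
    padWidth = padWidth || 1; // 1 means no zero padding
    var commands = [];
    for (var real = firstRealLevel; real <= lastRealLevel; real++) {
        var linkName = padLevel(real - firstRealLevel, padWidth);
        commands.push("mklink /D " + linkName + " " + cacheFolder + "\\" + padLevel(real, padWidth));
    }
    return commands;
}

// Map real levels 5 to 16 onto new levels 0 to 11.
console.log(mklinkCommands("D:\\PathToMyCache\\MyCacheName", 5, 16).join("\n"));
```

The output can be saved as a batch file and run from the folder that will hold the pseudo tilecache.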

 

Comments

satya

re: Some notes about setting up Tilecache for the Ordnance Survey Open Data

01 August 2019

In OpenLayers 3 what is the solution for 

An error occurred: Current x value 77.036133 is too far from tile corner x 77.040793