Of all the questions posed by Plato, the profundity of one stands head and shoulders above the rest:
To answer Plato's question we're going to need some geographic information about UK postcodes:
National Statistics Postcode Lookup
This data set is probably the right one for the job. It's from a reliable source, it contains longitude and latitude for 2.6 million postcodes and, best of all, it's free.
The data is downloadable from geoportal.statistics.gov.uk, first item under the 'Postcodes' menu. The dataset appears to be released quarterly every February, May, August and November.
At the time of writing, the latest download link points to:
www.arcgis.com/sharing/rest/content/items/7606baba633d4bbca3f2510ab78acf61/data
Interestingly, the domain is www.arcgis.com, the website for a well known commercial Geographic Information System - ArcGIS, from Esri.
Other data sets are available
Code-Point Open
Code-Point Open from Ordnance Survey is free, but location information is coded as Eastings and Northings, which is not ideal for this project.
PostZon
Part of the PAF datasets from Royal Mail, mentioned in the PAF Programmers Guide. It has longitude and latitude, but not much information beyond that. It is non-free and was apparently leaked by Wikileaks in 2009:
Was the leak of Royal Mail's PostZon database a good or bad thing?
UK Postcodes to Longitudes Latitudes Table
Provided by postcodeaddressfile.co.uk - a Royal Mail reseller. Appears to be a combination of PAF and OS data, has longitude and latitude data but costs £199 for an Organisation Licence.
Geospatial Index
Redis provides geospatial indexing and a bunch of related commands, awesome - as long as you can provide it with longitude and latitude data:
Ideal for answering the question "How many postcodes are within a given radius of a given postcode" is the GEORADIUSBYMEMBER command.
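To make the idea of a radius search concrete, here is a rough pure-Python sketch of the question GEORADIUSBYMEMBER answers, done the slow way by scanning every point with the haversine formula. Redis uses its geospatial index rather than a scan, so this is only a conceptual stand-in. The function names are hypothetical, the Earth radius constant is assumed to match the one Redis uses, and the two coordinates are the ones GEOPOS reports in the Queries section.

```python
import math

# Earth radius in metres; assumed here to match the constant Redis uses.
EARTH_RADIUS_M = 6372797.560856


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two (longitude, latitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(points, member, radius_m):
    """Brute-force stand-in for GEORADIUSBYMEMBER: check every stored point."""
    lon0, lat0 = points[member]
    hits = []
    for name, (lon, lat) in points.items():
        dist = haversine_m(lon0, lat0, lon, lat)
        if dist <= radius_m:
            hits.append((name, round(dist, 4)))
    # Nearest first, for readability
    return sorted(hits, key=lambda hit: hit[1])


# (longitude, latitude) pairs as reported by GEOPOS for two YO postcodes
POINTS = {
    "YO24 1AB": (-1.0930296778678894, 53.95831391882791195),
    "YO1 7HH": (-1.0816839337348938, 53.96135558421912037),
}

print(within_radius(POINTS, "YO24 1AB", 1000))
```

A scan like this touches every postcode on every query, which is exactly the work the Redis index is there to avoid.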
Data Load
This bash script downloads the February 2021 release of National Statistics Postcode Lookup ZIP file, unzips the file we need, parses the data and formats into Redis commands which are piped to Redis.
The script uses the csvtool command line utility which will need to be installed if you don't already have it.
load-nspl.sh
#!/bin/bash

# Data URL from: https://geoportal.statistics.gov.uk/datasets/national-statistics-postcode-lookup-february-2021
DATA_URL='https://www.arcgis.com/sharing/rest/content/items/7606baba633d4bbca3f2510ab78acf61/data'
ZIP_FILE='/tmp/nspl.zip'
CSV_FILE='/tmp/nspl.csv'
CSV_REGEX='NSPL.*UK\.csv'
REDIS_KEY='nspl'

# NSPL - National Statistics Postcode Lookup
POSTCODE_FIELD=3   # PCDS - Unit postcode variable length version
LAT_FIELD=34       # LAT - Decimal degrees latitude
LONG_FIELD=35      # LONG - Decimal degrees longitude

START_TIME="$(date -u +%s)"

# Download data file if it doesn't exist
if [ -f "$ZIP_FILE" ]
then
    echo "'$ZIP_FILE' exists, skipping download"
else
    echo "Downloading '$ZIP_FILE'"
    wget "$DATA_URL" -O "$ZIP_FILE"
fi

# Unzip data if it doesn't exist
if [ -f "$CSV_FILE" ]
then
    echo "'$CSV_FILE' exists, skipping unzipping"
else
    echo "Unzipping data to '$CSV_FILE'"
    unzip -p "$ZIP_FILE" $(unzip -Z1 "$ZIP_FILE" | grep -E "$CSV_REGEX") > "$CSV_FILE"
fi

# Process data file, create Redis commands, pipe to redis-cli
echo "Processing data file '$CSV_FILE'"
csvtool format "GEOADD $REDIS_KEY %($LONG_FIELD) %($LAT_FIELD) \"%($POSTCODE_FIELD)\"\n" "$CSV_FILE" \
    | redis-cli --pipe

# Done
END_TIME="$(date -u +%s)"
ELAPSED_TIME="$(($END_TIME-$START_TIME))"
MEMBERS=$(echo "zcard $REDIS_KEY" | redis-cli | cut -f 1)
echo "$MEMBERS postcodes loaded"
echo "Elapsed: $ELAPSED_TIME seconds"
Expect output from the script similar to this:
Downloading '/tmp/nspl.zip'
...
196050K ...... 100% 47.2M=54s
...
Unzipping data to '/tmp/nspl.csv'
Processing data file '/tmp/nspl.csv'
...
ERR invalid longitude,latitude pair 0.000000,99.999999
...
All data transferred. Waiting for the last reply...
Last reply received from server.
errors: 23258, replies: 2656252
2632994 postcodes loaded
Elapsed: 18 seconds
Don't worry about the errors:
ERR invalid longitude,latitude pair 0.000000,99.999999
There are about 23,000 entries in the data file with invalid longitude and latitude values which Redis will reject. The NSPL User Guide (available in the downloaded ZIP file - NSPL User Guide Feb 2021.pdf) has this to say about them:
"Decimal degrees latitude - The postcode coordinates in degrees latitude to six decimal places; 99.999999 for postcodes in the Channel Islands and the Isle of Man, and for postcodes with no grid reference."
and
"Decimal degrees longitude - The postcode coordinates in degrees longitude to six decimal places; 0.000000 for postcodes in the Channel Islands and the Isle of Man, and for postcodes with no grid reference."
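If the error noise is unwanted, the placeholder rows could be dropped before they ever reach Redis. The following is a minimal sketch of that idea, not part of the load script: the function name is hypothetical, the column names `pcds`, `lat` and `long` are assumed from the field comments in the script, and the sample rows are made up for illustration.

```python
import csv
import io

# Placeholder pair the NSPL User Guide describes for postcodes with no position
NO_LAT, NO_LONG = "99.999999", "0.000000"


def geoadd_commands(csv_text, key="nspl"):
    """Yield one GEOADD command per row that has a usable position."""
    # Column names are an assumption; check them against the real file header.
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["lat"] == NO_LAT and row["long"] == NO_LONG:
            continue  # Channel Islands, Isle of Man, or no grid reference
        yield f'GEOADD {key} {row["long"]} {row["lat"]} "{row["pcds"]}"'


# Made-up sample rows, not taken from the real data file
SAMPLE = """pcds,lat,long
AA1 1AA,53.958314,-1.093030
ZZ9 9ZZ,99.999999,0.000000
"""

for command in geoadd_commands(SAMPLE):
    print(command)
```

Because `csv.DictReader` consumes the header row, a filter like this would also avoid sending the header line to Redis as a command.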
Queries
Once we've got a full dataset loaded we can run some queries with redis-cli:
127.0.0.1:6379> geopos nspl "YO24 1AB"
1) 1) "-1.0930296778678894"
   2) "53.95831391882791195"
127.0.0.1:6379> geopos nspl "YO1 7HH"
1) 1) "-1.0816839337348938"
   2) "53.96135558421912037"
127.0.0.1:6379> geodist nspl "YO24 1AB" "YO1 7HH" km
"0.8159"
127.0.0.1:6379> georadiusbymember nspl "YO24 1AB" 100 m WITHDIST
1) 1) "YO24 1AY"
   2) "29.0576"
2) 1) "YO1 6HT"
   2) "2.0045"
3) 1) "YO2 2AY"
   2) "2.0045"
4) 1) "YO24 1AB"
   2) "0.0000"
5) 1) "YO24 1AA"
   2) "69.7119"
127.0.0.1:6379> georadiusbymember nspl "YO1 7HH" 50 m WITHDIST
1) 1) "YO1 2HT"
   2) "32.6545"
2) 1) "YO1 7HT"
   2) "32.6545"
3) 1) "YO1 7HH"
   2) "0.0000"
4) 1) "YO1 2HZ"
   2) "40.3405"
5) 1) "YO1 2HL"
   2) "37.6516"
6) 1) "YO1 7HL"
   2) "38.9421"
REST API
Here's a super basic Flask-based REST service to query the geographic index. Postcode, distance and units are provided as path segments in the request URL. Postcodes within the requested radius are returned as JSON, along with their distance from the provided postcode.
nspl-rest.py
from flask import Flask, jsonify
from redis import Redis

REDIS_HOST = 'localhost'
REDIS_PORT = 6379
REDIS_DB = 0
REDIS_KEY = 'nspl'

app = Flask(__name__)
# decode_responses=True returns str instead of bytes, which jsonify cannot serialise
r = Redis(host=REDIS_HOST, port=REDIS_PORT, db=REDIS_DB, decode_responses=True)


@app.route('/radius/<postcode>/<distance>/<unit>', methods=['GET'])
def radius(postcode, distance, unit):
    try:
        results = r.georadiusbymember(REDIS_KEY, postcode, distance, unit, withdist=True)
    except Exception:
        # Unknown postcode, bad distance or unit, or Redis unavailable
        results = []
    return jsonify([{'postcode': result[0], 'distance': result[1]} for result in results])


if __name__ == '__main__':
    app.run()
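The service above hands `distance` and `unit` straight to Redis and turns any failure into an empty list, so a typo in the unit looks the same as "no postcodes nearby". One possible refinement, sketched here with a hypothetical helper name, is to validate both values first so the route could reply with a 400 instead. The unit list is the set accepted by the Redis GEORADIUS family of commands.

```python
import math

# Units accepted by the Redis GEORADIUS family of commands
VALID_UNITS = {"m", "km", "ft", "mi"}


def parse_radius_args(distance, unit):
    """Return (distance, unit) as (float, str), or raise ValueError."""
    if unit not in VALID_UNITS:
        raise ValueError(f"unit must be one of {sorted(VALID_UNITS)}")
    value = float(distance)  # raises ValueError for non-numeric input
    if not (math.isfinite(value) and value > 0):
        raise ValueError("distance must be a positive, finite number")
    return value, unit
```

Inside the route, a `ValueError` from this helper could be caught and mapped to an error response, leaving the broad `except` to cover Redis problems only.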
API Example Usage
$ curl localhost:5000/radius/YO24%201AB/100/m | json_pp
[
   {
      "distance" : 29.0576,
      "postcode" : "YO24 1AY"
   },
   {
      "distance" : 2.0045,
      "postcode" : "YO1 6HT"
   },
   {
      "distance" : 2.0045,
      "postcode" : "YO2 2AY"
   },
   {
      "distance" : 0,
      "postcode" : "YO24 1AB"
   },
   {
      "distance" : 69.7119,
      "postcode" : "YO24 1AA"
   }
]
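Postcodes contain a space, which is why the curl example writes `YO24%201AB`. When calling the service from code, the escaping can be left to the standard library. This is a small sketch with a hypothetical helper name, assuming the service is running on Flask's default `localhost:5000`.

```python
from urllib.parse import quote


def radius_url(postcode, distance, unit, base="http://localhost:5000"):
    """Build the /radius/<postcode>/<distance>/<unit> URL for the service."""
    # safe="" also escapes "/" so a stray slash cannot add a path segment
    return f"{base}/radius/{quote(postcode, safe='')}/{distance}/{quote(unit, safe='')}"


print(radius_url("YO24 1AB", 100, "m"))
```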
Source Code
- Code available on GitHub - nspl-radis-search