Methodologies for the analysis of spatial data

The quantitative analysis of spatial data poses challenges that cannot be met by drawing on classical statistical theory. This theory assumes that data values are statistically independent whereas much spatial data does not have this property.

A history of quantitative methods in geography would identify a period, up to the early 1970s, when geographers focused on tests to detect the presence of what is called spatial autocorrelation. This was followed in the late 1970s and 1980s by a period in which geographers sought to develop models that would describe spatial variation. This work became, almost seamlessly, part of another research agenda: to develop that workhorse of statistics, the normal linear regression model, so that it could be used to test hypotheses using spatial data or so that it made allowance for the spatial relationships inherent in the underlying mechanisms.
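
As an illustration of the kind of test for spatial autocorrelation referred to above, the following is a minimal sketch of Moran's I, one widely used statistic of this type. It is not taken from the publications listed on this page: the function name `morans_i`, the four-area chain and the binary contiguity weights are all illustrative assumptions.

```python
def morans_i(values, weights):
    """Moran's I for a list of area values and a square spatial weights matrix.

    weights[i][j] > 0 means areas i and j are treated as neighbours
    (hypothetical binary contiguity weights are used in the example below).
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]          # deviations from the mean
    s0 = sum(sum(row) for row in weights)   # sum of all weights
    # Cross-products of deviations, counted only for neighbouring pairs
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in z)             # total sum of squared deviations
    return (n / s0) * (num / den)


# Hypothetical example: four areas arranged in a line (1-2-3-4),
# each area neighbouring only the areas immediately beside it.
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

smooth_trend = morans_i([1, 2, 3, 4], chain_weights)   # similar neighbours
alternating = morans_i([1, 4, 1, 4], chain_weights)    # dissimilar neighbours
```

In this sketch the smoothly trending values give a positive statistic and the alternating values a negative one. Under independence the expected value is -1/(n-1), so in practice the observed statistic would be compared with a reference distribution (for example one obtained by randomly permuting the values across the areas) before drawing any conclusion.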

Since the late 1980s there has been a broadening of the field of spatial data analysis and a gathering together of many methodological threads within what has come to be called Geographic Information Science (GISc). A review of these "converging perspectives" appeared in a paper with Michael Goodchild in Papers in Regional Science. My 2003 book Spatial Data Analysis: Theory and Practice brings together GISc and spatial statistics as a way of understanding the modern field of spatial data analysis.

Recent work has focussed on area data that are in the form of presence/absence (0/1) or counts (0,1,2,3,...). In this work a "spatial" model is one that includes spatially structured random effects in order to capture spatially structured unexplained variation. The importance of this work is as follows. Geographers have long been able to apply regression theory to a continuous-valued, normally distributed response variable recorded across a set of areas. Findings from this work will enable them to undertake the same kind of modelling and hypothesis testing where the response variable is discrete-valued (e.g. where data follow a Bernoulli, Binomial, Poisson or other discrete-valued probability model). Such data frequently arise in the analysis of health data and crime data, to name just two examples. The diagram below shows the results of decomposing a map of the odds that small areas in Sheffield are high crime areas into constituent elements. Of course it is the covariates that we are usually most interested in. However, to make rigorous inference it is necessary to ensure that model assumptions (particularly those that might be violated as a consequence of the spatial nature of the data) are satisfied.

Figure. Map decomposition of the odds of an enumeration district being a high serious crime area (HIA) into the components associated with the covariate (X1 = index of ethnic heterogeneity) and unstructured (U) and spatially structured (S) random effects. Study area: West Sheffield, England.
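
The decomposition in the figure can be read through a logistic model with additive random effects. The following is a sketch under that assumption and not necessarily the exact specification used in the published work: p_i denotes the probability that enumeration district i is an HIA, and the intercept \alpha and coefficient \beta are notation introduced here for illustration.

```latex
% Assumed form: log-odds are additive in the covariate and the two random effects
\[
\log\!\left(\frac{p_i}{1-p_i}\right) = \alpha + \beta X_{1i} + U_i + S_i
\]
% Exponentiating gives the odds as a product of one component per map
\[
\frac{p_i}{1-p_i} = e^{\alpha}\, e^{\beta X_{1i}}\, e^{U_i}\, e^{S_i}
\]
```

On this reading, each mapped component is one multiplicative factor of the odds: the covariate term, the unstructured effect U and the spatially structured effect S.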

Three other areas of methodological research have been the subject of investigation in the last few years: (i) the application of geostatistical theory to data sets collected for irregular areas, in collaboration with Dr Ruth Kerry and Prof. Margaret Oliver (see the special issue of Geographical Analysis, 2010, Volume 42(1)); (ii) the development of methods for spatial sampling, in collaboration with Prof. Jinfeng Wang of the State Key Laboratory of Resources and Environmental Information Systems in Beijing; (iii) the application of Bayesian spatial modelling to problems in the geography of crime, in collaboration with the BIAS II group at Imperial College London (Prof. Nicky Best, Prof. Sylvia Richardson, Dr G. Li) (see my "crime and disorder" page for more information). It is (iii) that is currently the focus of my research effort in this area.


Selected publications in these areas of methodological development include:

  • 'Sandwich Estimation for Multi-Unit Reporting on a Stratified Heterogeneous Surface.' Environment and Planning A, 2013, 45(10), 2515-2534, (with J.Wang, T.Liu, L.Li and C.Jiang).
  • 'Geography, spatial data analysis and geostatistics.' Geographical Analysis, 2010, 42, 7-31, (with R.Kerry and M.Oliver).
  • 'Sample surveying to estimate the mean of a heterogeneous surface: reducing the error variance through zoning.' International Journal of Geographical Information Science, 2010, 24(4), 523-43, (with J.Wang and Z.Cao).
  • 'Applying geostatistical analysis to crime data: car related thefts in the Baltic States.' Geographical Analysis, 2010, 42, 53-77. (with R.Kerry, P.Goovaerts and V.Ceccato).
  • 'Modelling small area counts in the presence of overdispersion and spatial autocorrelation.' Computational Statistics and Data Analysis, 2009, 53, 2923-37, (with D.Griffith and J.Law).
  • 'Combining police perceptions with police records of serious crime areas: a modelling approach' Journal of the Royal Statistical Society, Series A (Statistics in Society) 2007, 170, 1-16 (with J.Law).
  • 'Beyond mule kicks: the Poisson distribution in geographical analysis' Geographical Analysis, 2006, 38, 123-139 (with D.Griffith).
  • 'A Bayesian approach to modelling binary data: the case of high intensity crime areas.' Geographical Analysis, 2004, 36 (3), 197-216 (with J. Law).
  • 'GIS and spatial data analysis: converging perspectives'. Papers in Regional Science 2004, 83, 363-385 (with M.Goodchild).
  • 'Spatial Data Analysis: Theory and Practice.' Cambridge University Press. April, 2003, pp 432.
  • 'Providing spatial statistical data analysis functionality for the GIS user: the SAGE project' International Journal of Geographical Information Science, 2001, 15, 239-54 (with S.Wise and J.Ma).