
Methodologies for the analysis of spatial data

The quantitative analysis of spatial data poses challenges that cannot always be met by drawing on classical statistical theory. That theory assumes that data values are statistically independent, whereas much spatial data does not have this property and, in this respect, has more in common with time series data.

A history of the evolution of quantitative methods in geography would identify a first period, up to the early 1970s, in which geographers focused on tests to detect the presence of what is called spatial autocorrelation. This was followed in the late 1970s and 1980s by a period in which geographers sought to develop models that would describe spatial variation. This work became, almost seamlessly, part of another research agenda: to develop that workhorse of statistics, the normal linear regression model, so that it could be used to test hypotheses using spatial data, or so that it made allowance for the spatial relationships inherent within the underlying mechanisms.
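As an illustration of what a test statistic for spatial autocorrelation measures, the sketch below computes Moran's I, one widely used statistic of this kind. It is a minimal example and not necessarily one of the specific tests referred to above. The function name `morans_i`, the four-area "line" of regions and the binary adjacency weights are all assumptions made for illustration.

```python
import numpy as np

def morans_i(values, weights):
    """Moran's I for area values and a spatial weights matrix.

    values  : one observation per area
    weights : n x n matrix, weights[i][j] > 0 if areas i and j are neighbours
              (zero diagonal); binary adjacency is assumed in the example below
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    z = x - x.mean()                 # deviations from the mean
    cross_product = z @ w @ z        # sum over i, j of w_ij * z_i * z_j
    return (n / w.sum()) * cross_product / (z @ z)

# Hypothetical study region: four areas in a row, each adjacent to the next.
line_w = [[0, 1, 0, 0],
          [1, 0, 1, 0],
          [0, 1, 0, 1],
          [0, 0, 1, 0]]

print(morans_i([1, 2, 3, 4], line_w))    # smooth trend: positive (about 0.33)
print(morans_i([1, -1, 1, -1], line_w))  # alternating values: negative (-1.0)
```

Values well above the expectation under independence, which is -1/(n-1), indicate that neighbouring areas tend to have similar values. A formal test would compare the statistic with its sampling distribution, for example by randomly permuting the values across areas.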

Since the late 1980s there has been a broadening of the field of spatial data analysis and a gathering together of many methodological threads within what has come to be called Geographic Information Science (GISc). A review of these "converging perspectives" appeared in a paper with Michael Goodchild in Papers in Regional Science. My 2003 book Spatial Data Analysis: Theory and Practice brings together GISc and spatial statistics as a way of understanding the modern field of spatial data analysis and also includes preliminary examples of the work currently being done in collaboration with Dr Jane Law (now University of Waterloo, Canada).

Currently we are experimenting with "spatial" generalized linear models (GLMs) within a Bayesian framework. We focus on area data that are in the form of presence/absence (0/1) or counts (0,1,2,3,...). In this work a "spatial" GLM means a model that includes spatially structured random effects in order to capture spatially structured unexplained variation. This is analogous to using a spatial error model in normal linear regression. The importance of this work lies in the fact that, although geographers have long been able to apply regression theory to a continuous-valued normal response variable recorded across a set of areas, findings from this work will enable them to undertake the same kind of modelling and hypothesis testing where the response variable is discrete-valued (e.g. where data follow a Bernoulli, Binomial, Poisson or other discrete-valued probability model). Such data frequently arise in the analysis of health data and crime data, to name but two examples. The diagram below shows the results of decomposing a map of the odds that small areas in Sheffield are high crime areas into constituent elements. Of course it is the covariates that we are usually most interested in. However, to make rigorous inference it is necessary to ensure that model assumptions (particularly those that might be violated as a consequence of the spatial nature of the data) are satisfied.


Figure. Map decomposition of the odds of an enumeration district being a high serious crime area (HIA) into the components associated with the covariate (X1 = index of ethnic heterogeneity) and unstructured (U) and spatially structured (S) random effects. Study area: West Sheffield, England.
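The decomposition described in the caption can be written out as a model for area i. This is a sketch under assumptions, not a statement of the exact model fitted: the logit link, the intercept \beta_0, the independent normal prior for U and the intrinsic conditional autoregressive prior for S (with \partial i the set of neighbours of area i and n_i their number) are common choices for models of this type but are not specified in the text above.

```latex
\begin{align*}
Y_i &\sim \text{Bernoulli}(p_i), \\
\log\frac{p_i}{1-p_i} &= \beta_0 + \beta_1 X_{1i} + U_i + S_i, \\
\frac{p_i}{1-p_i} &= e^{\beta_0} \cdot e^{\beta_1 X_{1i}} \cdot e^{U_i} \cdot e^{S_i}, \\
U_i &\sim N(0, \sigma_U^2), \\
S_i \mid \{S_j : j \in \partial i\} &\sim N\!\left(\frac{1}{n_i}\sum_{j \in \partial i} S_j,\; \frac{\sigma_S^2}{n_i}\right).
\end{align*}
```

On the odds scale the terms combine multiplicatively, which is what allows the map of the odds to be split into separate maps for the covariate, the unstructured effect and the spatially structured effect.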

Work in the pipeline includes ongoing collaborative work with Dan Griffith (University of Texas at Dallas), courtesy of a Leverhulme grant. This work compares different ways of handling the spatial dependence problem in Poisson models with over-dispersion, where the over-dispersion is believed to result, at least in part, from spatial dependence effects.
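To make the notion of over-dispersion concrete, the following sketch compares the variance-to-mean ratio of simulated Poisson counts with that of counts whose area means vary. It is only an illustration and not one of the methods being compared in the collaborative work: the function name `dispersion_index`, the simulated data and the gamma-distributed area effects are all assumptions, and the area effects here are unstructured rather than spatially dependent.

```python
import numpy as np

def dispersion_index(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson counts."""
    y = np.asarray(counts, dtype=float)
    return y.var(ddof=1) / y.mean()

rng = np.random.default_rng(0)
n_areas = 2000          # hypothetical number of areas
mean_count = 5.0        # hypothetical expected count per area

# Counts drawn from a single Poisson distribution: variance equals the mean.
poisson_counts = rng.poisson(mean_count, size=n_areas)

# Counts whose area means vary: multiplicative gamma effects with mean 1
# and variance 0.5 (shape * scale**2) inflate the variance above the mean.
area_effect = rng.gamma(shape=2.0, scale=0.5, size=n_areas)
mixed_counts = rng.poisson(mean_count * area_effect)

print(dispersion_index(poisson_counts))  # near 1
print(dispersion_index(mixed_counts))    # clearly greater than 1
```

A ratio well above 1 signals variation that the Poisson model does not account for. In area data that extra variation may be partly due to spatial dependence, which this simple index cannot separate from unstructured heterogeneity.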

Publications

Recent publications include: