#### Spatial distributions

##### Introduction

See appendices for further tecnhical details about the various spatial distributions implemented in GeoBUGS 1.2.
##### car.normal and car.l1

The intrinsic Gaussian CAR prior distribution is specified using the distribution `car.normal` for the vector of random varables S = (S1, ....., SN). A robust version of this model is available, which assumes a double exponential (Laplace) rather than Gaussian distribution: this is called `car.l1`. The syntax for these distributions is as follows:

`S[1:N] ~ car.normal(adj[], weights[], num[], tau)S[1:N] ~ car.l1(adj[], weights[], num[], tau)`
where:

`adj[]` - A vector listing the ID numbers of the adjacent areas for each area (this is a sparse representation of the full adjacency matrix for the study region, and can be generated using the Adjacency Tool from the menu in GeoBUGS.
`weights[]` - A vector the same length as `adj[]` giving unnormalised weights associated with each pair of areas. For the CAR model described above, taking Cij = 1 (equivalently Wij = 1/ ni ) if areas i and j are neighbours and 0 otherwise, gives a vector of 1's for `weights[]`.
`num[]` - A vector of length N (the total number of areas) giving the number of neighbours ni for each area.
`tau` - A scalar argument representing the precision (inverse variance) parameter of the Gaussian CAR prior, or the inverse scale parameter of the Laplace prior (for the latter model, the variance = 2 / tau2).

The first 3 arguments must be entered as data (it is currently not possible to allow `weights[]` to be unknown); the final variable `tau` is usually treated as unknown and so is assigned a prior distribution. The data variables num and adj may be created by the adj matrix option of the Adjacency Tool as described in Adjacency matrices. The variable `weights` must be created by the user, and must be a vector the same length as `adj`. A common choice is to set all the `weights` equal to 1 since this gives the standard Besag, York and Mollie (1991) CAR model (see section on intrinsic CAR models in Appendix 1 for further discussion of weights). The easiest way to do this is to create a loop in your BUGS model code:

`for (j in 1:sumNumNneigh){ weights[j] <- 1}`

where `sumNumNneigh` is the length of `adj` and is also output by the option of the Adjacency Tool.

Important things to check when using the `car.normal` or `car.l1` distributions:
* The `car.normal` and `car.l1` distributions use unnormalised weights (see section on intrinsic CAR models in Appendix 1).
* An area cannot be specified as its own neighbour so make sure the ID number of the area itself does not appear in as one of its list of neighbours in the `adj` vector. BUGS does not check for this, so it is the user's responsibility.
* The weights must be symmetric ( Wij = Wji ). GeoBUGS does carry out a check for this and returns an error message if non-symmetric weights are detected.
* Take care with priors on tau, and be prepared to carry out sensitivity analysis to this choice.
* The car.normal and car.l1 distributions are parameterised to include a sum-to-zero constraint on the random effects. This means that you must also include a separate intercept term in your model, which MUST be assigned an improper uniform prior using the `dflat()` distribution.
* Because the car.normal and car.l1 distributions are improper, they can only be used as prior distributions, and not as a likelihood.

##### car.proper

The proper Gaussian CAR prior distribution is specified using the distribution car.proper for the vector of random variables S = (S1, ..., SN). The syntax for this distributions is as follows (NOTE the order of the arguments for car.proper has changed from OpenBUGS/WinBUGS):

`S[1:N] ~ car.proper(adj[], C[], num[], mu[], M[], tau, gamma)`

where:

`adj[]` - A vector listing the ID numbers of the adjacent areas for each area (this is a sparse representation of the full adjacency matrix for the study region, and can be generated using the Adjacency Tool from the menu in GeoBUGS.
`C[]` - A vector the same length as adj[] giving normalised weights associated with each pair of areas (see sections on conditional specification and proper CAR priors in Appendix 1). Note that this differs from the intrinsic `car.normal` or `car.l1` distributions, where unnormalised weights should be specified.
`num[]` - A vector of length N (the total number of areas) giving the number of neighbours ni for each area.
`mu[]` - A vector giving the mean for each area (this can either be entered as data, assigned a prior distribution, or specified deterministically within the model code).
`M[]` - A vector of length N giving the diagonal elements Mii of the conditional variance matrix (see sections on conditional specification and proper CAR priors in Appendix 1)
`tau` - A scalar parameter representing the overall precision (inverse variance) parameter.
`gamma` - A scalar parameter representing the overall degree of spatial dependence. This parameter is constrained to lie between bounds given by the inverse of the minimum and maximum eigenvalues of the matrix M-1/2 C M1/2 (see appendix 1). BUGS contains two deterministic functions for calculating these bounds (or they can be calculated externally to BUGS and input by the user):

`min.bound(C[], adj[], num[], M[])max.bound(C[], adj[], num[], M[])`

where the arguments are the same as the corresponding vectors used as arguments to the `car.proper` distribution.

Important things to check when using the `car.proper` distribution:

* `C`, `adj`, `num` and `M` must be entered as data (it is currently not possible to allow C to be unknown); `num` and `adj` may be created by the option of the Adjacency Tool as described in Adjacency matrices. The Lips example shows a (slightly clumsy) way of creating the `C` and `M` vectors within the BUGS model code; alternatively, these can be created externally to BUGS and read in as data.
* The car.proper distribution uses normalised weights C (see section on proper CAR priors in Appendix 1).
* An area cannot be specified as its own neighbour so make sure the ID number of the area itself does not appear in as one of its list of neighbours in the `adj` vector. GeoBUGS does not check for this, so it is the user's responsibility.
* The symmetry constraint Cij Mjj = Cji Mii must be satisfied. GeoBUGS does carry out a check for this and returns an error message if lack of symmetry is detected.
* Take care with priors on tau, and be prepared to carry out sensitivity analysis to this choice.

* Take care with priors on gamma: you must ensure that the prior is constrained between the appropriate bounds. Besag, York and Mollie (1991) suggest that gamma needs to be close to its upper bound before it reflects reasonable spatial dependence, so you may want to try informative priors to represent this, and be prepared to carry out sensitivity analysis.
##### spatial.exp and spatial.disc

Bayesian Gaussian kriging models (multivariate Gaussian distribution with covariance matrix expressed as a parametric function of distance between pairs of points - e.g. see Diggle, Tawn and Moyeed, 1998 and Appendix 1) can be specified using the distributions `spatial.exp` or `spatial.disc` for the vector of random variables S = (S1, ..., SN). The syntax for these distributions is as follows:

`S[1:N] ~ spatial.exp(mu[], x[], y[], tau, phi, kappa)S[1:N] ~ spatial.disc(mu[], x[], y[], tau, alpha)`
where:

`mu[]` - A vector giving the mean for each area (this can either be entered as data, assigned a prior distribution, or specified deterministically within the model code).
`x[]` and `y[]` - Vectors of length N giving the x and y coordinates of the location of each point, or the centroid of each area
`tau` - A scalar parameter representing the overall precision (inverse variance) parameter.

Two options are available for specifying the form of the covariance matrix: the powered exponential function and the '`disc`' function (see section on Joint Specification in Appendix 1).

The powered exponential function is implemented using the `spatial.exp` distribution and has 2 parameters:
`phi` - A scalar parameter representing the rate of decline of correlation with distance between points. Note that the magnitude of this parameter will depend on the units in which the x and y coordinates of each location are measured (e.g. metres, km etc.).
`kappa` - A scalar parameter controlling the amount of spatial smoothing. This is constrained to lie in the interval [0, 2).

The `disc` function is implemented using the `spatial.disc` distribution and has 1 parameter:
`alpha` - A scalar parameter representing the radius of the 'disc' centred at each (x, y) location. Note that the magnitude of this parameter will depend on the units in which the x and y coordinates of each location are measured (e.g. metres, km etc).

Warning: These models can be very slow for even moderate sized datasets (the algorithm is of order N3)! Experience to date also suggests that it may be better to hierarchically centre this model. For example, consider the following two alternative parameterisations of the same model:

Non-hierarchically centred
`      for (i in 1:N){         y[i] ~ dnorm(S[i], gamma)          mu[i] <- alpha+beta*z[i]      }       S[1:N] ~ spatial.exp(mu[], x[], y[], tau, phi, 1)`
Hierarchically centred
`      for (i in 1:N){         y[i] ~ dnorm(S[i], gamma)          S[i] <- alpha+beta*z[i] + W[i]         mu[i]<-0      }      W[1:N] ~ spatial.exp(mu[], x[], y[], tau, phi, 1)`
In some simulated examples, the non-hierarchically centred parameterisation has produced incorrect results, while the hierarchically centred parameterisation gives sensible answers. This may be a feature of the single-site updating schemes used in BUGS, so interpret your results with care! (Thanks to Alan Gelfand, Shanshan Wu and Alex Schmidt for noting this problem).

Experience also suggests that there is often very little information in the data about the values of the parameters of the powered exponential (i.e. phi and kappa) or disc (i.e. alpha) functions. We therefore recommend that reasonably informative priors are used, or that the values are fixed a priori, based on either substantive knowledge or exploratory analysis using e.g. variograms.

##### spatial.pred and spatial.unipred

Spatial interpolation or prediction at arbitrary locations can be carried out using the `spatial.pred` or `spatial.unipred` functions, in conjunction with fitting either the `spatial.exp` or `spatial.disc` model to a set of observed data. `spatial.pred` carries out joint or simultaneous prediction at a set of target locations, whereas `spatial.unipred` carries out single site prediction. The difference is that the single site prediction yields marginal prediction intervals (i.e. ignoring correlation between prediction locations) whereas joint prediction yields simultaneous prediction intervals for the set of target locations (which will tend to be narrower than the marginal prediction intervals). The predicted means should be the same under joint or single site prediction. The disadvantage of joint prediction is that it is very slow (the computational time is of order P3, where P is the number of prediction sites). The syntax for these predictive distributions is:

Joint prediction:
`         T[1:P] ~ spatial.pred(mu.T[], x.T[], y.T[], S[])`

Single site prediction:
`         for(j in 1:P) {            T[j] ~ spatial.unipred(mu.T[j], x.T[j], y.T[j], S[])         }`
where:

`P` - Scalar giving the number of prediction locations
`mu.T[]` - vector of length `P` (or scalar for single site version) specifying the mean for each prediction location (this should be specified in the same way as the mean for the observed data S).
`x.T[]` and `y.T[]` - Vectors of length `P` (or scalars for single site version) giving the x and y coordinates of the location of each prediction point
`S` - The vector of observations to which either the `spatial.exp` or `spatial.disc` model has been fitted.

##### pois.conv

A conjugate Poisson-gamma spatial moving average distribution can be specified for non-negative counts defined on a spatial lattice (i.e. discrete geographical partition), using the distribution `pois.conv`. This is a discrete version of the Poisson-gamma random field model of Wolpert and Ickstadt (1998) and Best et al (2000a). The basic syntax for this distribution is as follows:

`         S ~ dpois.conv(mu[])`

where `S` is a non-negative scalar parameter defined at some (usually spatial) location, and `mu[]` is a vector of length J representing the influence of a set of gamma distributed latent parameters at each of J different locations on the value of S. Hence `mu[]` must be defined as a convolution of gamma random variables:

`         for (j in 1 : J) {             mu[j] <- gamma[j] * k[j]            gamma[j] ~ dgamma(a, b)         }`
where `k[j]` is a spatial kernel or spatial 'weight' depending on some measure of distance or spatial proximity between the jth latent location and the location of S, and `a` and `b` are hyperparameters to be specified (see Appendix 2 for further discussion of this distribution, including suitable kernel functions). Usually, `k[]` is calculated externally and read into WinBUGS as data; alternatively, if `k[]` depends on unknown parameters, it may be defined as part of the BUGS code and re-computed at each MCMC iteration. However, this is likely to slow down the sampling within BUGS by many orders of magnitude, so is not recommended for models with large numbers of latent parameters (i.e. J large).

More typically, the distribution will be used for each element of a vector of counts defined on a spatial lattice of N regions, using the following syntax:

`         for (i in 1:N) {            S[i] ~ dpois.conv(mu[i, ])               for (j in 1 : J) {                mu[i, j] <- gamma[j] * k[i, j]            }         }`
where the latent `gamma[j]` parameters are defined as above, and `k[,]` is now an N x J matrix with elements `k[i,j]` representing the 'weight' or contribution of the latent gamma random variable at location j to the expected value of S at location i.

Conditional on mu, the `S[i]` have independent Poisson distributions with mean = Σj mu[i, j].

Note that the model may be extended to include observed covariates as well as latent variables in the Poisson mean - see MVCAR Example.

##### mv.car

NOTE - This is not currently implemented in MultiBUGS

The multivariate intrinsic Gaussian CAR prior distribution is specified using the distribution `mv.car` for the p x N matrix of random varables `S`, where columns of `S` represent the spatial units (areas) and rows represent the variables (it is important to ensure the dimensions of `S` are specified the correct way round). The syntax for this distribution is as follows:

`         S[1:p, 1:N] ~ mv.car(adj[], weights[], num[], omega[ , ])`

where:

`adj[]` - A vector listing the ID numbers of the adjacent areas for each area (this is a sparse representation of the full adjacency matrix for the study region, and can be generated using the Adjacency Tool from the menu in GeoBUGS.
`weights[]` - A vector the same length as `adj[]` giving unnormalised weights associated with each pair of areas. For the CAR model described above, taking Cij = 1 (equivalently Wij = 1/ ni) if areas i and j are neighbours and 0 otherwise, gives a vector of 1's for `weights[]`.
`num[]` - A vector of length N (the total number of areas) giving the number of neighbours ni for each area.
`omega[,]` - A p x p matrix representing the precision (inverse variance) matrix of the multivariate intrinsic Gaussian CAR prior.

The first 3 arguments must be entered as data (it is currently not possible to allow the `weights` to be unknown); the final variable `omega` is usually treated as unknown and so is assigned a prior distribution (which must be a Wishart distribution). The data variables `num` and `adj` may be created by the adj matrix option of the Adjacency Tool as described in Adjacency matrices. The variable `weights` must be created by the user, and must be a vector the same length as `adj`. A common choice is to set all the `weights` equal to 1 since this gives the multivariate equivalent of the standard Besag, York and Mollie (1991) CAR model (see sections on intrinsic CAR models and multivariate intrinsic CAR models in Appendix 1 for further discussion of weights). The easiest way to do this is to create a loop in your BUGS model code:

`         for(j in 1:sumNumNneigh) { weights[j] <- 1}`

where `sumNumNneigh` is the length of `adj` and is also output by the `adj` matrix option of the Adjacency Tool.

Important things to check when using the mv.car distribution:

* The `mv.car` distribution uses unnormalised weights, as for the `car.normal` distribtion.
* An area cannot be specified as its own neighbour so make sure the ID number of the area itself does not appear in as one of its list of neighbours in the `adj` vector. GeoBUGS does not check for this, so it is the user's responsibility.
* The weights must be symmetric (Wij = Wji). GeoBUGS does carry out a check for this and returns an error message if non-symmetric weights are detected.
* Take care with priors on `omega`, and be prepared to carry out sensitivity analysis to this choice.
* The `mv.car` distribution is parameterised to include a sum-to-zero constraint on the random effects. This means that you must also include separate intercept terms in your model for each of the p variables, which MUST be assigned improper uniform priors using the `dflat() `distribution. Note: the mechanism for implementing the sum-to-zero constraint for mv.car in WinBUGS has been tested and appears to be robust; however, we recommend that you nevertheless include a check in your model code to ensure that the constraint has been implemented correctly (e.g. by including a line in your code to calculate the mean across areas of each variable for which the mv.car prior has been assumed). Please report any problems to n.best@imperial.ac.uk.
* An alternative unconstrained version of the multivariate CAR prior is available in WinBUGS 1.4, called `mv.car.uncon`. The syntax is the same as for `mv.car`.
* Because the `mv.car` (and `mv.car.uncon`) distribution is improper, it can only be used as a prior distribution, and not as a likelihood.

##### Appendix 1: Structured MV Gaussian and CAR

(Note: for clarity of exposition, we parameterise the Normal distribution in terms of the mean and variance in the following discussion. However, WinBUGS parameterisation of the CAR and structured MVN models is in terms of the mean and precision as usual.)

Assume we have a set of area-specific spatially correlated Gaussian data or random effects Si, i=1,..., N (where N is the number of areas in the study region). Suppose their joint distribution may be expressed as follows:

S ~ MVN(μ, vΣ)

where S = ( S1, ....., SN ), MVN denotes the N-dimensional multivariate normal distribution, μ is the 1 x N mean vector, v > 0 controls the overall variability of the Si and Σ is an N x N positive definite matrix.

`Joint specification`
We may assume a parametric form for the elements of the between-area correlation matrix:
Σij    =   f(dij; θ)
where dij = distance between areas i and j. WinBUGS 1.4 allows two options for the funtion f(.) (see Richardson (1992) for further details of these functions):

1. Powered exponential family

f(dij; φ, κ) = exp[-(φ dij)κ ] where φ > 0, κ in (0, 2]

The parameter φ controls the rate of decline of correlation with distance:

φ large -> rapid decay
φ small -> slow decay

One possible strategy for specifying a prior for φ is to choose a uniform distribution between φ min and φ max where φ min and φ max are chosen to give a sensible range of values for correlations both at a distance equal to the maximum distance between any pair of areas in the study region, and at a distance equal to the minimum distance between any pair of areas in the study. For example, if the minimum distance is 1 km (say), and the maximum distance is 20 km (say), then values of φ min = 0.04 and φ max = 5 would give a diffuse but plausible prior range of correlations (assuming κ = 1) between 0.007 and 0.96 at a distance of 1 km, and between 0 and 0.45 at a distance of 20 km. Note that it is advisable to choose a value of φ min that prevents the correlation at the maximum distance between observations in the study region from being too high, since this can lead to identifiability problems between the overall mean, μ, of the spatial random variables and the correlation parameter, φ.

The parameter κ controls the amount by which spatial variations in the data are smoothed. Large values of κ lead to greater smoothing, with κ = 2 corresponding to the Gaussian correlation function (although the resulting covariance matrix is nearly singular). It is often difficult to learn much about this parameter, so unless there is a good reason for believing otherwise, it is usually advisable to set κ = 1 a priori.

2. Disc model

f(dij; α) = 2 / π * {cos-1(dij /α) - [(dij /α)(1 - (dij2 /α2))]1/2 }   for dij < α
= 0   for dij > α

The parameter α controls the rate of decline of correlation with distance:

α large -> slow decay
α small -> rapid decay

The disc function leads to an approximately linear decrease in correlation with increasing distance, with correlation declining to zero at a distance equal to α. Again, it is advisable to choose a prior for α that prevents the correlation at the maximum distance between observations in the study region from being too high, since this can lead to identifiability problems between the overall mean, μ, of the spatial random variables and the correlation parameter, α. An upper prior bound on α equal to a small multiple of the maximum distance in the study region is therefore a sensible 'default' choice. The minimum value allowed a priori for α should also be chosen carefully since values of α less than the minimum distance between observations in the study region will not be identifiable.

Exploratory analysis using variograms maybe be helpful to help chose an appropriate specification for the correlation function and associated parameters.

`Conditional specification`
By writing the between-area covariance matrix in the following form:
vΣ = v(I - γ C)-1 M

where I = N x N identity matrix
M = N x N diagonal matrix, with elements Mii proportional to the conditional variance of Si | Sj
C = N x N weight matrix, with elements Cij reflecting spatial association between areas i and j
γ = controls overall strength of spatial dependence

and using standard multivariate normal theory (e.g. Johnson and Kotz, 1972; Besag and Kooperberg, 1995), the joint multivariate Gaussian model can be expressed in the form of a set of conditional distributions

Si | S-i ~ Normal (μ i + Σj γ Cij (sj i), φ Mii ),

(here Σj denotes summation over j = 1 to N, not covariance matrix, and S-i denotes all the elements of S except Si)

From a modelling perspective, use of the joint formulation requires specification of the elements of the covariance matrix Σ (see above), while use of the conditional formulation reduces to specification of the matrices C and M and the spatial dependence parameter γ. Various constraints are needed on the values of C, M and γ in order to ensure that Σ is symmetric positive definite:

* Σ is only symmetric if Cij Mjj = Cji Mii
* Var( Si | Sj ) = vMii > 0 so Mii must be > 0* To ensure S is positive definite, γ must lie between γ min and γ max where γ min-1 and γ max-1 are the smallest and largest eigenvalues of M-1/2 C M1/2   * In practice, we often expect positive spatial dependence, so constrain prior for γ to be between 0 and γ max
* γ = 0 implies no spatial dependence

`Intrinsic CAR model`
Besag, York and Mollie (1991) propose an intrinsic version of this CAR model in which the covariance matrix Σ is not positive definite. Their model corresponds to choosing Cij = 1/ni if areas i and j are adjacent and Cij = 0 otherwise (with Cii also set to 0), Mii = 1/ni, and setting γ = γ max which turns out to always be 1 with this particular choice of Cij and Mii. Here, ni is the number of areas which are adjacent to area i. Comparison with the equations above shows that this leads to the following model for the conditional distribution of Si | S-i :

Si | S-i ~ Normal ( S.bari , v / ni )

where S.bari = Σj in δi Sj / ni and δi denotes the set of labels of the "neighbours" of area i. Hence Si has a normal distribution with conditional mean given by the average of the neighbouring Sj 's and conditional variance inversely proportional to the number of neighbours ni . Note that an equivalent specification is take unnormalised weights Wij = 1 if areas i and j are adjacent and Wij = 0 otherwise, and set Cij = Wij / Wi+ where Wi+ = Σj Wij. The car.normal and (car.l1) distributions in GeoBUGS requires the user to specify unnormalised weights.

Since the CAR model defined above is improper (the overall mean of the Si is not defined), it can only be used as a prior distribution for spatially distributed random effects, and not as a likelihood for data. It is often convenient to assume that suchrandom effects have zero mean. Besag and Kooperberg (1995) show that constraining the random effects to sum to zero and specifying a separate intercept term with a location invariant Uniform(-infty, +infty) prior is equivalent to the unconstrained parameterisation with no separate intercept. WinBUGS 1.4 includes a distribution called dflat() which corresponds to an improper (flat) prior on the whole real line. This prior must always be used for the intercept term in a model including CAR random effects.

We must also specify prior distributions for the overall variance parameter v. As usual in WinBUGS, the car.normal (and car.l1) distributions are parameterised in terms of the precision τ = 1/v . Care is needed when choosing a prior for τ since the posterior variance of the random effects can be quite sensitive. One option is a gamma distribution with shape and inverse scale parameters both equal to 0.01. This has a mean of 0.01/0.01 = 1 and a large variance of 0.01/(0.01)2 = 100; however, this tends to place most of the prior mass away from zero (on the scale of the random effects standard deviation), and so in situations when the true spatial dependence between areas is negligible (i.e. standard deviation close to zero) this may induce artefactual spatial structure in the posterior. Kelsall and Wakefield (1999) suggest an alternative gamma(0.5, 0.0005) prior for the precision parameter of the spatial random effects in a CAR model. This expresses the prior belief that the random effects standard deviation is centered around 0.05 with a 1% prior probability of being smaller than 0.01 or larger than 2.5.

`Proper CAR model`
If γ is constrained to lie in the interval (γ min , γ max ) where γ min-1 and γ max-1 are the smallest and largest eigenvalues of M-1/2 C M1/2 as defined above, rather than being fixed to its maximum as in the intrinsic CAR, then the resulting distribution is proper provided the constraints on C and M are still satisfied. The choice Cij = 1/ni if areas i and j are adjacent and Cij = 0 otherwise (with Cii also set to 0) and Mii = 1/ni, where ni is the number of areas which are adjacent to area i is still valid for the proper CAR. In the context of disease mapping Cressie and Chan (1989) and Stern and Cressie (1999) choose an alternative parameterisation:

Mii = 1/Ei (the inverse of the expected count or population size in area i)
Cij = (Ej / Ei)1/2 for neighbouring areas i, j and 0 otherwise

`Multivariate instrinsic CAR model`
Assume we have a multivariate p-dimensional vector of spatially correlated Gaussian data or random effects in each area, Si = (S1i , S2i,....., Spi )', i=1,..., N. The CAR models discussed above extend naturally to a multivariate setting by replacing the univariate Gaussian conditional distribution for Si | Sjwith a multivariate conditional distribution. In the case of the intrinsic CAR prior with 0-1 adjacency weights as used by Besag, York and Mollie (1991), and taking the case of p = 2 for simplicity, this gives:
Si | (S1(-i), S2(-i)) ~ Bivariate Normal (S.bari , V / ni ),
(here (S1(-i), S2(-i)) denotes the elements of the 2 x N matrix S excluding the ith area (column))

where S.bari = (S.bari1 ,S.bari2) with S.barip = Σ j in δi Sjp / ni and, as in the univariate case, δi and ni denote the set of labels of the "neighbours" of area i and the number of neighbours, respectively. V is a 2 x 2 covariance matix with diagonal elements v11 and v22 representing the conditional variances of S1 and S2 respectively, and off-diagonal element v12 representing the (conditional) within-area covariance between S1 and S2. In general Si has a p-dimensional multivariate normal distribution with pth element of the conditional mean vector given by the average of the neighbouring Sjp 's and conditional covariance matrix inversely proportional to the number of neighbours ni, with pth diagonal element representing the conditional variance of the pth component of S and off-diagonal elements representing the conditional covariances between each pair of the p elements of S.

Gelfand and Vounatsou (2003) discuss a multivariate generalisation of the proper version of CAR distribution, but only the improper intrinsic multivariate CAR is currently implemented in WinBUGS.

##### Appendix 2: Poisson-gamma spatial moving-average convolution

Spatial moving average models are a flexible class of models that have been used to describe continuous spatial processes, particularly in geostatistical applications. Such models are constructed by integrating a simple two-dimensional random noise process (for example, a grid of iid Gaussian random variables) with a smoothing kernel that is a function of distance and, possibly, location. The kernel can be thought of as a device to `smear out' the random noise process in two-dimensional space to give a smooth surface.

Spatial moving average models have been developed primarily for continuous spatial processes and are currently not implementable in WinBUGS 1.4. However, Ickstadt and Wolpert (1998) and Best et al (2000b) proposed a discrete version of a gamma moving average process, for use in identity-link Poisson regression models. Suppose we have a set of area-specific spatially correlated Poisson count data (or random effects) Si, i=1,..., N (where N is the number of areas in the study region). Ickstadt and Wolpert (1998) and Best et al (2000b) model spatial dependence at the level of the Poisson mean, and assume that the counts Si are conditionally independent given this mean:

Si ~ Poisson( λi )

The model for each λi is constructed by specifying an arbitrary grid of latent iid gamma random variables γj (j=1,...,J where J is the total number of grid cells defining the latent process) covering the study region. These are then convolved with a kernel matrix whose elements, kij, represent the relative contribution of the latent variable in grid cell j to the Poisson mean in area i:

λi = Σj γ j * kij

Best et al (2000b) assume an isotropic, stationary Gaussian kernel function (although other kernel forms are easily accommodated, such as an adjacency-based kernel - see the 'Forest Example'):

kij = τ / (2π ρ2) exp(-dij2 / 2ρ2)

where τ can be thought of as a scale factor for the λ's, dij is the distance between the centroid of area i and the centroid of latent grid cell j, and ρ is the spatial range parameter governing how rapidly the influence of the latent gamma random variables on the area-specific Poisson means declines with distance.

One interpretation of Poisson-gamma moving average model is to view the gamma random variables as representing the location and magnitude of unmeasured risk factors, and the area-specific Poisson means λi as representing the cumulative effect of these risk factors in each area, weighted by their distance from the area according to the kernel 'weights' kij.

The model may be extended to include observed covariates and an offset adjustment (e.g. to account for different populations in different areas). Observed covariates are included as additive terms in the linear predictor λi as in a standard identity link Poisson regression:

λi = Σ k β k Xik + Σj γj * kij

where Xik is the value of the kth observed covariate in area i. Note that due to the identity link, all covariates and regression coefficients must take positive values, yielding an interpretation as additive excess risk factors. The latent terms θi = Σj γj * kij may then be thought of as a spatial (non-negative) random effect for area i. Adjustment for an offset - say the population living in each area, Ni - is easily achieved by setting

λ
i = pi Ni
pi = Σ k β k Xik + Σj γ j * kij

Note that Best et al (2004) discuss how to standardise the offset Ni for different age and sex distributions within each area in an epidemiological application of these models.

Estimation of the `pois.conv` distribution is carried out in WinBUGS 1.4 using a data augmentation scheme to exploit the Poisson-gamma conjugacy of the full conditionals for the γ j parameters (see Ickstadt and Wolpert (1998) and Best et al (2000b) for details). It is also a good idea to assume gamma prior distributions for any other uncertain parameters in the linear predictor λi (e.g. the β k parameters above) to exploit this conjugacy, although it is possible to assume any suitable prior distribution with positive support for these other parameters. Hyperprior specification is discussed in detail by Ickstadt and Wolpert (1998) and Best et al (2000b). Note however that the prior shape (a) and precision (b) parameters of the latent gamma variables should be chosen such that γj has prior mean proportional to the area of the jth latent grid cell. This makes the model spatially extensible in the sense that any partition of the latent gamma random variables will lead to identical probability distributions for the kernel-weighted sums Σ j γj * kij.