`dloglik`

, for a dummy observed variable:` dummy[i] <- 0`

dummy[i] ~ dloglik(logLike[i])

where

`logLike[i]`

is the contribution to the log-likelihood for the i`dloglik`

function implements the 'zero poisson' method utilized in WinBUGS.This is illustrated in the example below in which a normal likelihood is constructed and the results are compared to the standard formulation.

` model {`

for (i in 1:7) {

dummy[i] <- 0

# likelihood is exp(logLike[i])

dummy[i] ~ dloglik(logLike[i])

# log(likelihood)

logLike[i] <- -log(sigma) - 0.5 * pow((x[i] - mu) / sigma, 2)

}

mu ~ dunif(-10, 10)

sigma ~ dunif(0, 10)

}

` model {`

# check using normal distribution

for (i in 1:7) {

x[i] ~ dnorm(mu, prec)

}

prec <- 1 / (sigma * sigma)

mu ~ dunif(-10, 10)

sigma ~ dunif(0, 10)

}

Data:list(x = c(-1, -0.3, 0.1, 0.2, 0.7, 1.2, 1.7))

Initial values:list(sigma = 1, mu = 0)

Results:

sigma 1.145 0.4368 0.01743 0.6113 1.047 2.263 1001 9000

or

sigma 1.208 0.5572 0.01174 0.6299 1.083 2.486 1001 9000

`theta`

, if we want to use a prior distribution not included in the standard distributions (see the list of Distributions), then we can use the '`dloglik`

' distribution (see above) for the prior specification. A single dummy observation contributes the appropriate term to the likelihood for `theta`

; and when it is combined with a 'flat' prior for `theta`

, the correct distribution results:` theta ~ dflat()`

dummy ~ dloglik(logLike)

logLike <- log(desired prior for theta)

This is illustrated by the below example in which a normal prior is constructed and the results are compared to the standard formulation. It is important to note that this method will cause the theta variable to be sampled using the metropolis algorithm and may lead to poor convergence and a high Monte Carlo error

`model {`

for (i in 1:7) {

x[i] ~ dnorm(mu, prec)

}

dummy <- 0

dummy ~ dloglik(phi) # likelihood is exp(phi)

phi <- -0.5 * pow(mu, 2) # log(N(0, 1))

mu ~ dflat() # 'flat' prior

prec <- 1 / (sigma * sigma)

sigma ~ dunif(0, 10)

}

or

`model {`

for (i in 1:7) {

x[i] ~ dnorm(mu, prec)

}

mu ~ dnorm(0, 1) # 'known' normal prior

prec <- 1 / (sigma * sigma)

sigma ~ dunif(0, 10)

}

Data:list(x = c(-1, -0.3, 0.1, 0.2, 0.7, 1.2, 1.7))

Initial values:list(sigma = 1, mu = 0)

Results:

sigma 1.162 0.4721 0.008885 0.6178 1.05 2.359 1001 9000

or

sigma 1.166 0.4785 0.007654 0.6234 1.059 2.315 1 10000

1) DIC is intended as a generalisation of Akaike's Information Criterion (AIC). For non-hierarchical models, pD should be approximately the true number of parameters.

2) Slightly different values of Dhat (and hence pD and DIC) can be obtained depending on the parameterisation used for the prior distribution. For example, consider the precision

`tau ~ dgamma(0.001, 0.001)`

and

`log.tau ~ dunif(-10, 10); log(tau) <- log.tau`

are essentially identical but will give slightly different results for Dhat. The first prior distribution has stochastic parent

3) For sampling distributions that are log-concave in their stochastic parents, pD is guaranteed to be positive (provided the simulation has converged). However, we have obtained negative pD's in the following situations:i) with non-log-concave likelihoods (e.g. Student-t distributions) when there is substantial conflict between prior and data;

ii) when the posterior distribution for a parameter is symmetric and bimodal, and so the posterior mean is a very poor summary statistic and gives a very large deviance.4) No MC error is available on the DIC. MC error on Dbar can be obtained by monitoring deviance and is generally quite small. The primary concern is to ensure convergence of Dbar - it is therefore worthwhile checking the stability of Dbar over a long chain.

5) The minimum DIC estimates the model that will make the best short-term predictions, in the same spirit as Akaike's criterion. However, if the difference in DIC is, say, less than 5, and the models make very different inferences, then it could be misleading just to report the model with the lowest DIC.

6) DICs are comparable only over models with exactly the same observed data, but there is no need for them to be nested.

7) DIC differs from Bayes factors and BIC in both form and aims.

8) Caution is advisable in the use of DIC until more experience has been gained.

The below example illustrates how this is handled in

Suppose that for each i: x[i] ~ N(mu, 1) with probability p; and x[i] ~ N(0, 1) otherwise (i.e. with probability 1 - p). We generate 100 observations with p = 0.4 and mu = 3 as follows. We forward sample

`model {`

mu <- 3

p <- 0.4

m[1] <- 0

m[2] <- mu

for (i in 1 : 100) {

group[i] ~ dbern(p)

index[i] <- group[i] + 1

y[i] ~ dnorm(m[index[i]], 1)

}

}

We may observe the underlying mixture distribution by monitoring any one of the y[i]'s over a number of additional sampling cycles. For example, the following kernel density plot was obtained after monitoring y[1] for 1000 iterations:

To analyse the simulated data we use the following code:

`model {`

mu ~ dunif(-5, 5)

p ~ dunif(0, 1)

m[1] <- 0

m[2] <- mu

for (i in 1:100) {

group[i] ~ dbern(p)

index[i] <- group[i] + 1

y[i] ~ dnorm(m[index[i]], 1)

}

}

After 101000 iterations (with a burn-in of 1000) we have good agreement with the 'true' values:

p 0.4243 0.05985 3.718E-4 0.3098 0.4232 0.5438 1001 100000

Initial values:

list(mu = 0, p = 0.5)

Simulated data:

list(

y = c(

2.401893189187883,0.2400077667887621,-0.1556589480882761,-0.8457182007723154,0.37008097263224,

3.586009960655263,1.598955590680827,3.826518138558907,-0.9630895329522409,0.6951468806412424,

3.129328672725175,0.01025316168135796,0.4887298480200992,0.3865632519840305,-0.2697534502300845,

-1.18891944058751,4.654771935717583,0.8063807170988319,-0.9867060769784521,0.9154433557950319,

-0.5419217214549653,3.358942981432516,3.33734145389337,3.3960739633218,2.038185222382867,

5.241414085016386,3.362823353717864,-0.6013483154102028,0.441480491316843,2.96228336554288,

-2.278054802181326,1.446861613005477,1.49864667127073,2.819410923921955,3.668112206659865,

0.8253991892717565,1.117718710956935,3.976040128045549,1.261678474198661,-0.03343173803015926,

-0.4908566523519207,0.3532664087054739,-1.679362022373703,-1.555760053262808,1.213022071911081,

3.421072023150202,2.523431239569371,-1.218844344999495,-0.270208787763775,-0.1217560919494103,

2.033835091596821,0.4654798734609423,-0.7231540561359688,2.146640407714382,4.286633169106659,

1.445348149793759,0.180718235361594,1.527791426438174,-1.010060680847808,1.969758236040937,

-0.3553936244225268,1.465488166547136,3.32669874753109,1.061348836020805,2.31746192198435,

0.9564080472865206,1.877477903911581,-0.6964242592539615,5.695159167887606,2.807268123194206,

-1.69815612298043,0.1881330355284009,0.04018232765300667,2.272096174325846,0.9694913345710192,

3.979152197443702,-0.7028546956095989,1.371423010966528,1.646618045628321,0.1919331499516834,

3.587928853903195,0.1219688057614579,2.570950727333546,1.888563572434331,1.10249950266553,

4.119103539377994,3.824513529111282,1.509248826260637,1.156700011660602,-0.0343697724869282,

3.816521442901518,0.0675520018218364,-0.002197033376085265,2.994147650487649,1.856351699226103,

3.539838516568062,0.5861211569557628,4.343058450523658,0.1760647082096519,3.678908111531086))

Naturally, the standard warnings about mixture distributions apply, in that convergence may be poor and careful parameterisation may be necessary to avoid some of the components becoming empty.

`for (i in 1:K) {`

y[i] ~ model 1

}

for (i in (K + 1):N) {

y[i] ~ model 2

}

since the index for a loop cannot be a random quantity. Instead we can use the step function to set up an indicator as to which set each observation belongs to:

`for (i in 1:N) {`

# will be 1 for all i <= K, 2 otherwise

ind[i] <- 1 + step(i - K - 0.01)

y[i] ~ model ind[i]

}

This is illustrated by the problem of adding up terms in a series of unknown length.

Suppose we want to find the distribution of the sum of the first K integers, where K is a random quantity. We shall assume K has a uniform distribution on 1 to 10.

`model {`

for (i in 1:10) {

p[i] <- 1 / 10 # set up prior for K

x[i] <- i # set up array of integers

}

K ~ dcat(p[]) # sample K from its prior

for (i in 1:10) {

# determine which of the x[i]'s are to be summed

xtosum[i] <- x[i] * step(K - i + 0.01)

}

s <- sum(xtosum[])

}

a) replicate the dataset

b) set up a loop to repeat the analysis for each prior, holding results in arrays;

c) compare results using the 'compare' facility.

The example prior-sensitivity explores six different suggestions for priors on the random-effects variance in a meta-analysis.

`model {`

r <- 10

p <- 0.5

r ~ dbin(p, n)

n.cont ~ dunif(1, 100)

n <- round(n.cont)

}

n.cont 21.08 4.804 0.07932 13.31 20.6 32.0 1001 5000

Assuming a uniform prior for the number of tosses, we can be 95% sure that the coin has been tossed between 13 and 32 times. A discrete prior on the integers could also have been used in this context.

Person 1: 13.2

Person 2: 12.3 , 14.1

Person 3: 11.0, 9.7, 10.3, 9.6

There are three different ways of entering such 'ragged' data into WinBUGS:

` y[,1] y[,2] y[,3] y[,4]`

13.2 NA NA NA

12.3 14.1 NA NA

11.0 9.7 10.3 9.6

END

or

`list(y = structure(.Data = c(13.2, NA, NA, NA, 12.3, 14.1, NA, NA, 11.0, 9.7, 10.3, 9.6), .Dim = c(3, 4))`

.A model such as

`y[i, j] ~ dnorm(mu[i], 1)`

can then be fitted. This approach is inefficient unless one explicitly wishes to estimate the missing data.` y[] person[]`

13.2 1

12.3 2

14.1 2

11.0 3

9.7 3

10.3 3

9.6 3

END

or

`list(y = c(13.2, 12.3, 14.1, 11.0, 9.7, 10.3, 9.6), person = c(1, 2, 2, 3, 3, 3, 3))`

.A model such as

`y[k] ~ dnorm(mu[person[k]], 1)`

can then be fitted. This seems an efficient and clear way to handle the problem.`list(y = c(13.2, 12.3, 14.1, 11.0, 9.7, 10.3, 9.6), offset = c(1, 2, 4, 8))`

and lead to a model containing the code

` for (k in offset[i]:(offset[i + 1] - 1)) {`

y[k] ~ dnorm(mu[i], 1)

}

The danger with this method is that it relies on getting the offsets correct and they are difficult to check.

The three methods are illustrated in the small example below.

` model {`

for (i in 1:3) {

mu[i] ~ dunif(0, 100)

for (j in 1:4) {

y[i, j] ~ dnorm(mu[i], 1)

}

}

}

` y[, 1] y[, 2] y[, 3] y[, 4]`

13.2 NA NA NA

12.3 14.1 NA NA

11.0 9.7 10.3 9.6

END

`list(y = structure(.Data = c(13.2, NA, NA, NA, 12.3, 14.1, NA, NA, 11.0, 9.7, 10.3, 9.6), .Dim = c(3, 4))`

` model `

{

for (i in 1:3) {

mu[i] ~ dunif(0, 100)

}

for (k in 1:7) {

y[k] ~ dnorm(mu[person[k]], 1)

}

}

y[] person[]

13.2 1

12.3 2

14.1 2

11.0 3

9.7 3

10.3 3

9.6 3

END

list(y = c(13.2, 12.3, 14.1, 11.0, 9.7, 10.3, 9.6), person = c(1, 2, 2, 3, 3, 3, 3))

` model {`

for (i in 1:3) {

mu[i] ~ dunif(0, 100)

for (k in offset[i]:(offset[i + 1] - 1)) {

y[k] ~ dnorm(mu[i], 1)

}

}

}

y[]

13.2

12.3

14.1

11.0

9.7

10.3

9.6

END

offset[]

1

2

4

8

END

list(y = c(13.2, 12.3, 14.1, 11.0, 9.7, 10.3, 9.6), offset = c(1, 2, 4, 8))

a) when we wish to make predictions on some individuals on whom we have observed some partial data that we do not wish to use for parameter estimation;

b) when we want to use data to learn about some parameters and not others;

c) when we want evidence from one part of a model to form a prior distribution for a second part of the model, but we do not want 'feedback' from this second part.

The "

`cut`

" function forms a kind of 'valve' in the graph: prior information is allowed to flow 'downwards' through the cut, but likelihood information is prevented from flowing upwards.For example, the following code leaves the distribution for theta unchanged by the observation

`y`

.` model `

{

y <- 2

y ~ dnorm(theta.cut, 1)

theta.cut <- cut(theta)

theta ~ dnorm(0, 1)

}