Overview

This file walks you through the process of building and running a Bayesian statistical model using R. There are many reasons why you might want to adopt the Bayesian framework over the null hypothesis testing (NHT) framework, but for one of my projects, I decided to try the Bayesian analysis for the following reasons:

Install and import required packages

If you don’t have any of the packages in the next chunk installed, please do so first.

Load packages

You will need the following packages for conducting Bayesian analysis.

Load data

data = read.csv("./bayesian_tutorial.csv")
glimpse(data)
## Rows: 513
## Columns: 5
## $ Participant.ID <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3…
## $ List           <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1…
## $ quantifier     <chr> "all", "some", "some", "some", "all", "some", "some", "…
## $ fruit          <chr> "banana", "orange", "apple", "banana", "orange", "banan…
## $ response       <chr> "Y", "Y", "I", "N", "Y", "I", "N", "I", "Y", "Y", "N", …

We’re using a fake dataset with the following variables (columns):

Here’s a plot summarizing the data.

We will treat quantifier as an independent variable (with Participant.ID and fruit as random effects) and build a model to see whether the type of quantifier used in the experimenter’s utterance (“some” or “all”) affected the participants’ responses.

Conduct a Bayesian analysis

Set priors

The first step in a Bayesian analysis is setting priors, which are my initial beliefs or knowledge about the parameters before observing the data. If you don’t have strong beliefs about the parameters, you should set priors that are weakly informative - priors that have only a small influence on the posterior distribution. In other words, setting weakly informative priors means that you’re allowing your data to strongly influence the posterior distribution.

We are setting weakly informative priors (Student’s t-distribution for intercept, normal distribution for others) for this analysis. You can do so with the prior function.

Note: In the code below, I needed to specify the dpar parameter (name of a distributional parameter) in addition to the class parameter, or otherwise I would get the error “The following priors do not correspond to any model parameter” - even though classes “Intercept” and “b” do exist. I don’t know how common it is (or if it’s necessary) to specify the dpar parameter, and would greatly appreciate any suggestions to set alternative ways to set priors!

priors = c(
  set_prior("student_t(3, 0, 2.5)", class = "Intercept", dpar = "muN"),
  set_prior("student_t(3, 0, 2.5)", class = "Intercept", dpar = "muY"),
  set_prior("normal(0, 8)", class = "b", coef = "quantifiersome", dpar = "muN"),
  set_prior("normal(0, 8)", class = "b", coef = "quantifiersome", dpar = "muY")
)

Build a model

Using the priors from above, here’s how to write a model. I set the three parameters (chains, iter, warmup) manually, but you can also leave them blank and default values apply. Here’s what those parameters do:

  • chains: Number of independent Markov Chain Monte Carlo (MCMC) chains (sequences of parameter values generated by the Markov chain during the simulation)
  • iter: Number of iterations for each chain, including warm-up and sampling phases.
  • warmup: Number of iterations used for warm-up. Samples drawn during this phase are typically discarded, and only the samples from the actual sampling phase are used for inference.

The total number of samples drawn from the posterior distribution is determined by the product of chains and iter.

model = brm(response ~ quantifier + (1|Participant.ID) + (1|fruit), data=data,
           family="categorical", chains=3, iter=3000, warmup=600,
           prior = priors,
           save_all_pars = TRUE)
## Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
## using C compiler: ‘Apple clang version 14.0.3 (clang-1403.0.22.14.1)’
## using SDK: ‘MacOSX13.3.sdk’
## clang -arch arm64 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I"/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/Rcpp/include/"  -I"/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppEigen/include/"  -I"/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppEigen/include/unsupported"  -I"/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/StanHeaders/include/src/"  -I"/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/StanHeaders/include/"  -I"/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppParallel/include/"  -I"/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/rstan/include" -DEIGEN_NO_DEBUG  -DBOOST_DISABLE_ASSERTS  -DBOOST_PENDING_INTEGER_LOG2_HPP  -DSTAN_THREADS  -DUSE_STANC3 -DSTRICT_R_HEADERS  -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION  -D_HAS_AUTO_PTR_ETC=0  -include '/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp'  -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1   -I/opt/R/arm64/include    -fPIC  -falign-functions=64 -Wall -g -O2  -c foo.c -o foo.o
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppEigen/include/Eigen/Dense:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppEigen/include/Eigen/Core:88:
## /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:1: error: unknown type name 'namespace'
## namespace Eigen {
## ^
## /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:16: error: expected ';' after top level declarator
## namespace Eigen {
##                ^
##                ;
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppEigen/include/Eigen/Dense:1:
## /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppEigen/include/Eigen/Core:96:10: fatal error: 'complex' file not found
## #include <complex>
##          ^~~~~~~~~
## 3 errors generated.
## make: *** [foo.o] Error 1
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.000402 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 4.02 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 3000 [  0%]  (Warmup)
## Chain 1: Iteration:  300 / 3000 [ 10%]  (Warmup)
## Chain 1: Iteration:  600 / 3000 [ 20%]  (Warmup)
## Chain 1: Iteration:  601 / 3000 [ 20%]  (Sampling)
## Chain 1: Iteration:  900 / 3000 [ 30%]  (Sampling)
## Chain 1: Iteration: 1200 / 3000 [ 40%]  (Sampling)
## Chain 1: Iteration: 1500 / 3000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1800 / 3000 [ 60%]  (Sampling)
## Chain 1: Iteration: 2100 / 3000 [ 70%]  (Sampling)
## Chain 1: Iteration: 2400 / 3000 [ 80%]  (Sampling)
## Chain 1: Iteration: 2700 / 3000 [ 90%]  (Sampling)
## Chain 1: Iteration: 3000 / 3000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 4.723 seconds (Warm-up)
## Chain 1:                17.066 seconds (Sampling)
## Chain 1:                21.789 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 0.000214 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 2.14 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 3000 [  0%]  (Warmup)
## Chain 2: Iteration:  300 / 3000 [ 10%]  (Warmup)
## Chain 2: Iteration:  600 / 3000 [ 20%]  (Warmup)
## Chain 2: Iteration:  601 / 3000 [ 20%]  (Sampling)
## Chain 2: Iteration:  900 / 3000 [ 30%]  (Sampling)
## Chain 2: Iteration: 1200 / 3000 [ 40%]  (Sampling)
## Chain 2: Iteration: 1500 / 3000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1800 / 3000 [ 60%]  (Sampling)
## Chain 2: Iteration: 2100 / 3000 [ 70%]  (Sampling)
## Chain 2: Iteration: 2400 / 3000 [ 80%]  (Sampling)
## Chain 2: Iteration: 2700 / 3000 [ 90%]  (Sampling)
## Chain 2: Iteration: 3000 / 3000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 4.88 seconds (Warm-up)
## Chain 2:                13.348 seconds (Sampling)
## Chain 2:                18.228 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 0.000219 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 2.19 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 3000 [  0%]  (Warmup)
## Chain 3: Iteration:  300 / 3000 [ 10%]  (Warmup)
## Chain 3: Iteration:  600 / 3000 [ 20%]  (Warmup)
## Chain 3: Iteration:  601 / 3000 [ 20%]  (Sampling)
## Chain 3: Iteration:  900 / 3000 [ 30%]  (Sampling)
## Chain 3: Iteration: 1200 / 3000 [ 40%]  (Sampling)
## Chain 3: Iteration: 1500 / 3000 [ 50%]  (Sampling)
## Chain 3: Iteration: 1800 / 3000 [ 60%]  (Sampling)
## Chain 3: Iteration: 2100 / 3000 [ 70%]  (Sampling)
## Chain 3: Iteration: 2400 / 3000 [ 80%]  (Sampling)
## Chain 3: Iteration: 2700 / 3000 [ 90%]  (Sampling)
## Chain 3: Iteration: 3000 / 3000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 5.119 seconds (Warm-up)
## Chain 3:                16.186 seconds (Sampling)
## Chain 3:                21.305 seconds (Total)
## Chain 3:
summary(model)
##  Family: categorical 
##   Links: muN = logit; muY = logit 
## Formula: response ~ quantifier + (1 | Participant.ID) + (1 | fruit) 
##    Data: data (Number of observations: 513) 
##   Draws: 3 chains, each with iter = 3000; warmup = 600; thin = 1;
##          total post-warmup draws = 7200
## 
## Group-Level Effects: 
## ~fruit (Number of levels: 3) 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(muN_Intercept)     0.69      0.95     0.01     4.11 1.05       67       31
## sd(muY_Intercept)     0.45      0.55     0.02     2.01 1.02      241     2216
## 
## ~Participant.ID (Number of levels: 57) 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(muN_Intercept)     1.31      0.25     0.87     1.83 1.03      107      369
## sd(muY_Intercept)     1.51      0.29     0.97     2.14 1.01     1685     2468
## 
## Population-Level Effects: 
##                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## muN_Intercept          0.31      1.13    -1.54     3.05 1.05       52       14
## muY_Intercept          4.75      0.73     3.42     6.25 1.02      163      183
## muN_quantifiersome     0.33      0.75    -1.10     1.83 1.02      257      581
## muY_quantifiersome    -6.01      0.71    -7.39    -4.72 1.02      143      198
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

We can change priors to make them more informative, like the following:

alt_priors = c(
  set_prior("beta(4, 8)", class = "Intercept", dpar = "muN"),
  set_prior("student_t(3, 0, 2.5)", class = "Intercept", dpar = "muY"),
  set_prior("beta(4, 8)", class = "b", coef = "quantifiersome", dpar = "muN"),
  set_prior("normal(0, 8)", class = "b", coef = "quantifiersome", dpar = "muY")
)

Obtain Bayes factors

For hypothesis testing (is the use of different quantifiers lead to different patterns of response?) under the Bayesian framework, we can compare the posterior probabilities of different hypotheses by means of Bayes factor (BF). More specifically, you can evaluate the likelihood of the data under H1 that quantifier makes a difference to the likelihood of the data under H0 that it doesn’t make a difference. The threshold for whether H0 or H1 is more likely is arbitrary, just like p-value under the NHT framework. For this tutorial, we adopt the following thresholds by Jeffreys (1939/1961):

  • BF > 3: Strong evidence for H1
  • BF < 0.33: Strong evidence for H0
  • BF between 0.33 and 3: Indecisive (data supports neither H1 nor H0)

You can obtain the Bayes Factor of your variables with the bayesfactor_parameters function.

bayesfactor_parameters(model, null = 0)
## Bayes Factor (Savage-Dickey density ratio)
## 
## Parameter          |       BF
## -----------------------------
## muN_Intercept      |    0.146
## muY_Intercept      | 1.49e+05
## muN_quantifiersome |    0.092
## muY_quantifiersome | 2.66e+11
## 
## * Evidence Against The Null: 0

What this output tell us:

  • The proportions of “No” and “I don’t know” responses were the same in the “all” condition.
  • But the proportions of “Yes” and “I don’t know” responses were significantly different in the “all” condition.
  • The proportions of “No” and “I don’t know” responses were the same in the “some” condition.
  • But the proportions of “Yes” and “I don’t know” responses were significantly different in the “some” condition.

To see the overall effect of the independent variable (quantifier used), we can also compare the model we created with an intercept-only model, just like how we compare multiple linear models with anova.

m0 = brm(response ~ 1, # intercept only model
          data=data,
          family="categorical", chains=3, iter=3000, warmup=600,
          save_all_pars = TRUE)
## Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
## using C compiler: ‘Apple clang version 14.0.3 (clang-1403.0.22.14.1)’
## using SDK: ‘MacOSX13.3.sdk’
## clang -arch arm64 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I"/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/Rcpp/include/"  -I"/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppEigen/include/"  -I"/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppEigen/include/unsupported"  -I"/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/StanHeaders/include/src/"  -I"/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/StanHeaders/include/"  -I"/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppParallel/include/"  -I"/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/rstan/include" -DEIGEN_NO_DEBUG  -DBOOST_DISABLE_ASSERTS  -DBOOST_PENDING_INTEGER_LOG2_HPP  -DSTAN_THREADS  -DUSE_STANC3 -DSTRICT_R_HEADERS  -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION  -D_HAS_AUTO_PTR_ETC=0  -include '/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp'  -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1   -I/opt/R/arm64/include    -fPIC  -falign-functions=64 -Wall -g -O2  -c foo.c -o foo.o
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppEigen/include/Eigen/Dense:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppEigen/include/Eigen/Core:88:
## /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:1: error: unknown type name 'namespace'
## namespace Eigen {
## ^
## /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:16: error: expected ';' after top level declarator
## namespace Eigen {
##                ^
##                ;
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppEigen/include/Eigen/Dense:1:
## /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppEigen/include/Eigen/Core:96:10: fatal error: 'complex' file not found
## #include <complex>
##          ^~~~~~~~~
## 3 errors generated.
## make: *** [foo.o] Error 1
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.000258 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2.58 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 3000 [  0%]  (Warmup)
## Chain 1: Iteration:  300 / 3000 [ 10%]  (Warmup)
## Chain 1: Iteration:  600 / 3000 [ 20%]  (Warmup)
## Chain 1: Iteration:  601 / 3000 [ 20%]  (Sampling)
## Chain 1: Iteration:  900 / 3000 [ 30%]  (Sampling)
## Chain 1: Iteration: 1200 / 3000 [ 40%]  (Sampling)
## Chain 1: Iteration: 1500 / 3000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1800 / 3000 [ 60%]  (Sampling)
## Chain 1: Iteration: 2100 / 3000 [ 70%]  (Sampling)
## Chain 1: Iteration: 2400 / 3000 [ 80%]  (Sampling)
## Chain 1: Iteration: 2700 / 3000 [ 90%]  (Sampling)
## Chain 1: Iteration: 3000 / 3000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.547 seconds (Warm-up)
## Chain 1:                2.006 seconds (Sampling)
## Chain 1:                2.553 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 0.000161 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 1.61 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 3000 [  0%]  (Warmup)
## Chain 2: Iteration:  300 / 3000 [ 10%]  (Warmup)
## Chain 2: Iteration:  600 / 3000 [ 20%]  (Warmup)
## Chain 2: Iteration:  601 / 3000 [ 20%]  (Sampling)
## Chain 2: Iteration:  900 / 3000 [ 30%]  (Sampling)
## Chain 2: Iteration: 1200 / 3000 [ 40%]  (Sampling)
## Chain 2: Iteration: 1500 / 3000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1800 / 3000 [ 60%]  (Sampling)
## Chain 2: Iteration: 2100 / 3000 [ 70%]  (Sampling)
## Chain 2: Iteration: 2400 / 3000 [ 80%]  (Sampling)
## Chain 2: Iteration: 2700 / 3000 [ 90%]  (Sampling)
## Chain 2: Iteration: 3000 / 3000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 0.567 seconds (Warm-up)
## Chain 2:                1.882 seconds (Sampling)
## Chain 2:                2.449 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 0.000171 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 1.71 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 3000 [  0%]  (Warmup)
## Chain 3: Iteration:  300 / 3000 [ 10%]  (Warmup)
## Chain 3: Iteration:  600 / 3000 [ 20%]  (Warmup)
## Chain 3: Iteration:  601 / 3000 [ 20%]  (Sampling)
## Chain 3: Iteration:  900 / 3000 [ 30%]  (Sampling)
## Chain 3: Iteration: 1200 / 3000 [ 40%]  (Sampling)
## Chain 3: Iteration: 1500 / 3000 [ 50%]  (Sampling)
## Chain 3: Iteration: 1800 / 3000 [ 60%]  (Sampling)
## Chain 3: Iteration: 2100 / 3000 [ 70%]  (Sampling)
## Chain 3: Iteration: 2400 / 3000 [ 80%]  (Sampling)
## Chain 3: Iteration: 2700 / 3000 [ 90%]  (Sampling)
## Chain 3: Iteration: 3000 / 3000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 0.564 seconds (Warm-up)
## Chain 3:                2.165 seconds (Sampling)
## Chain 3:                2.729 seconds (Total)
## Chain 3:
comparison = bayesfactor_models(model, denominator = m0)
comparison
## Bayes Factors for Model Comparison
## 
##     Model                                                 BF
## [1] quantifier + (1 | Participant.ID) + (1 | fruit) 7.40e+78
## 
## * Against Denominator: [2] (Intercept only)
## *   Bayes Factor Type: marginal likelihoods (bridgesampling)

The BF turned out to be a very large number, which is in support of H1 - the use of quantifier does affect the pattern of responses.

That’s it! I very much appreciate the feedback - if you notice any mistakes, please get in touch.

Reference

Jeffreys, Harold. 1939/1961. Theory of probability. Oxford: Oxford University Press.