Chi-Square prop testing

Last updated on Oct 25, 2024 tidyverse

Table of Contents

I’ve played with running the prop.test() chi-square tests across tibbles and I think I’ve finally got an efficient way of doing it.

Simple prop testing

First, let’s look at a simple example. I have survey data from a control/exposed lift study and I want to see if the lift for each question is significant. The results are in a named list with a series of tidy tibbles.

result_list[1:3]

## $`I fully trust`
## # A tibble: 2 × 5
##   matched_control_xmedia svy_q  proportion     n total
##   <chr>                  <chr>       <dbl> <dbl> <dbl>
## 1 control                Brand1      0.339   339   998
## 2 test                   Brand1      0.431  1029  2389
## 
## $`I can fully identify with`
## # A tibble: 2 × 5
##   matched_control_xmedia svy_q  proportion     n total
##   <chr>                  <chr>       <dbl> <dbl> <dbl>
## 1 control                Brand1      0.289   288   998
## 2 test                   Brand1      0.385   919  2389
## 
## $`I really like`
## # A tibble: 2 × 5
##   matched_control_xmedia svy_q  proportion     n total
##   <chr>                  <chr>       <dbl> <dbl> <dbl>
## 1 control                Brand1      0.388   387   998
## 2 test                   Brand1      0.479  1144  2389

Structured like this, I can easily just use pivot_wider() to spread the data and then run the prop.test() function. I’ll write a function to do that .

result_prop_test <- function(res) {
  tmpout <- res |> 
    tidyr::pivot_wider(
      names_from = !!rlang::sym(names(res)[1]), 
      values_from = c(proportion, n, total)
    ) |> 
    dplyr::filter(!is.na(proportion_control)) |> 
    dplyr::filter(!is.na(proportion_test)) |> 
    dplyr::rowwise() |> 
    dplyr::mutate(
      prop_test = list(prop.test(c(n_control, n_test), c(total_control, total_test))),
      p_value = prop_test$p.value,  # Extract the p-value
      statistic = prop_test$statistic # Extract the chi-squared statistic
    ) |> 
    dplyr::mutate(lift = round((proportion_test - proportion_control)*100), 
                  sig_level = dplyr::case_when(p_value <= .05 ~ .95,
                                 p_value <= .1 & p_value >.05 ~ .90,
                                 p_value < .2 & p_value > .1 ~ .80)) 
  return(tmpout)
}

Now if we take a single tibble from our list and run it through our function, we can see that the results are now in one line and the p_value and statistic of the prop.test are in their own separate columns. The full results of the prop.test are also in a list column.

res1 <- result_prop_test(result_list[[1]])
res1

## # A tibble: 1 × 12
## # Rowwise: 
##   svy_q  proportion_control proportion_test n_control n_test total_control
##   <chr>               <dbl>           <dbl>     <dbl>  <dbl>         <dbl>
## 1 Brand1              0.339           0.431       339   1029           998
## # ℹ 6 more variables: total_test <dbl>, prop_test <list>, p_value <dbl>,
## #   statistic <dbl>, lift <dbl>, sig_level <dbl>

I can also just run the entire list through the function. It returns the same list format, but I can use data.table::rbindlist() to put into a tibble.

sig_results <- purrr::map(result_list, result_prop_test)

data.table::rbindlist(sig_results, idcol = "question") |> 
  as_tibble() |> 
  select(question, svy_q, contains("proportion"), lift, sig_level)

## # A tibble: 9 × 6
##   question              svy_q proportion_control proportion_test  lift sig_level
##   <chr>                 <chr>              <dbl>           <dbl> <dbl>     <dbl>
## 1 I fully trust         Bran…              0.339           0.431     9      0.95
## 2 I can fully identify… Bran…              0.289           0.385    10      0.95
## 3 I really like         Bran…              0.388           0.479     9      0.95
## 4 I would like to own   Bran…              0.395           0.487     9      0.95
## 5 I would be willing t… Bran…              0.262           0.383    12      0.95
## 6 Is leading in electr… Bran…              0.187           0.246     6      0.95
## 7 Is leading in digita… Bran…              0.297           0.388     9      0.95
## 8 Is leading in sustai… Bran…              0.231           0.319     9      0.95
## 9 Is a brand that will… Bran…              0.503           0.564     6      0.95

Prop testing across all options

Now for something a little more involved. In the same survey that gave us the control/exposed, we also have questions asked for the entire competitive set. The data is currently in a tidy frame that looks like this:

head(df, 10)

## # A tibble: 10 × 6
##    Category           svy_q  proportion     n total cat          
##    <fct>              <chr>       <dbl> <dbl> <dbl> <chr>        
##  1 Aided Awareness    Brand1      0.959  3427  3574 Brand Metrics
##  2 Aided Awareness    Brand2      0.931  3329  3574 Brand Metrics
##  3 Aided Awareness    Brand3      0.909  3250  3574 Brand Metrics
##  4 Aided Awareness    Brand4      0.935  3340  3574 Brand Metrics
##  5 Aided Awareness    Brand5      0.936  3345  3574 Brand Metrics
##  6 Aided Ad Awareness Brand1      0.45   1543  3574 Brand Metrics
##  7 Aided Ad Awareness Brand2      0.392  1306  3574 Brand Metrics
##  8 Aided Ad Awareness Brand3      0.428  1392  3574 Brand Metrics
##  9 Aided Ad Awareness Brand4      0.408  1364  3574 Brand Metrics
## 10 Aided Ad Awareness Brand5      0.354  1185  3574 Brand Metrics

I want to compare the results for each Category question for each brand against all others. So the function below will do that for me. It also gives each brand an id and those id’s are then used to determine which brands have significantly greater results compared to the rest.

proptest_dataframe <- function(df, significance_level = 0.1) {
  df <- df %>%
    dplyr::mutate(id = LETTERS[1:n()])  # Assign capital letters as IDs
  
  run_prop_test_significant <- function(i, j, df) {
    prop1 <- df$proportion[i]
    prop2 <- df$proportion[j]
    
    n1 <- df$n[i]
    n2 <- df$n[j]
    
    total1 <- df$total[i]
    total2 <- df$total[j]
    
    counts <- c(n1, n2)
    totals <- c(total1, total2)
    
    test_result <- prop.test(counts, totals)
    
    return(test_result$p.value < significance_level && prop1 > prop2)
  }
  
  df$greater_than <- NA
  
  for (i in 1:nrow(df)) {
    greater_ids <- c()
    for (j in 1:nrow(df)) {
      if (i != j && run_prop_test_significant(i, j, df)) {
        greater_ids <- c(greater_ids, df$id[j])
      }
    }
    df$greater_than[i] <- paste(greater_ids, collapse = ", ")
  }
  return(df)
}

proptest_dataframe(df[1:5,])

## # A tibble: 5 × 8
##   Category        svy_q  proportion     n total cat           id    greater_than
##   <fct>           <chr>       <dbl> <dbl> <dbl> <chr>         <chr> <chr>       
## 1 Aided Awareness Brand1      0.959  3427  3574 Brand Metrics A     "B, C, D, E"
## 2 Aided Awareness Brand2      0.931  3329  3574 Brand Metrics B     "C"         
## 3 Aided Awareness Brand3      0.909  3250  3574 Brand Metrics C     ""          
## 4 Aided Awareness Brand4      0.935  3340  3574 Brand Metrics D     "C"         
## 5 Aided Awareness Brand5      0.936  3345  3574 Brand Metrics E     "C"

So this works, but because my dataframe has multiple questions included, I can’t just pass the whole thing in. Every row will be treated as if it’s all part of one, big prop.test.

To fix this, I’m going to split my data by the Category question and then I’m going to run it through this function.

df_list <- split(df, df$Category)

results <- df_list |> 
    purrr::map(proptest_dataframe)

results

## $`Aided Ad Awareness`
## # A tibble: 5 × 8
##   Category           svy_q  proportion     n total cat        id    greater_than
##   <fct>              <chr>       <dbl> <dbl> <dbl> <chr>      <chr> <chr>       
## 1 Aided Ad Awareness Brand1      0.45   1543  3574 Brand Met… A     "B, C, D, E"
## 2 Aided Ad Awareness Brand2      0.392  1306  3574 Brand Met… B     "E"         
## 3 Aided Ad Awareness Brand3      0.428  1392  3574 Brand Met… C     "B, E"      
## 4 Aided Ad Awareness Brand4      0.408  1364  3574 Brand Met… D     "E"         
## 5 Aided Ad Awareness Brand5      0.354  1185  3574 Brand Met… E     ""          
## 
## $`Aided Awareness`
## # A tibble: 5 × 8
##   Category        svy_q  proportion     n total cat           id    greater_than
##   <fct>           <chr>       <dbl> <dbl> <dbl> <chr>         <chr> <chr>       
## 1 Aided Awareness Brand1      0.959  3427  3574 Brand Metrics A     "B, C, D, E"
## 2 Aided Awareness Brand2      0.931  3329  3574 Brand Metrics B     "C"         
## 3 Aided Awareness Brand3      0.909  3250  3574 Brand Metrics C     ""          
## 4 Aided Awareness Brand4      0.935  3340  3574 Brand Metrics D     "C"         
## 5 Aided Awareness Brand5      0.936  3345  3574 Brand Metrics E     "C"         
## 
## $`Brand Momentum - Top 2 Box`
## # A tibble: 5 × 8
##   Category                 svy_q proportion     n total cat   id    greater_than
##   <fct>                    <chr>      <dbl> <dbl> <dbl> <chr> <chr> <chr>       
## 1 Brand Momentum - Top 2 … Bran…      0.556  1905  3574 Bran… A     "D"         
## 2 Brand Momentum - Top 2 … Bran…      0.579  1926  3574 Bran… B     "C, D"      
## 3 Brand Momentum - Top 2 … Bran…      0.559  1817  3574 Bran… C     "A"         
## 4 Brand Momentum - Top 2 … Bran…      0.53   1770  3574 Bran… D     ""          
## 5 Brand Momentum - Top 2 … Bran…      0.594  1987  3574 Bran… E     "A, C, D"   
## 
## $`Purchase Consideration`
## # A tibble: 5 × 8
##   Category               svy_q  proportion     n total cat    id    greater_than
##   <fct>                  <chr>       <dbl> <dbl> <dbl> <chr>  <chr> <chr>       
## 1 Purchase Consideration Brand1      0.498  1706  3574 Brand… A     "D, E"      
## 2 Purchase Consideration Brand2      0.499  1662  3574 Brand… B     "D, E"      
## 3 Purchase Consideration Brand3      0.518  1682  3574 Brand… C     "D, E"      
## 4 Purchase Consideration Brand4      0.443  1480  3574 Brand… D     ""          
## 5 Purchase Consideration Brand5      0.434  1452  3574 Brand… E     ""

Again, everything works as it should. But I also want to present these results to someone, and right now, the results are in a list of tidy tibbles. I’m going to add to my current proptest_dataframe() function to widen the data and also to do the prep work necessary to put these results into a gt table.

proptest_dataframe <- function(df, significance_level = 0.1) {
  df <- df %>%
    dplyr::mutate(id = LETTERS[1:n()])  # Assign capital letters as IDs
  
  run_prop_test_significant <- function(i, j, df) {
    prop1 <- df$proportion[i]
    prop2 <- df$proportion[j]
    
    n1 <- df$n[i]
    n2 <- df$n[j]
    
    total1 <- df$total[i]
    total2 <- df$total[j]
    
    counts <- c(n1, n2)
    totals <- c(total1, total2)
    
    test_result <- prop.test(counts, totals)
    
    return(test_result$p.value < significance_level && prop1 > prop2)
  }
  
  df$greater_than <- NA
  
  for (i in 1:nrow(df)) {
    greater_ids <- c()
    for (j in 1:nrow(df)) {
      if (i != j && run_prop_test_significant(i, j, df)) {
        greater_ids <- c(greater_ids, df$id[j])
      }
    }
    df$greater_than[i] <- paste(greater_ids, collapse = ", ")
  }
  
  # here i'm going to subscript my `greater_than` to show significance
  df <- df |> 
    mutate(
      brand_id = glue("{svy_q} {id}"),  
      proportion_with_sub = ifelse(
        greater_than != "", 
        glue("{round(proportion * 100)}% <sub>({greater_than})</sub>"),  
        glue("{round(proportion * 100)}%")
      )
    )
  
  # widen the data
  df_wide <- df |> 
    dplyr::select(Category, cat, brand_id, proportion_with_sub) |> 
    tidyr::pivot_wider(
      names_from = brand_id, 
      values_from = proportion_with_sub,
      values_fn = list(proportion_with_sub = function(x) paste(x, collapse = ""))
    )
  
  return(df_wide)
}

df_list <- split(df, df$Category)

results <- df_list |> 
    purrr::map(proptest_dataframe)

results

## $`Aided Ad Awareness`
## # A tibble: 1 × 7
##   Category          cat   `Brand1 A` `Brand2 B` `Brand3 C` `Brand4 D` `Brand5 E`
##   <fct>             <chr> <chr>      <chr>      <chr>      <chr>      <chr>     
## 1 Aided Ad Awarene… Bran… 45% <sub>… 39% <sub>… 43% <sub>… 41% <sub>… 35%       
## 
## $`Aided Awareness`
## # A tibble: 1 × 7
##   Category        cat     `Brand1 A` `Brand2 B` `Brand3 C` `Brand4 D` `Brand5 E`
##   <fct>           <chr>   <chr>      <chr>      <chr>      <chr>      <chr>     
## 1 Aided Awareness Brand … 96% <sub>… 93% <sub>… 91%        94% <sub>… 94% <sub>…
## 
## $`Brand Momentum - Top 2 Box`
## # A tibble: 1 × 7
##   Category          cat   `Brand1 A` `Brand2 B` `Brand3 C` `Brand4 D` `Brand5 E`
##   <fct>             <chr> <chr>      <chr>      <chr>      <chr>      <chr>     
## 1 Brand Momentum -… Bran… 56% <sub>… 58% <sub>… 56% <sub>… 53%        59% <sub>…
## 
## $`Purchase Consideration`
## # A tibble: 1 × 7
##   Category          cat   `Brand1 A` `Brand2 B` `Brand3 C` `Brand4 D` `Brand5 E`
##   <fct>             <chr> <chr>      <chr>      <chr>      <chr>      <chr>     
## 1 Purchase Conside… Bran… 50% <sub>… 50% <sub>… 52% <sub>… 44%        43%

Note that the column names have a capital letter next to them; that way each brand is readily identifiable for the reader of the table. Now we’re going to bind our data into a single tibble, and then we’re going to pass it to gt to make a table.

combined_result <- results |> 
    data.table::rbindlist(fill = TRUE) |> 
    tibble::as_tibble() |> 
    dplyr::mutate(Category = as.character(Category))

library(gt)
  
  # Programmatically generate the labels for the columns (with <br>)
  col_labels <- combined_result |> 
    colnames() |> 
    set_names() |>   # Keeps the original column names as-is
    map(~ html(glue("{gsub(' ', '<br>', .)}")))  # Programmatically replace space with <br> in labels
  
  # Create the gt table with programmatically generated column labels and subscripting
  combined_result |> 
    group_by(cat) |> 
    gt() |> 
    tab_header(
      title = "Competitive summary",
      ) |> 
    cols_label(.list = col_labels) |> 
    fmt_markdown(columns = everything()) |> 
    # bolding groups and columns 
    tab_style(
      style = cell_text(weight = 'bold',
                        size = px(13)),
      locations = list(
        cells_column_labels(
          columns = everything()),
        cells_row_groups(groups = TRUE))
    ) |> 
    gt::tab_footnote(
      footnote = "A footnote goes here to explain the table",
      locations = gt::cells_title(groups = "title")
    ) |> 
    # aligning the title left
    gt::tab_style(
      style = gt::cell_text(align = 'left',
                            weight = 'bold',
                            size = gt::px(16)),
      locations = gt::cells_title(c("title"))
    ) |> 
    # aligning the subtitle left
    gt::tab_style(
      style = gt::cell_text(align = 'left',
                            weight = 'bold',
                            size = gt::px(14)),
      locations = gt::cells_title(c("subtitle")) 
    ) |> 
    # centering column labels (not first column)
    gt::tab_style(
      style = gt::cell_text(align = 'center'),
      locations = gt::cells_column_labels(
        columns = -1  # Exclude the first column
      )
    ) |>
    # font size in table
    gt::tab_style(
      style = gt::cell_text(size = px(12)),
      locations = cells_body(
        columns = dplyr::everything())
    ) |> 
    # centering all variables in table (not first column)
    gt::tab_style(
      style = gt::cell_text(align = "center"),
      locations = gt::cells_body(columns = -1)
    ) |> 
    # don't have Category column labeled
    cols_label(
      Category = ""
    ) |> 
    # final options
    gt::tab_options(
      data_row.padding = gt::px(4),
      row_group.padding = gt::px(4),
      source_notes.font.size = gt::px(10),
      footnotes.font.size = gt::px(10),
      footnotes.marks = "" # empty footnote mark
    )

	Brand1 A	Brand2 B	Brand3 C	Brand4 D	Brand5 E
Competitive summary
Brand Metrics
Aided Ad Awareness	45% _{(B, C, D, E)}	39% _(E)	43% _{(B, E)}	41% _(E)	35%
Aided Awareness	96% _{(B, C, D, E)}	93% _(C)	91%	94% _(C)	94% _(C)
Brand Momentum - Top 2 Box	56% _(D)	58% _{(C, D)}	56% _(A)	53%	59% _{(A, C, D)}
Purchase Consideration	50% _{(D, E)}	50% _{(D, E)}	52% _{(D, E)}	44%	43%
A footnote goes here to explain the table

tidyverse

Chi-Square prop testing

Simple prop testing

Prop testing across all options

Taylor Grant

Group Director, Strategy & Analytics

Related