
Fitting Regression Multiple Times And Gather Summary Statistics

I have a dataframe that looks like this: W01 0.750000 0.916667 0.642857 1.000000 0.619565 W02 0.880000 0.944444 0.500000 0.991

Solution 1:

I think you will find this guide useful: Running a model on separate groups.

Let's generate some example data similar to yours, with values for two variants and mean age. We also need a few packages:

library(dplyr)
library(tidyr)
library(purrr)
library(broom)

set.seed(1001)
data1 <- data.frame(mean_age = sample(40:80, 50, replace = TRUE), 
                    snp01 = rnorm(50), 
                    snp02 = rnorm(50))

The first step is to transform the data from "wide" to "long" format using gather, so that variant names are in one column and values in another. Then we can nest by variant name.

data1 %>% 
  gather(snp, value, -mean_age) %>% 
  nest(-snp)

This creates a tibble (a special kind of data frame) where the second column, data, is a "list column": each entry holds the mean ages and the values for the variant in that row:

# A tibble: 2 x 2
  snp   data             
  <chr> <list>
1 snp01 <tibble [50x2]>
2 snp02 <tibble [50x2]>
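As a side note, gather() is superseded in current tidyr releases in favour of pivot_longer(), and nest() now expects the list column to be named. A minimal sketch of the same reshaping step, assuming tidyr 1.0.0 or later (the object name nested is only for illustration):

```r
library(dplyr)
library(tidyr)

# Same example data as above
set.seed(1001)
data1 <- data.frame(mean_age = sample(40:80, 50, replace = TRUE),
                    snp01 = rnorm(50),
                    snp02 = rnorm(50))

# pivot_longer() plays the role of gather();
# nest(data = -snp) is the named form of nest(-snp)
nested <- data1 %>%
  pivot_longer(-mean_age, names_to = "snp", values_to = "value") %>%
  nest(data = -snp)

nested
```

This should give the same two-row structure, one row per variant with a 50-row tibble in the data column. The rows inside each nested tibble may be ordered differently than with gather(), which does not affect the fitted models.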

Now we use purrr::map to create a third column with the linear model for each row:

data1 %>% 
  gather(snp, value, -mean_age) %>% 
  nest(-snp) %>% 
  mutate(model = map(data, ~lm(mean_age ~ value, data = .)))

Result:

# A tibble: 2 x 3
  snp   data              model 
  <chr> <list>            <list>
1 snp01 <tibble [50x2]>   <lm>
2 snp02 <tibble [50x2]>   <lm>

The last step is to summarise the models as desired, then unnest the data structure. I'm using broom::glance(), which returns one row of model-level statistics per model. The full procedure:

data1 %>% 
  gather(snp, value, -mean_age) %>% 
  nest(-snp) %>% 
  mutate(model = map(data, ~lm(mean_age ~ value, data = .)), 
         summary = map(model, glance)) %>% 
  select(-data, -model) %>% 
  unnest(summary)

Result:

# A tibble: 2 x 12
  snp   r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC deviance df.residual
  <chr>     <dbl>         <dbl> <dbl>     <dbl>   <dbl> <int>  <dbl> <dbl> <dbl>    <dbl>       <int>
1 snp01   0.00732      -0.0134   12.0     0.354   0.555     2  -194.  394.  400.    6901.          48
2 snp02   0.0108       -0.00981  12.0     0.524   0.473     2  -194.  394.  400.    6877.          48
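If coefficient-level statistics (estimate, standard error, p-value for each term) are wanted rather than model-level ones, broom::tidy() can be swapped in for glance(). A minimal sketch on the same example data; the names coefs and the use of nest(data = -snp) (the named form of nest(-snp) in newer tidyr) are choices made for this illustration:

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Same example data as above
set.seed(1001)
data1 <- data.frame(mean_age = sample(40:80, 50, replace = TRUE),
                    snp01 = rnorm(50),
                    snp02 = rnorm(50))

# One row per model term (intercept and slope) for each variant
coefs <- data1 %>%
  gather(snp, value, -mean_age) %>%
  nest(data = -snp) %>%
  mutate(model = map(data, ~lm(mean_age ~ value, data = .)),
         coefs = map(model, tidy)) %>%
  select(snp, coefs) %>%
  unnest(coefs)

coefs
```

Filtering with filter(term == "value") afterwards would keep only the slope rows, one per variant.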

Solution 2:

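I do not know the exact details or complexity of your data and analysis, but this is the approach I would take: loop over the columns, fit one model per column, and store each fit in a list. Note that here each column is regressed on mean_age.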

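# Example data: mean_age plus five columns to model
data <- data.frame(mean_age = rnorm(5),
                   Column_1 = rnorm(5),
                   Column_2 = rnorm(5),
                   Column_3 = rnorm(5),
                   Column_4 = rnorm(5),
                   Column_5 = rnorm(5))
data

# Fit one model per column (every column except mean_age)
# and store each fit in a list under the column's name
looped <- list()

for (each_col in names(data)[-1]) {
  # get() looks up the column named in each_col inside `data`
  looped[[each_col]] <- lm(get(each_col) ~ mean_age, data = data)
}

looped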
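The loop above stops at a list of fitted models. One way to collect summary statistics from such a list into a single data frame is sketched below in base R. The data here is a hypothetical stand-in (20 rows, two columns, fixed seed), and reformulate() is used in place of get() so that the coefficient rows are labelled by name; the names df, fits and stats are illustrative only.

```r
# Hypothetical example data in the same shape as above
set.seed(1)
df <- data.frame(mean_age = rnorm(20),
                 Column_1 = rnorm(20),
                 Column_2 = rnorm(20))

# Fit one model per column; reformulate() builds e.g. Column_1 ~ mean_age
cols <- names(df)[-1]
fits <- lapply(cols, function(col) {
  lm(reformulate("mean_age", response = col), data = df)
})
names(fits) <- cols

# Pull a few statistics out of summary() for each fit and stack them
stats <- do.call(rbind, lapply(cols, function(col) {
  s <- summary(fits[[col]])
  data.frame(column    = col,
             slope     = coef(s)["mean_age", "Estimate"],
             p.value   = coef(s)["mean_age", "Pr(>|t|)"],
             r.squared = s$r.squared)
}))

stats
```

The result has one row per column, which is roughly the base R counterpart of the glance() table in Solution 1.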
