Fitting Regression Multiple Times And Gather Summary Statistics
Solution 1:
I think you will find this guide useful: Running a model on separate groups.
Let's generate some example data similar to yours, with values for two variants and mean age. We also need a few packages:
library(dplyr)
library(tidyr)
library(purrr)
library(broom)
set.seed(1001)
data1 <- data.frame(mean_age = sample(40:80, 50, replace = TRUE),
                    snp01 = rnorm(50),
                    snp02 = rnorm(50))
The first step is to transform from "wide" to "long" format using gather, so that variant names are in one column and values in another. Then we can nest by variant name.
data1 %>%
  gather(snp, value, -mean_age) %>%
  # tidyr >= 1.0.0 expects the list column to be named; nest(-snp) warns there
  nest(data = -snp)
This creates a tibble (a special data frame) where the second column, data, is a "list column": it contains the mean ages and the values for the variant in that row:
# A tibble: 2 x 2
  snp   data
  <chr> <list>
1 snp01 <tibble [50 x 2]>
2 snp02 <tibble [50 x 2]>
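In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). The following is a minimal sketch of the same reshape-and-nest step written that way, assuming a recent tidyr; the object names long and nested are only illustrative.

```r
library(tidyr)

set.seed(1001)
data1 <- data.frame(mean_age = sample(40:80, 50, replace = TRUE),
                    snp01 = rnorm(50),
                    snp02 = rnorm(50))

# Stack every column except mean_age into snp/value pairs
long <- pivot_longer(data1, cols = -mean_age,
                     names_to = "snp", values_to = "value")

# One row per variant; the remaining columns go into the list column "data"
nested <- nest(long, data = -snp)
nested
```

The row order inside each nested tibble may differ from what gather() produces, but that has no effect on the fitted models.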
Now we use purrr::map to create a third column with the linear model for each row:
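data1 %>%
  gather(snp, value, -mean_age) %>%
  nest(data = -snp) %>%  # named form expected by tidyr >= 1.0.0
  # fit mean_age ~ value separately on each nested tibble
  mutate(model = map(data, ~lm(mean_age ~ value, data = .)))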
Result:
# A tibble: 2 x 3
  snp   data              model
  <chr> <list>            <list>
1 snp01 <tibble [50 x 2]> <lm>
2 snp02 <tibble [50 x 2]> <lm>
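Each entry of the model column is an ordinary lm object, so it can be pulled out and inspected like any other fit. A small sketch, assuming the same example data; the names models, first_fit and slopes are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(1001)
data1 <- data.frame(mean_age = sample(40:80, 50, replace = TRUE),
                    snp01 = rnorm(50),
                    snp02 = rnorm(50))

models <- data1 %>%
  gather(snp, value, -mean_age) %>%
  nest(data = -snp) %>%
  mutate(model = map(data, ~ lm(mean_age ~ value, data = .x)))

# Extract the first fitted model from the list column
first_fit <- models$model[[1]]
coef(first_fit)

# map_dbl() returns a plain numeric column: here, the slope of each model
slopes <- models %>%
  mutate(slope = map_dbl(model, ~ coef(.x)[["value"]])) %>%
  select(snp, slope)
slopes
```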
The last step is to summarise the models as desired, then unnest the data structure. I'm using broom::glance(). The full procedure:
data1 %>%
  gather(snp, value, -mean_age) %>%
  nest(data = -snp) %>%  # named form expected by tidyr >= 1.0.0
  mutate(model = map(data, ~lm(mean_age ~ value, data = .)),
         summary = map(model, glance)) %>%  # one-row summary per model
  select(-data, -model) %>%
  unnest(summary)
Result:
# A tibble: 2 x 12
  snp   r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC deviance df.residual
  <chr>     <dbl>         <dbl> <dbl>     <dbl>   <dbl> <int>  <dbl> <dbl> <dbl>    <dbl>       <int>
1 snp01   0.00732      -0.0134   12.0     0.354   0.555     2  -194.  394.  400.    6901.          48
2 snp02   0.0108       -0.00981  12.0     0.524   0.473     2  -194.  394.  400.    6877.          48
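glance() gives one row of model-level statistics per fit. If coefficient-level values (estimate, standard error, p-value) are wanted instead, broom::tidy() can be swapped in. This is a sketch of that variation rather than part of the original answer; the name estimates is illustrative.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

set.seed(1001)
data1 <- data.frame(mean_age = sample(40:80, 50, replace = TRUE),
                    snp01 = rnorm(50),
                    snp02 = rnorm(50))

estimates <- data1 %>%
  gather(snp, value, -mean_age) %>%
  nest(data = -snp) %>%
  mutate(model = map(data, ~ lm(mean_age ~ value, data = .x)),
         coefs = map(model, tidy)) %>%   # one row per coefficient
  select(snp, coefs) %>%
  unnest(cols = coefs) %>%
  filter(term == "value")                # keep the slope, drop the intercept
estimates
```

With a single predictor, the p-value of the slope is the same as the overall p.value reported by glance().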
Solution 2:
I do not know the exact details and complexity of your data and analysis, but this is the approach I would take: loop over the column names, fit one model per column, and store each fit in a named list.
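# Example data: mean_age plus five columns to regress on it
data <- data.frame(mean_age = rnorm(5),
                   Column_1 = rnorm(5),
                   Column_2 = rnorm(5),
                   Column_3 = rnorm(5),
                   Column_4 = rnorm(5),
                   Column_5 = rnorm(5))
data

# Fit one model per column (skipping mean_age) and store it under the column name
looped <- list()
for (each_col in names(data)[-1]) {
  # get() looks the column up by name inside `data`
  looped[[each_col]] <- lm(get(each_col) ~ mean_age, data = data)
}
looped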
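The loop stores the fitted models but does not yet collect any summary statistics. One way to gather them in base R is sketched below; this is an illustration, not the answerer's code. It uses reformulate() in place of get(), a smaller made-up data frame, and the hypothetical name stats. As in the loop above, each column is the response and mean_age the predictor, which is the reverse of Solution 1.

```r
set.seed(1)
data <- data.frame(mean_age = rnorm(5),
                   Column_1 = rnorm(5),
                   Column_2 = rnorm(5))

looped <- list()
for (each_col in names(data)[-1]) {
  # reformulate() builds the formula <column> ~ mean_age from strings
  looped[[each_col]] <- lm(reformulate("mean_age", response = each_col),
                           data = data)
}

# Collect a few statistics from summary() into one data frame, one row per model
stats <- do.call(rbind, lapply(names(looped), function(nm) {
  s <- summary(looped[[nm]])
  data.frame(column        = nm,
             r.squared     = s$r.squared,
             adj.r.squared = s$adj.r.squared,
             sigma         = s$sigma,
             slope.p.value = coef(s)["mean_age", "Pr(>|t|)"])
}))
stats
```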