Fitting Regression Multiple Times And Gather Summary Statistics
Solution 1:
I think you will find this guide useful: Running a model on separate groups.
Let's generate some example data similar to yours, with values for two variants and mean age. We also need a few packages:
library(dplyr)
library(tidyr)
library(purrr)
library(broom)
set.seed(1001)
data1 <- data.frame(mean_age = sample(40:80, 50, replace = TRUE),
                    snp01 = rnorm(50),
                    snp02 = rnorm(50))
The first step is to transform the data from "wide" to "long" format using gather, so that variant names are in one column and values in another. Then we can nest by variant name.
data1 %>%
  gather(snp, value, -mean_age) %>%
  nest(-snp)
This creates a tibble (a special kind of data frame) in which the second column, data, is a "list column": each cell holds the mean ages and the values for the variant in that row:
# A tibble: 2 x 2
  snp   data
  <chr> <list>
1 snp01 <tibble [50 x 2]>
2 snp02 <tibble [50 x 2]>
Now we use purrr::map to create a third column with the linear model for each row:
data1 %>%
  gather(snp, value, -mean_age) %>%
  nest(-snp) %>%
  mutate(model = map(data, ~lm(mean_age ~ value, data = .)))
Result:
# A tibble: 2 x 3
  snp   data              model
  <chr> <list>            <list>
1 snp01 <tibble [50 x 2]> <lm>
2 snp02 <tibble [50 x 2]> <lm>
The last step is to summarise the models as desired, then unnest the data structure. I'm using broom::glance(). The full procedure:
data1 %>%
  gather(snp, value, -mean_age) %>%
  nest(-snp) %>%
  mutate(model = map(data, ~lm(mean_age ~ value, data = .)),
         summary = map(model, glance)) %>%
  select(-data, -model) %>%
  unnest(summary)
Result:
# A tibble: 2 x 12
  snp   r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC deviance df.residual
  <chr>     <dbl>         <dbl> <dbl>     <dbl>   <dbl> <int>  <dbl> <dbl> <dbl>    <dbl>       <int>
1 snp01   0.00732      -0.0134   12.0     0.354   0.555     2  -194.  394.  400.    6901.          48
2 snp02   0.0108       -0.00981  12.0     0.524   0.473     2  -194.  394.  400.    6877.          48
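On recent versions of tidyr, gather is superseded by pivot_longer, and calling nest with an unnamed selection such as nest(-snp) gives a deprecation warning. A minimal sketch of the same procedure in the newer syntax, assuming tidyr 1.0.0 or later; the name model_summaries is only for illustration:

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

set.seed(1001)
data1 <- data.frame(mean_age = sample(40:80, 50, replace = TRUE),
                    snp01 = rnorm(50),
                    snp02 = rnorm(50))

model_summaries <- data1 %>%
  # wide to long: one row per (mean_age, variant, value)
  pivot_longer(-mean_age, names_to = "snp", values_to = "value") %>%
  # one row per variant, with everything else packed into a list column
  nest(data = -snp) %>%
  # fit one linear model per variant, then summarise each model
  mutate(model = map(data, ~ lm(mean_age ~ value, data = .x)),
         summary = map(model, glance)) %>%
  select(-data, -model) %>%
  unnest(summary)

model_summaries
```

If you want per-coefficient estimates rather than one row of model-level statistics, swapping glance for broom::tidy in the same map call should give one row per term for each variant.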
Solution 2:
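I do not know the exact details or complexity of your data and analysis, but this is the approach I would take: loop over the columns and store each fitted model in a named list. Note that here each column is the response and mean_age is the predictor.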
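# Example data: mean_age plus five columns to model
data <- data.frame(mean_age = rnorm(5),
                   Column_1 = rnorm(5),
                   Column_2 = rnorm(5),
                   Column_3 = rnorm(5),
                   Column_4 = rnorm(5),
                   Column_5 = rnorm(5)
)
data
# Fit one model per column (every column except mean_age),
# storing each fit under the column's name
looped <- list()
for (each_col in names(data)[-1]) {
  # get() looks up the column named by each_col inside data
  looped[[each_col]] <- lm(get(each_col) ~ mean_age, data)
}
looped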
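The loop above stores the fitted models but does not yet gather their summary statistics. One way to do that, sketched here under a few assumptions: broom and dplyr are available, a seed and a larger sample are used only to make the example reproducible, and the formula is built with reformulate instead of get so that the column name shows up in the model call. The name model_stats is hypothetical.

```r
library(broom)
library(dplyr)

set.seed(1001)  # assumption: seed added only for reproducibility
data <- data.frame(mean_age = rnorm(20),
                   Column_1 = rnorm(20),
                   Column_2 = rnorm(20))

looped <- list()
for (each_col in names(data)[-1]) {
  # reformulate() builds the formula <each_col> ~ mean_age from strings
  looped[[each_col]] <- lm(reformulate("mean_age", response = each_col),
                           data = data)
}

# glance() returns a one-row data frame per model; bind_rows() stacks them
# and .id records which list element (column name) each row came from
model_stats <- bind_rows(lapply(looped, glance), .id = "column")
model_stats
```

This gives one row per fitted model, labelled by the column it was fitted on, in the same shape as the glance output in Solution 1.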