I am trying to add an empty row before every "(intercept)" row except the first one.
(I put three linear regression model summaries into one data frame, and I want NA rows between the models so the data frame is easier to read.)
For example, my data frame looks like this:
var          coefficient  p_Value
(intercept)  -17,22       0.2
speed        3.82         0.001
(intercept)  -172,23      0.02
youtube      13.42        0.001
facebook     5.44         0.5
(intercept)  3.22         0.02
youtube      4.98         0.001
facebook     4.33         0.5
newspaper    1.22         0.11
I want the result to look like this:
var          coefficient  p_Value
(intercept)  -17,22       0.2
speed        3.82         0.001
NA           NA           NA
(intercept)  -172,23      0.02
youtube      13.42        0.001
facebook     5.44         0.5
NA           NA           NA
(intercept)  3.22         0.02
youtube      4.98         0.001
facebook     4.33         0.5
newspaper    1.22         0.11
I know I could hard-code the empty rows based on their row positions, but I am looking for a better way, since I may have a much larger and more complex data frame in the future. I do not want to split it into a list or into separate data frames, because I will eventually write this data frame to a CSV file, and the NA rows let me tell the models apart just by reading the CSV.
Thank you for your time.
Answer:
Here's a tidyverse approach (updated to drop the trailing NA row):
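library(tidyverse)
df |>
  # number each model: the counter increases at every "(intercept)" row
  mutate(split = cumsum(ifelse(var == "(intercept)", 1, 0))) |>
  group_by(split) |>
  # append one all-NA row to the end of each model's block
  group_modify(.f = ~add_row(.data = ., var = NA_character_)) |>
  ungroup() |>
  # drop the NA row after the last model
  slice(-n())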
# A tibble: 11 × 4
split var coefficient p_Value
<dbl> <chr> <chr> <dbl>
1 1 (intercept) -17,22 0.2
2 1 speed 3.82 0.001
3 1 NA NA NA
4 2 (intercept) -172,23 0.02
5 2 youtube 13.42 0.001
6 2 facebook 5.44 0.5
7 2 NA NA NA
8 3 (intercept) 3.22 0.02
9 3 youtube 4.98 0.001
10 3 facebook 4.33 0.5
11 3 newspaper 1.22 0.11
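The grouping step is what removes the need to hard-code row positions: cumsum() over a logical vector counts the "(intercept)" rows seen so far, so every row of one model gets the same number. A minimal base R illustration using only the var values from the question (the ifelse(..., 1, 0) wrapper can be dropped, because TRUE and FALSE are summed as 1 and 0):

```r
# The "var" column from the example data in the question
var <- c("(intercept)", "speed", "(intercept)", "youtube", "facebook",
         "(intercept)", "youtube", "facebook", "newspaper")

# TRUE at each model start; cumsum() counts TRUE as 1 and FALSE as 0,
# so each row gets the number of the model it belongs to
split_id <- cumsum(var == "(intercept)")
print(split_id)
#> [1] 1 1 2 2 2 3 3 3 3
```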
Answer:
Here is a base R option using split + rbind:
> head(do.call(rbind, lapply(split(df, cumsum(startsWith(df$var, "("))), rbind, NA)), -1)
var coefficient p_Value
1.1 (intercept) -17,22 0.200
1.2 speed 3.82 0.001
1.3 <NA> <NA> NA
2.3 (intercept) -172,23 0.020
2.4 youtube 13.42 0.001
2.5 facebook 5.44 0.500
2.41 <NA> <NA> NA
3.6 (intercept) 3.22 0.020
3.7 youtube 4.98 0.001
3.8 facebook 4.33 0.500
3.9 newspaper 1.22 0.110
Data
df <- structure(list(
  var = c("(intercept)", "speed", "(intercept)", "youtube", "facebook",
          "(intercept)", "youtube", "facebook", "newspaper"),
  coefficient = c("-17,22", "3.82", "-172,23", "13.42", "5.44",
                  "3.22", "4.98", "4.33", "1.22"),
  p_Value = c(0.2, 0.001, 0.02, 0.001, 0.5, 0.02, 0.001, 0.5, 0.11)
), class = "data.frame", row.names = c(NA, -9L))
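The split + rbind idea can be wrapped in a small reusable function. This is a minimal sketch; the name insert_na_rows and its is_start argument are hypothetical, not part of base R. It also resets the row names, which otherwise come out as 1.1, 2.41 and so on:

```r
# Example data from the question
df <- data.frame(
  var = c("(intercept)", "speed", "(intercept)", "youtube", "facebook",
          "(intercept)", "youtube", "facebook", "newspaper"),
  coefficient = c("-17,22", "3.82", "-172,23", "13.42", "5.44",
                  "3.22", "4.98", "4.33", "1.22"),
  p_Value = c(0.2, 0.001, 0.02, 0.001, 0.5, 0.02, 0.001, 0.5, 0.11),
  stringsAsFactors = FALSE
)

# Hypothetical helper: separate consecutive blocks of rows with an all-NA row.
# A new block starts at every row where is_start is TRUE.
insert_na_rows <- function(df, is_start) {
  blocks <- split(df, cumsum(is_start))     # one data frame per model
  padded <- lapply(blocks, rbind, NA)       # append an NA row to each block
  out <- head(do.call(rbind, padded), -1)   # stack blocks, drop the final NA row
  rownames(out) <- NULL                     # renumber rows 1..n
  out
}

res <- insert_na_rows(df, df$var == "(intercept)")
res
```

Passing the condition in as a logical vector keeps the helper independent of the column name, so the same function works if the separator rows are keyed on something other than "(intercept)".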
Answer:
This should be more efficient, since it makes only a single pass over the columns instead of splitting the data frame per model and binding the pieces back together.
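## separate an atomic vector `x` with an NA before each x[i]
## (`i` holds sorted positions within `x`)
NAsep <- function (x, i) {
  if (length(i) == 0) return(x)                # nothing to insert
  y <- vector(mode(x), length(x) + length(i))  # one extra slot per NA
  NAind <- i + seq_along(i) - 1                # k-th NA is shifted by the k-1 NAs before it
  y[NAind] <- NA
  y[-NAind] <- x                               # original values fill the remaining slots in order
  y
}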
data.frame(lapply(df, NAsep, i = which(df$var == "(intercept)")[-1]))
# var coefficient p_Value
#1 (intercept) -17,22 0.200
#2 speed 3.82 0.001
#3 <NA> <NA> NA
#4 (intercept) -172,23 0.020
#5 youtube 13.42 0.001
#6 facebook 5.44 0.500
#7 <NA> <NA> NA
#8 (intercept) 3.22 0.020
#9 youtube 4.98 0.001
#10 facebook 4.33 0.500
#11 newspaper 1.22 0.110
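Because the goal is to read the models in a CSV file, it may help to write the NA rows as empty cells: write.csv() writes missing values as the text NA by default, and na = "" changes that. A sketch under the assumption that the padded data frame is called result (a small stand-in is built here so the snippet runs on its own), writing to a temporary file chosen only for illustration:

```r
# Small stand-in for the padded result (two models separated by an NA row)
result <- data.frame(
  var = c("(intercept)", "speed", NA, "(intercept)", "youtube"),
  coefficient = c("-17,22", "3.82", NA, "-172,23", "13.42"),
  p_Value = c(0.2, 0.001, NA, 0.02, 0.001),
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".csv")   # illustrative location only
# na = "" writes missing values as empty fields, so the separator row
# appears as a blank row in a spreadsheet; row.names = FALSE drops the index
write.csv(result, path, na = "", row.names = FALSE)

readLines(path)
```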