I am relatively new to R and have a data frame that looks like this:
df <- structure(list(row.names = 1:5, date = c("01-01-2017", "10-01-2017",
"10-04-2017", "11-04-2017", "12-04-2017"), fixed_factor = c(NA,
3L, 2L, 5L, 10L), line_1_rec_1_mean = c(0.5, 0.1, 0.05, 0.05,
0.1), line_1_rec_2_mean = c(6, 5, 3, 2, 0.9), line_1_rec_3_mean = c(88L,
3L, 4L, 3L, 7L), line_1_rec_5_mean = c(6, 0.2, 0.7, 0.6, 3),
line_1_rec_6_mean = c(50L, 1L, 5L, 8L, 2L)), row.names = c(NA,
-5L), class = "data.frame")
row.names date fixed_factor line_1_rec_1_mean line_1_rec_2_mean line_1_rec_3_mean line_1_rec_5_mean line_1_rec_6_mean
1 1 01-01-2017 NA 0.5 6 88 6 50
2 2 10-01-2017 3 0.1 5 3 0.2 1
3 3 10-04-2017 2 0.05 3 4 0.7 5
4 4 11-04-2017 5 0.05 2 3 0.6 8
5 5 12-04-2017 10 0.1 0.9 7 3 2
The real data frame contains over 1,500 columns and 365 rows.
What I am trying to do is add each row's "fixed_factor" to all "line_1_rec*" columns (that is, every column except the first three) and save the result as a new data frame, which would look something like this:
row.names date fixed_factor line_1_rec_1_mean line_1_rec_2_mean line_1_rec_3_mean line_1_rec_5_mean line_1_rec_6_mean
1 1 01-01-2017 NA 0.50 6.0 88 6.0 50
2 2 10-01-2017 3 3.10 8.0 6 3.2 4
3 3 10-04-2017 2 2.05 5.0 6 2.7 7
4 4 11-04-2017 5 5.05 7.0 8 5.6 13
5 5 12-04-2017 10 10.10 10.9 17 13.0 12
I have done a lot of reading but have not managed to find a solution. Any help would be greatly appreciated.
Answer:
You can use dplyr. Its mutate() can change multiple columns at once, and you can specify the relevant columns with across().
Note that I replace NA with 0 in fixed_factor using coalesce().
library(dplyr)
# Add fixed_factor (NA treated as 0) to every column whose name matches "line_1_rec"
df_new <- df %>%
  mutate(across(matches("line_1_rec"), ~ .x + coalesce(fixed_factor, 0)))
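To see roughly what across() does here, a base-R analogue (a sketch for illustration, not how dplyr implements it) applies the same function to each selected column with lapply(); the ifelse() step stands in for coalesce(fixed_factor, 0). The small data frame below is a cut-down stand-in for the question's data.

```r
# Cut-down stand-in for the question's data (first three rows, two value columns)
df <- data.frame(
  date = c("01-01-2017", "10-01-2017", "10-04-2017"),
  fixed_factor = c(NA, 3L, 2L),
  line_1_rec_1_mean = c(0.5, 0.1, 0.05),
  line_1_rec_2_mean = c(6, 5, 3)
)

# Same selection as matches("line_1_rec"): a regular expression on the names
cols <- grep("line_1_rec", names(df), value = TRUE)

# Base-R stand-in for coalesce(fixed_factor, 0)
ff <- ifelse(is.na(df$fixed_factor), 0, df$fixed_factor)

# Apply the same function to every selected column, as across() does
df_new <- df
df_new[cols] <- lapply(df[cols], function(x) x + ff)
df_new
```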
Answer:
Try
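# Positions of the line_1_rec columns
tmp <- grep("line_1_rec", colnames(df))
# Replace NA in fixed_factor with 0, then add it row-wise to those columns
df[, tmp] <- replace(df[, "fixed_factor"], is.na(df[, "fixed_factor"]), 0) + df[, tmp]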
row.names date fixed_factor line_1_rec_1_mean line_1_rec_2_mean line_1_rec_3_mean line_1_rec_5_mean line_1_rec_6_mean
1 1 01-01-2017 NA 0.50 6.0 88 6.0 50
2 2 10-01-2017 3 3.10 8.0 6 3.2 4
3 3 10-04-2017 2 2.05 5.0 6 2.7 7
4 4 11-04-2017 5 5.05 7.0 8 5.6 13
5 5 12-04-2017 10 10.10 10.9 17 13.0 12
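Adding a vector to a data frame, as replace(...) + df[, tmp] does, works because a data frame is a list of equal-length columns: a numeric vector with one entry per row is recycled down each column, so entry i is added to row i of every column. A tiny illustration:

```r
# Two columns, two rows
d <- data.frame(a = c(1, 2), b = c(3, 4))

# One entry per row: row 1 gets +10 and row 2 gets +20, in every column
d_plus <- d + c(10, 20)
d_plus
#    a  b
# 1 11 13
# 2 22 24
```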
Answer:
There is a simple way:
df$fixed_factor[is.na(df$fixed_factor)] <- 0  # replace NA values in fixed_factor with 0
df_res <- df
df_res[, 4:ncol(df)] <- df[, 4:ncol(df)] + df$fixed_factor  # add the fixed factor to every column from the 4th onward
I think this is the simplest way for a beginner to understand how data frames work in R.
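With over 1,500 columns, selecting the columns by a name pattern rather than by position, and working on a copy so the original df (including its NA) is left unchanged, may be less error-prone. A minimal sketch; the function name add_row_factor and the small data frame are hypothetical, for illustration only:

```r
# Hypothetical helper: add a per-row factor to every column whose name matches `pattern`
add_row_factor <- function(data, factor_col, pattern) {
  cols <- grep(pattern, names(data), value = TRUE)  # columns to update
  f <- data[[factor_col]]
  f[is.na(f)] <- 0                                  # treat a missing factor as 0
  out <- data                                       # work on a copy; `data` is untouched
  out[cols] <- data[cols] + f                       # f is recycled down each column
  out
}

# Small hypothetical stand-in for the question's data
df <- data.frame(
  date = c("01-01-2017", "10-01-2017"),
  fixed_factor = c(NA, 3L),
  line_1_rec_1_mean = c(0.5, 0.1),
  line_1_rec_3_mean = c(88L, 3L)
)

df_res <- add_row_factor(df, "fixed_factor", "^line_1_rec")
df_res
```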
Answer:
# Store a vector of column names of line cols:
#line1_cnames => character vector
line1_cnames <- grep("line_1_rec.*", names(df), value = TRUE)
# Don't replace NA values:
# Add the fixed factor to each line1_rec vector: res => data.frame
res <- setNames(
df$fixed_factor + df[,line1_cnames],
paste(
"fixed_factor_plus",
line1_cnames,
sep = "_"
)
)
# replace NA values:
# Add the fixed factor to each line1_rec vector: res => data.frame
res <- setNames(
with(
replace(df, is.na(df), 0),
fixed_factor + replace(df, is.na(df), 0)[,line1_cnames]
),
paste(
"fixed_factor_plus",
line1_cnames,
sep = "_"
)
)
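Both versions of res contain only the new fixed_factor_plus_* columns. If you want them next to the original columns, one option (a sketch, assuming the row order has not changed) is cbind(). The small data frame below is a hypothetical stand-in for the question's data:

```r
# Small hypothetical stand-in for the question's data
df <- data.frame(
  date = c("01-01-2017", "10-01-2017"),
  fixed_factor = c(NA, 3L),
  line_1_rec_1_mean = c(0.5, 0.1),
  line_1_rec_2_mean = c(6, 5)
)

line1_cnames <- grep("line_1_rec.*", names(df), value = TRUE)

# NA-replacing version of res, as in the answer
res <- setNames(
  with(
    replace(df, is.na(df), 0),
    fixed_factor + replace(df, is.na(df), 0)[, line1_cnames]
  ),
  paste("fixed_factor_plus", line1_cnames, sep = "_")
)

# Keep the original columns and append the shifted ones
df_new <- cbind(df, res)
names(df_new)
```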