I'm still learning R and was wondering if I there was an elegant way of manipulating the below df to achieve df2.
I'm not sure if it's a loop that is supposed to be used for this, but basically I want to take a distinct on each V(X)_ID and average its associated V(X)_No columns.
V1_ID <- c('AUD','CAD','NZD','USD',NA,NA,NA)
V1_No <- c(3.43,2.42,1.58,9.9,NA,NA,NA)
V2_ID <- c('TRY','AUD','EUR','SPY','TRY','BTC',NA)
V2_No <- c(8.4,2.4,6.8,1.2,9.8,9.8,NA)
V3_ID <- c('JPY','EUR','NZD','AUD','SPY','NA',NA)
V3_No <- c(1.8,8.6,4.4,2.1,9.6,NA,NA)
V4_ID <- c('GBP','TRY','HKD','SKD','USD','NZD','CAD')
V4_No <- c(1.3,4.6,7.9,8.5,2.6,7.4,9.1)
df <- data.frame(V1_ID,V1_No,V2_ID,V2_No,V3_ID,V3_No,V4_ID,V4_No)
ID <- c('AUD','CAD','NZD','USD','TRY','EUR','SPY','BTC','JPY','GBP','SKD')
No <- c(2.643,5.76,4.46,6.25,7.6,8.6,5.4,9.8,1.8,1.3,8.5)
df2 <- data.frame(ID,No)
Your assistance is much appreciated as I have hundreds of these types of columns in this type of format and approaching it from a manual standpoint is very draining.
Thanks
CodePudding user response:
One way to do this using some functions from the tidyverse
:
library(tidyverse)
paste0("V", 1:4) %>%
map(~select(df, starts_with(.x))) %>% # split into list so that bind_rows() is easier
map(~rename(.x, v_id = 1, v_no = 2)) %>% # rename columns so bind_rows() combines all columns into two
bind_rows() %>%
drop_na() %>% # remove missing values
group_by(v_id) %>%
summarize(v_no = mean(v_no)) # calculate average by v_id
Since you say you have hundreds of these columns, you'll have to change paste0("V", 1:4)
to match the number of columns you have. E.g., if you have 200 of them, you would just write paste0("V", 1:200)
.
CodePudding user response:
A tidyverse solution that pivots to get all values in correct columns first:
library(tidyverse)
df |>
rownames_to_column() |>
pivot_longer(-rowname, names_to = c("run", "metric"), values_to = "val",
names_pattern = "V(.)_(..)", values_transform = as.character) |>
pivot_wider(id_cols = c(run, rowname), names_from = metric, values_from = val) |>
filter(!is.na(No)) |>
group_by(ID) |>
summarise(No = mean(as.double(No)))
#> # A tibble: 12 × 2
#> ID No
#> <chr> <dbl>
#> 1 AUD 2.64
#> 2 BTC 9.8
#> 3 CAD 5.76
#> 4 EUR 7.7
#> 5 GBP 1.3
#> 6 HKD 7.9
#> 7 JPY 1.8
#> 8 NZD 4.46
#> 9 SKD 8.5
#> 10 SPY 5.4
#> 11 TRY 7.6
#> 12 USD 6.25