Home > Back-end >  Row bind multiple columns into two columns averaging unique elements in order
Row bind multiple columns into two columns averaging unique elements in order

Time:01-27

I'm still learning R and was wondering if I there was an elegant way of manipulating the below df to achieve df2.

I'm not sure if it's a loop that is supposed to be used for this, but basically I want to take a distinct on each V(X)_ID and average its associated V(X)_No columns.

V1_ID <- c('AUD','CAD','NZD','USD',NA,NA,NA)
V1_No <- c(3.43,2.42,1.58,9.9,NA,NA,NA)
V2_ID <- c('TRY','AUD','EUR','SPY','TRY','BTC',NA)
V2_No <- c(8.4,2.4,6.8,1.2,9.8,9.8,NA)
V3_ID <- c('JPY','EUR','NZD','AUD','SPY','NA',NA)
V3_No <- c(1.8,8.6,4.4,2.1,9.6,NA,NA)
V4_ID <- c('GBP','TRY','HKD','SKD','USD','NZD','CAD')
V4_No <- c(1.3,4.6,7.9,8.5,2.6,7.4,9.1)
df <- data.frame(V1_ID,V1_No,V2_ID,V2_No,V3_ID,V3_No,V4_ID,V4_No)


ID <- c('AUD','CAD','NZD','USD','TRY','EUR','SPY','BTC','JPY','GBP','SKD')
No <- c(2.643,5.76,4.46,6.25,7.6,8.6,5.4,9.8,1.8,1.3,8.5)
df2 <- data.frame(ID,No)

Your assistance is much appreciated as I have hundreds of these types of columns in this type of format and approaching it from a manual standpoint is very draining.

Thanks

CodePudding user response:

One way to do this using some functions from the tidyverse:

library(tidyverse)

paste0("V", 1:4) %>%
  map(~select(df, starts_with(.x))) %>% # split into list so that bind_rows() is easier
  map(~rename(.x, v_id = 1, v_no = 2)) %>% # rename columns so bind_rows() combines all columns into two
  bind_rows() %>% 
  drop_na() %>% # remove missing values
  group_by(v_id) %>%
  summarize(v_no = mean(v_no)) # calculate average by v_id

Since you say you have hundreds of these columns, you'll have to change paste0("V", 1:4) to match the number of columns you have. E.g., if you have 200 of them, you would just write paste0("V", 1:200).

CodePudding user response:

A tidyverse solution that pivots to get all values in correct columns first:

library(tidyverse)

df |> 
  rownames_to_column() |> 
  pivot_longer(-rowname, names_to = c("run", "metric"), values_to = "val",
               names_pattern = "V(.)_(..)", values_transform = as.character) |> 
  pivot_wider(id_cols = c(run, rowname), names_from = metric, values_from = val) |> 
  filter(!is.na(No)) |>
  group_by(ID) |> 
  summarise(No = mean(as.double(No)))
#> # A tibble: 12 × 2
#>    ID       No
#>    <chr> <dbl>
#>  1 AUD    2.64
#>  2 BTC    9.8 
#>  3 CAD    5.76
#>  4 EUR    7.7 
#>  5 GBP    1.3 
#>  6 HKD    7.9 
#>  7 JPY    1.8 
#>  8 NZD    4.46
#>  9 SKD    8.5 
#> 10 SPY    5.4 
#> 11 TRY    7.6 
#> 12 USD    6.25
  • Related