How to transform selected rows into a single column in R-CodePudding

I have a data frame which I need to transform. I need to change the unique rows into single columns based on the value of a column.

My data below:

df1 <- data.frame(V1 = c("a", "a", "b", "b","b"), 
                  V2 = c("product1", "transport", "product1", "product2","transport"),
                  V3 = c("100", "10", "100", "100","10"))

> df1
  V1        V2  V3
1  a  product1 100
2  a transport  10
3  b  product1 100
4  b  product2 100
5  b transport  10

I need the following transformation and the division of the value of V3 into the number of products included in V1.

> df2
  V1       V2 transport  V3
1  a product1        10 100
2  b product1         5 100
3  b product2         5 100

CodePudding user response：

Here is one method with data.table - convert to data.table (setDT), make sure that the 'V3' is numeric (for division - it was created as character), grouped by 'V1', create the 'transport' by extracting the 'V3' value where 'V2' is 'transport' and divide by the number of elements in 'V2' that are not "transport", then subset the data by removing the 'transport' elements from 'V2'

library(data.table)
df1$V3 <- as.numeric(df1$V3)
setDT(df1)[, transport := V3[V2 == "transport"]/
       sum(V2 != "transport"), by = V1]
df1[V2 != "transport"]
       V1       V2    V3 transport
   <char>   <char> <num>     <num>
1:      a product1   100        10
2:      b product1   100         5
3:      b product2   100         5

Or another option with dplyr/tidyr

library(dplyr)
library(tidyr)
df1 %>%
   type.convert(as.is = TRUE) %>% 
   mutate(transport = case_when(V2 == 'transport' ~ V3)) %>% 
   group_by(V1) %>%
   fill(transport, .direction = "downup") %>%
   mutate(transport = transport/sum(V2 != "transport")) %>% 
   ungroup %>% 
   filter(V2 != "transport")
# A tibble: 3 × 4
  V1    V2          V3 transport
  <chr> <chr>    <int>     <dbl>
1 a     product1   100        10
2 b     product1   100         5
3 b     product2   100         5

CodePudding user response：

Here's a tidyverse-based solution, using tidyr and dplyr:

convert V3 to numeric
pivot wider to create a separate column for transport, then pivot longer to collapse "product1" and "product2" back into one column
divide transport by number of cases per V1.

library(dplyr)
library(tidyr)

df1 %>% 
  mutate(V3 = as.numeric(V3)) %>% 
  pivot_wider(names_from = V2, values_from = V3) %>% 
  pivot_longer(
    cols = c(product1, product2), 
    names_to = "V2", 
    values_to = "V3", 
    values_drop_na = TRUE
  ) %>% 
  group_by(V1) %>% 
  mutate(transport = transport / n()) %>% 
  ungroup()

#> # A tibble: 3 x 4
#>   V1    transport V2          V3
#>   <chr>     <dbl> <chr>    <dbl>
#> 1 a            10 product1   100
#> 2 b             5 product1   100
#> 3 b             5 product2   100

^{Created on 2022-03-17 by the reprex package (v2.0.1)}

CodePudding user response：

Here is an usual approach to get your desired output if I got your point right:

library(dplyr)
library(tidyr)

df1 %>%
  group_by(V1) %>%
  mutate(V3 = ifelse(V2 == "transport", as.numeric(V3) / (n() - 1), as.numeric(V3))) %>%
  ungroup() %>%
  pivot_wider(names_from = V2, values_from = V3) %>%
  pivot_longer(c(product1, product2), names_to = "V2", values_to = "V3") %>%
  drop_na()

# A tibble: 3 x 4
  V1    transport V2          V3
  <chr>     <dbl> <chr>    <dbl>
1 a            10 product1   100
2 b             5 product1   100
3 b             5 product2   100

CodePudding user response：

Another possible solution:

library(tidyverse)

df1 %>%
  mutate(V3 = as.numeric(V3)) %>% 
  group_by(V1) %>% 
  mutate(transport = if_else(V2 == "transport", V3 / (n()-1), NA_real_)) %>% 
  fill(transport, .direction = "up") %>% ungroup %>% 
  filter(V2 != "transport")

#> # A tibble: 3 × 4
#>   V1    V2          V3 transport
#>   <chr> <chr>    <dbl>     <dbl>
#> 1 a     product1   100        10
#> 2 b     product1   100         5
#> 3 b     product2   100         5

CodePudding user response：

Here is one more with pivoting:

library(dplyr)
library(tidyr)

df1 %>% 
  pivot_wider(
    names_from = V2,
    values_from = V3
  ) %>% 
  pivot_longer(
    -c(V1, transport),
    names_to = "V2",
    values_to = "V3"
  )%>% 
  type.convert(as.is = TRUE) %>% 
  na.omit() %>% 
  group_by(V1) %>% 
  mutate(transport = transport/max(row_number()))

  V1    transport V2          V3
  <chr>     <dbl> <chr>    <int>
1 a            10 product1   100
2 b             5 product1   100
3 b             5 product2   100