I have a data frame which I need to transform. I need to change the unique rows into single columns based on the value of a column.
My data below:
df1 <- data.frame(V1 = c("a", "a", "b", "b","b"),
V2 = c("product1", "transport", "product1", "product2","transport"),
V3 = c("100", "10", "100", "100","10"))
> df1
V1 V2 V3
1 a product1 100
2 a transport 10
3 b product1 100
4 b product2 100
5 b transport 10
I need the following transformation and the division of the value of V3 into the number of products included in V1.
> df2
V1 V2 transport V3
1 a product1 10 100
2 b product1 5 100
3 b product2 5 100
CodePudding user response:
Here is one method with data.table
- convert to data.table
(setDT
), make sure that the 'V3' is numeric
(for division - it was created as character), grouped by 'V1', create the 'transport' by extracting the 'V3' value where 'V2' is 'transport' and divide by the number of elements in 'V2' that are not "transport", then subset the data by removing the 'transport' elements from 'V2'
library(data.table)
df1$V3 <- as.numeric(df1$V3)
setDT(df1)[, transport := V3[V2 == "transport"]/
sum(V2 != "transport"), by = V1]
df1[V2 != "transport"]
V1 V2 V3 transport
<char> <char> <num> <num>
1: a product1 100 10
2: b product1 100 5
3: b product2 100 5
Or another option with dplyr/tidyr
library(dplyr)
library(tidyr)
df1 %>%
type.convert(as.is = TRUE) %>%
mutate(transport = case_when(V2 == 'transport' ~ V3)) %>%
group_by(V1) %>%
fill(transport, .direction = "downup") %>%
mutate(transport = transport/sum(V2 != "transport")) %>%
ungroup %>%
filter(V2 != "transport")
# A tibble: 3 × 4
V1 V2 V3 transport
<chr> <chr> <int> <dbl>
1 a product1 100 10
2 b product1 100 5
3 b product2 100 5
CodePudding user response:
Here's a tidyverse-based solution, using tidyr
and dplyr
:
- convert
V3
to numeric - pivot wider to create a separate column for
transport
, then pivot longer to collapse"product1"
and"product2"
back into one column - divide
transport
by number of cases perV1
.
library(dplyr)
library(tidyr)
df1 %>%
mutate(V3 = as.numeric(V3)) %>%
pivot_wider(names_from = V2, values_from = V3) %>%
pivot_longer(
cols = c(product1, product2),
names_to = "V2",
values_to = "V3",
values_drop_na = TRUE
) %>%
group_by(V1) %>%
mutate(transport = transport / n()) %>%
ungroup()
#> # A tibble: 3 x 4
#> V1 transport V2 V3
#> <chr> <dbl> <chr> <dbl>
#> 1 a 10 product1 100
#> 2 b 5 product1 100
#> 3 b 5 product2 100
Created on 2022-03-17 by the reprex package (v2.0.1)
CodePudding user response:
Here is an usual approach to get your desired output if I got your point right:
library(dplyr)
library(tidyr)
df1 %>%
group_by(V1) %>%
mutate(V3 = ifelse(V2 == "transport", as.numeric(V3) / (n() - 1), as.numeric(V3))) %>%
ungroup() %>%
pivot_wider(names_from = V2, values_from = V3) %>%
pivot_longer(c(product1, product2), names_to = "V2", values_to = "V3") %>%
drop_na()
# A tibble: 3 x 4
V1 transport V2 V3
<chr> <dbl> <chr> <dbl>
1 a 10 product1 100
2 b 5 product1 100
3 b 5 product2 100
CodePudding user response:
Another possible solution:
library(tidyverse)
df1 %>%
mutate(V3 = as.numeric(V3)) %>%
group_by(V1) %>%
mutate(transport = if_else(V2 == "transport", V3 / (n()-1), NA_real_)) %>%
fill(transport, .direction = "up") %>% ungroup %>%
filter(V2 != "transport")
#> # A tibble: 3 × 4
#> V1 V2 V3 transport
#> <chr> <chr> <dbl> <dbl>
#> 1 a product1 100 10
#> 2 b product1 100 5
#> 3 b product2 100 5
CodePudding user response:
Here is one more with pivoting
:
library(dplyr)
library(tidyr)
df1 %>%
pivot_wider(
names_from = V2,
values_from = V3
) %>%
pivot_longer(
-c(V1, transport),
names_to = "V2",
values_to = "V3"
)%>%
type.convert(as.is = TRUE) %>%
na.omit() %>%
group_by(V1) %>%
mutate(transport = transport/max(row_number()))
V1 transport V2 V3
<chr> <dbl> <chr> <int>
1 a 10 product1 100
2 b 5 product1 100
3 b 5 product2 100