I'm trying to update a dataframe with 153 rows and 291 columns with the first column being a product description and all other columns numeric.
Most of the rows sum to 1, but due to some bad data input, some have had their values doubled.
I want to update columns based upon if the rowsum is larger than 1. So that the row sum becomes 1.
A small example data:
test <- data.frame(product = c("First", "Second", "Third"),
a = c(0.2, 0.1, 0.2),
b = c(0.4, 0.1, 0.2),
c = c(0.4, 0.8, 1.6)
)
product a b c
1 First 0.2 0.4 0.4
2 Second 0.1 0.1 0.8
3 Third 0.2 0.2 1.6
I've tried using
test %>% mutate(across(2:ncol(test), if_else(rowSums(.) > 1, . / 2, .)))
but get the following error:
Error: Problem with mutate()
input ..1
.
i ..1 = across(2:ncol(test), if_else(rowSums(.) > 1, ./2, .))
.
x 'x' must be numeric.
CodePudding user response:
This is how to ensure that each row sum is no more than 1:
library(tidyverse)
test <- data.frame(
product = c("First", "Second", "Third"),
a = c(0.2, 0.1, 0.2),
b = c(0.4, 0.1, 0.2),
c = c(0.4, 0.8, 1.6)
)
test
#> product a b c
#> 1 First 0.2 0.4 0.4
#> 2 Second 0.1 0.1 0.8
#> 3 Third 0.2 0.2 1.6
test_long <-
test %>%
pivot_longer(-product) %>%
group_by(product)
test_long
#> # A tibble: 9 x 3
#> # Groups: product [3]
#> product name value
#> <chr> <chr> <dbl>
#> 1 First a 0.2
#> 2 First b 0.4
#> 3 First c 0.4
#> 4 Second a 0.1
#> 5 Second b 0.1
#> 6 Second c 0.8
#> 7 Third a 0.2
#> 8 Third b 0.2
#> 9 Third c 1.6
scaled_test <-
test_long %>%
filter(sum(value) > 1) %>%
mutate(value = value / sum(value))
scaled_test
#> # A tibble: 3 x 3
#> # Groups: product [1]
#> product name value
#> <chr> <chr> <dbl>
#> 1 Third a 0.1
#> 2 Third b 0.1
#> 3 Third c 0.8
test_long %>%
filter(sum(value) <= 1) %>%
bind_rows(scaled_test) %>%
pivot_wider(names_from = name, values_from = value)
#> # A tibble: 3 x 4
#> # Groups: product [3]
#> product a b c
#> <chr> <dbl> <dbl> <dbl>
#> 1 First 0.2 0.4 0.4
#> 2 Second 0.1 0.1 0.8
#> 3 Third 0.1 0.1 0.8
Created on 2021-10-05 by the reprex package (v2.0.1)
CodePudding user response:
In base
, you can solve it this way:
idx <- rowSums(test[, -1]) > 1
test[idx, -1] <- test[idx, -1] / 2
test
# product a b c
#1 First 0.2 0.4 0.4
#2 Second 0.1 0.1 0.8
#3 Third 0.1 0.1 0.8