Home > Software engineering >  Updating column data based on row condition
Updating column data based on row condition

Time:10-05

I'm trying to update a dataframe with 153 rows and 291 columns with the first column being a product description and all other columns numeric.

Most of the rows sum to 1, but due to some bad data input, some have had their values doubled.

I want to update columns based upon if the rowsum is larger than 1. So that the row sum becomes 1.

A small example data:

test <- data.frame(product = c("First", "Second", "Third"),
                   a = c(0.2, 0.1, 0.2),
                   b = c(0.4, 0.1, 0.2),
                   c = c(0.4, 0.8, 1.6)
                   )
  product   a   b   c
1   First 0.2 0.4 0.4
2  Second 0.1 0.1 0.8
3   Third 0.2 0.2 1.6

I've tried using

test %>% mutate(across(2:ncol(test), if_else(rowSums(.) > 1, . / 2, .)))

but get the following error:

Error: Problem with mutate() input ..1. i ..1 = across(2:ncol(test), if_else(rowSums(.) > 1, ./2, .)). x 'x' must be numeric.

CodePudding user response:

This is how to ensure that each row sum is no more than 1:

library(tidyverse)

test <- data.frame(
  product = c("First", "Second", "Third"),
  a = c(0.2, 0.1, 0.2),
  b = c(0.4, 0.1, 0.2),
  c = c(0.4, 0.8, 1.6)
)
test
#>   product   a   b   c
#> 1   First 0.2 0.4 0.4
#> 2  Second 0.1 0.1 0.8
#> 3   Third 0.2 0.2 1.6

test_long <-
  test %>%
  pivot_longer(-product) %>%
  group_by(product)
test_long
#> # A tibble: 9 x 3
#> # Groups:   product [3]
#>   product name  value
#>   <chr>   <chr> <dbl>
#> 1 First   a       0.2
#> 2 First   b       0.4
#> 3 First   c       0.4
#> 4 Second  a       0.1
#> 5 Second  b       0.1
#> 6 Second  c       0.8
#> 7 Third   a       0.2
#> 8 Third   b       0.2
#> 9 Third   c       1.6

scaled_test <-
  test_long %>%
  filter(sum(value) > 1) %>%
  mutate(value = value / sum(value))
scaled_test
#> # A tibble: 3 x 3
#> # Groups:   product [1]
#>   product name  value
#>   <chr>   <chr> <dbl>
#> 1 Third   a       0.1
#> 2 Third   b       0.1
#> 3 Third   c       0.8

test_long %>%
  filter(sum(value) <= 1) %>%
  bind_rows(scaled_test) %>%
  pivot_wider(names_from = name, values_from = value)
#> # A tibble: 3 x 4
#> # Groups:   product [3]
#>   product     a     b     c
#>   <chr>   <dbl> <dbl> <dbl>
#> 1 First     0.2   0.4   0.4
#> 2 Second    0.1   0.1   0.8
#> 3 Third     0.1   0.1   0.8

Created on 2021-10-05 by the reprex package (v2.0.1)

CodePudding user response:

In base, you can solve it this way:

idx <- rowSums(test[, -1]) > 1

test[idx, -1] <- test[idx, -1] / 2

test

#  product   a   b   c
#1   First 0.2 0.4 0.4
#2  Second 0.1 0.1 0.8
#3   Third 0.1 0.1 0.8
  • Related