Create new variable based on two variables in my dataset in R

Time:04-11

I would like to create a column in my dataset that holds, for each user, the positive sentiment sum minus the negative sentiment sum, both taken from my total column.

So for user Alex, who has a positive sentiment sum of 80 and a negative sentiment sum of 13, the subtracted score will be 67.

The issue I am having is grouping the sentiment column in a way that allows me to perform this operation.

library(tidyverse)

# create mock dataframe
users <- c("Alex", "Alice", "Alexandra", "Andrew", "Alicia", "Alex", "Alice", "Alexandra", "Andrew", "Alicia")
sentiment <- c("positive", "negative", "positive","negative", "positive", "negative", "positive", "negative","positive", "negative")
total <- c(80, 70, 24, 74, 66, 13, 35, 94, 27, 94)

# build the tibble directly: cbind() would coerce every column to character first
mockdataframe <- tibble(users, sentiment = as.factor(sentiment), total)

# using case_when() this way does not work
mockdataframe %>% 
  mutate(Subtraction = case_when(
    sentiment == "positive" ~ (sentiment == "negative")/mockdataframe$total))

I am really struggling trying to solve this. Any help would be appreciated.

CodePudding user response:

Using tidyr::pivot_wider to spread positive and negative into their own columns (one row per user), you could do:

library(tidyverse)

mockdataframe %>% 
  pivot_wider(names_from = sentiment, values_from = total) %>%
  mutate(Subtraction = positive - negative)
#> # A tibble: 5 × 4
#>   users     positive negative Subtraction
#>   <chr>        <dbl>    <dbl>       <dbl>
#> 1 Alex            80       13          67
#> 2 Alice           35       70         -35
#> 3 Alexandra       24       94         -70
#> 4 Andrew          27       74         -47
#> 5 Alicia          66       94         -28
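One thing to watch for with the wide approach: if a user has only one kind of sentiment row, pivot_wider leaves an NA in the missing column and the subtraction becomes NA too. A minimal sketch of one way around that, assuming a missing sentiment should count as 0. The user "Bob" and the small data frame below are made up for illustration and are not part of the question's data.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: "Bob" has a positive row but no negative row
df <- tibble(
  users     = c("Alex", "Alex", "Bob"),
  sentiment = c("positive", "negative", "positive"),
  total     = c(80, 13, 40)
)

result <- df %>%
  # values_fill = 0 replaces the NA created for Bob's missing negative total
  pivot_wider(names_from = sentiment, values_from = total, values_fill = 0) %>%
  mutate(Subtraction = positive - negative)

result
```

Whether 0 is the right fill depends on what a missing row means in your data; leaving the NA in place may be the more honest answer.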

Or using group_by, which keeps all ten rows and repeats each user's score on both of their rows:

mockdataframe %>% 
  group_by(users) %>%
  mutate(Subtraction = total[sentiment == "positive"] - total[sentiment == "negative"]) %>%
  ungroup()
#> # A tibble: 10 × 4
#>    users     sentiment total Subtraction
#>    <chr>     <fct>     <dbl>       <dbl>
#>  1 Alex      positive     80          67
#>  2 Alice     negative     70         -35
#>  3 Alexandra positive     24         -70
#>  4 Andrew    negative     74         -47
#>  5 Alicia    positive     66         -28
#>  6 Alex      negative     13          67
#>  7 Alice     positive     35         -35
#>  8 Alexandra negative     94         -70
#>  9 Andrew    positive     27         -47
#> 10 Alicia    negative     94         -28
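The indexing inside mutate relies on every user having exactly one positive and one negative row. If that might not hold, a possible variant is to wrap each side in sum() and collapse with summarise, so that a missing sentiment contributes 0 and repeated rows are added up. This is only a sketch under that assumption; the data is rebuilt here so the block runs on its own.

```r
library(dplyr)

# Same mock data as in the question
mockdataframe <- tibble(
  users     = c("Alex", "Alice", "Alexandra", "Andrew", "Alicia",
                "Alex", "Alice", "Alexandra", "Andrew", "Alicia"),
  sentiment = c("positive", "negative", "positive", "negative", "positive",
                "negative", "positive", "negative", "positive", "negative"),
  total     = c(80, 70, 24, 74, 66, 13, 35, 94, 27, 94)
)

scores <- mockdataframe %>%
  group_by(users) %>%
  summarise(
    # sum() of an empty selection is 0, so a missing sentiment does not error
    Subtraction = sum(total[sentiment == "positive"]) -
                  sum(total[sentiment == "negative"]),
    .groups = "drop"
  )

scores
```

This gives one row per user, like the pivot_wider version. Swap summarise for mutate (and add ungroup()) if you would rather keep all the original rows.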