Combine different observations into new variable [duplicate]

Time:09-24

I've been trying to make a graph using either barplot() or ggplot2, but first I need to combine different observations from the same variable.

My variable has different observations depending on how relevant a subject is for each user, like this:

Count  Activity
10     Bikes for fitness reasons
22     Runs for fitness reasons
12     Bikes to commute to work
10     Walks to commute to work
5      Walks to stay healthy

My idea is to merge the observations from the "Activity" variable so it looks like this:

Count Activity
22    Bikes
22    Runs
15    Walks

So I don't care about the reason they do the activity; I just want to merge the rows so I can put that info into a bar graph.

CodePudding user response:

Here is a tidyverse solution:

library(tidyverse)

df %>% 
  mutate(Activity = word(Activity, 1)) %>% 
  group_by(Activity) %>% 
  summarize(Count = sum(Count))

This gives us:

# A tibble: 3 x 2
  Activity Count
  <chr>    <dbl>
1 Bikes       22
2 Runs        22
3 Walks       15

Data:

# The original dput() output carried a data.table class and an internal
# pointer that cannot be parsed; a plain data.frame holds the same data.
df <- structure(list(Count = c(10, 22, 12, 10, 5), Activity = c("Bikes for fitness reasons", 
"Runs for fitness reasons", "Bikes to commute to work", "Walks to commute to work", 
"Walks to stay healthy")), row.names = c(NA, -5L), class = "data.frame")
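To get from this summary to the bar graph the question asks for, one option is geom_col() from ggplot2, which draws bars whose heights are the values already in the data. The following is a minimal sketch, not part of the original answer; the names totals and p are only illustrative.

```r
library(dplyr)
library(stringr)
library(ggplot2)

df <- data.frame(
  Count = c(10, 22, 12, 10, 5),
  Activity = c("Bikes for fitness reasons",
               "Runs for fitness reasons",
               "Bikes to commute to work",
               "Walks to commute to work",
               "Walks to stay healthy"))

# Keep only the first word of each activity, then sum the counts per word
totals <- df %>%
  mutate(Activity = word(Activity, 1)) %>%
  group_by(Activity) %>%
  summarize(Count = sum(Count))

# One bar per activity, bar height taken from the summed Count
p <- ggplot(totals, aes(x = Activity, y = Count)) +
  geom_col()
print(p)
```

geom_col() is used rather than geom_bar() because the counts are already computed; geom_bar() would count rows by default.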

CodePudding user response:

You could use grep() to find each term you are looking for, like this:

df <- data.frame(
  Count = c(10,22,12,10,5),
  Activity = c("Bikes for fitness reasons",
               "Runs for fitness reasons",
               "Bikes to commute to work",
               "Walks to commute to work",
               "Walks to stay healthy"))

# Look for this string
var <- "Bikes"

# Get the indices of the rows where "Bikes" appears
grep(pattern = var, x = df$Activity)
#> [1] 1 3

# Get Count values from each row where "Bikes" appears
df[grep(pattern = var, x = df$Activity), "Count"]
#> [1] 10 12
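The lookup above handles one term at a time. As a rough sketch of how it could be extended to produce the merged counts, the same pattern match can be applied to every term and the matching counts summed. The vector terms and the result totals are hypothetical names, and the "^" anchor is an added assumption that each activity starts with the term.

```r
df <- data.frame(
  Count = c(10, 22, 12, 10, 5),
  Activity = c("Bikes for fitness reasons",
               "Runs for fitness reasons",
               "Bikes to commute to work",
               "Walks to commute to work",
               "Walks to stay healthy"))

# Terms to look for (listed by hand here)
terms <- c("Bikes", "Runs", "Walks")

# For each term, sum Count over the rows whose Activity starts with it
totals <- sapply(terms, function(term) {
  sum(df$Count[grepl(paste0("^", term), df$Activity)])
})

totals
```

The result is a named numeric vector, so it can be passed directly to barplot(totals).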

CodePudding user response:

Using trimws (the whitespace argument requires R 3.6.0 or later):

library(dplyr)
df %>% 
   group_by(Activity = trimws(Activity, whitespace = "\\s+.*")) %>% 
   summarise(Count = sum(Count))

Output:

# A tibble: 3 x 2
  Activity Count
  <chr>    <dbl>
1 Bikes       22
2 Runs        22
3 Walks       15
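The same idea can be sketched in base R without dplyr: sub() removes everything from the first run of whitespace onward, aggregate() sums the counts per activity, and barplot() draws the graph from the question. This is only an illustration under those assumptions; totals is a hypothetical name.

```r
df <- data.frame(
  Count = c(10, 22, 12, 10, 5),
  Activity = c("Bikes for fitness reasons",
               "Runs for fitness reasons",
               "Bikes to commute to work",
               "Walks to commute to work",
               "Walks to stay healthy"))

# Drop everything after the first word ("\\s+.*" = whitespace and the rest)
df$Activity <- sub("\\s+.*", "", df$Activity)

# Sum Count within each remaining Activity value
totals <- aggregate(Count ~ Activity, data = df, FUN = sum)

# Bar graph of the merged counts
barplot(totals$Count, names.arg = totals$Activity, ylab = "Count")
```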