I can't find a solution on here that matches my scenario. Here is a column from my example dataset:
How_do_you_feel
Excited, Hopeful, Prepared, good
Unsure, confused, anxious, curious
Co operations, Teamwork, communication, critical thinking
a
First, team work, nervous, curious
Interesting. New. Exciting. Develop
perplexed,anxious,embarrassed,bit excited
Novel, Unknown, Challenging, Useful
Worried, excited, self-doubt, motivated
Excited,curious,nervous,worried
The correct format is 4 words (or short phrases), each separated by a comma and a space, like this: 'Excited, Hopeful, Prepared, good'.
How do I clean my data so that all rows with the wrong format are removed, such as 'Interesting. New. Exciting. Develop' or 'perplexed,anxious,embarrassed,bit excited'?
The result would look something like this:
How_do_you_feel
Excited, Hopeful, Prepared, good
Unsure, confused, anxious, curious
Co operations, Teamwork, communication, critical thinking
First, team work, nervous, curious
Novel, Unknown, Challenging, Useful
Worried, excited, self-doubt, motivated
Thanks!
CodePudding user response:
You said that this is a column of your dataset, so I am assuming the following data structure:
How_do_you_feel <- c("", "Excited, Hopeful, Prepared, good", "Unsure, confused, anxious, curious",
                     "Co operations, Teamwork, communication, critical thinking",
                     "a", "First, team work, nervous, curious", "Interesting. New. Exciting. Develop",
                     "perplexed,anxious,embarrassed,bit excited", "Novel, Unknown, Challenging, Useful",
                     "Worried, excited, self-doubt, motivated", "Excited,curious,nervous,worried"
)
Just keep the entries with exactly three commas. Note that this counts commas only, so entries without a space after each comma are kept as well:
How_do_you_feel[stringr::str_count(How_do_you_feel, ",") == 3]
#[1] "Excited, Hopeful, Prepared, good"
#[2] "Unsure, confused, anxious, curious"
#[3] "Co operations, Teamwork, communication, critical thinking"
#[4] "First, team work, nervous, curious"
#[5] "perplexed,anxious,embarrassed,bit excited"
#[6] "Novel, Unknown, Challenging, Useful"
#[7] "Worried, excited, self-doubt, motivated"
#[8] "Excited,curious,nervous,worried"
You can also trim whitespace using trimws(), if necessary.
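If entries may carry stray whitespace, one way to combine a comma split with trimws() is sketched below in base R. The helper count_fields is hypothetical (it is not part of the answer above), and the short vector x is just a subset of the example data. Note that, like a plain comma count, this treats 'Excited,curious,nervous,worried' as four fields, so on its own it does not enforce the space after each comma.

```r
# Hypothetical helper: split each entry on commas, trim whitespace around
# the pieces, and return how many non-empty pieces remain.
count_fields <- function(x) {
  pieces <- strsplit(x, ",", fixed = TRUE)
  vapply(pieces, function(p) sum(nzchar(trimws(p))), integer(1))
}

# Subset of the example data, for illustration only
x <- c("Excited, Hopeful, Prepared, good",
       "a",
       "Interesting. New. Exciting. Develop",
       "Excited,curious,nervous,worried")

n_fields <- count_fields(x)

# Keep the entries that split into exactly four non-empty fields
four_fields <- x[n_fields == 4]
four_fields
```

Counting non-empty fields rather than commas also rejects entries such as "a, b, c," where a trailing comma leaves an empty last field.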
CodePudding user response:
Here is one potential solution:
library(tidyverse)
lines <- c("Excited, Hopeful, Prepared, good",
           "Unsure, confused, anxious, curious",
           "Co operations, Teamwork, communication, critical thinking",
           "a",
           "First, team work, nervous, curious",
           "Interesting. New. Exciting. Develop",
           "perplexed,anxious,embarrassed,bit excited",
           "Novel, Unknown, Challenging, Useful",
           "Worried, excited, self-doubt, motivated",
           "Excited,curious,nervous,worried")
df <- data.frame(How_do_you_feel = lines)
df
#> How_do_you_feel
#> 1 Excited, Hopeful, Prepared, good
#> 2 Unsure, confused, anxious, curious
#> 3 Co operations, Teamwork, communication, critical thinking
#> 4 a
#> 5 First, team work, nervous, curious
#> 6 Interesting. New. Exciting. Develop
#> 7 perplexed,anxious,embarrassed,bit excited
#> 8 Novel, Unknown, Challenging, Useful
#> 9 Worried, excited, self-doubt, motivated
#> 10 Excited,curious,nervous,worried
# Extract four chunks separated by ", "; rows that do not match become NA and are dropped
df %>%
  mutate(How_do_you_feel = str_extract(
    How_do_you_feel,
    "[[:alpha:][:punct:] ]+, [[:alpha:][:punct:] ]+, [[:alpha:][:punct:] ]+, [[:alpha:][:punct:] ]+"
  )) %>%
  filter(!is.na(How_do_you_feel))
#> How_do_you_feel
#> 1 Excited, Hopeful, Prepared, good
#> 2 Unsure, confused, anxious, curious
#> 3 Co operations, Teamwork, communication, critical thinking
#> 4 First, team work, nervous, curious
#> 5 Novel, Unknown, Challenging, Useful
#> 6 Worried, excited, self-doubt, motivated
Created on 2022-07-22 by the reprex package (v2.0.1)
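If the goal is only to drop rows rather than rewrite the column, a logical filter with an anchored pattern may be simpler. Below is a minimal sketch in base R, assuming that a valid entry is exactly four comma-free fields separated by a comma plus a single space. The pattern and the names feelings, keep and cleaned are illustrative assumptions, not part of the answer above.

```r
# Same example data as in the question
feelings <- data.frame(
  How_do_you_feel = c("Excited, Hopeful, Prepared, good",
                      "Unsure, confused, anxious, curious",
                      "Co operations, Teamwork, communication, critical thinking",
                      "a",
                      "First, team work, nervous, curious",
                      "Interesting. New. Exciting. Develop",
                      "perplexed,anxious,embarrassed,bit excited",
                      "Novel, Unknown, Challenging, Useful",
                      "Worried, excited, self-doubt, motivated",
                      "Excited,curious,nervous,worried"),
  stringsAsFactors = FALSE
)

# Assumed format: one comma-free field, then exactly three more fields,
# each preceded by a comma and a space, spanning the whole string.
pattern <- "^[^,]+(, [^,]+){3}$"

keep <- grepl(pattern, feelings$How_do_you_feel)

# Subset the rows without modifying the column contents
cleaned <- feelings[keep, , drop = FALSE]
cleaned
```

Because the pattern is anchored with ^ and $, entries with more than four fields are rejected too, which a pure extraction would silently truncate.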
CodePudding user response:
One general rule that appears to fit your situation is that a valid entry contains exactly three occurrences of a comma followed by a space (not just a comma, as in the previous answers). Try this:
library(tidyverse)
df %>%
  filter(str_count(How_do_you_feel, ", ") == 3)
# How_do_you_feel
# <chr>
# 1 "Excited, Hopeful, Prepared, good "
# 2 "Unsure, confused, anxious, curious "
# 3 "Co operations, Teamwork, communication, critical thinking "
# 4 "First, team work, nervous, curious "
# 5 "Novel, Unknown, Challenging, Useful "
# 6 "Worried, excited, self-doubt, motivated "
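To see why counting ", " rather than "," matters, the two counts can be compared side by side. This is a small sketch in base R; count_fixed is a hypothetical helper (not a stringr function), and x is a subset of the example data.

```r
# Subset of the example data, for illustration only
x <- c("Excited, Hopeful, Prepared, good",
       "perplexed,anxious,embarrassed,bit excited",
       "Interesting. New. Exciting. Develop")

# Hypothetical helper: count literal, non-overlapping occurrences of `what`
count_fixed <- function(x, what) {
  lengths(regmatches(x, gregexpr(what, x, fixed = TRUE)))
}

commas_only  <- count_fixed(x, ",")   # 3 3 0
comma_spaces <- count_fixed(x, ", ")  # 3 0 0

# Only the first entry passes the stricter comma-plus-space rule
data.frame(entry = x, commas_only, comma_spaces)
```

The second entry has three commas but no comma followed by a space, so only the stricter count excludes it.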