Create new column based on sum of specific character strings in R

Time:09-30

I have a data frame where the variable "id_number" represents a specific person and the other five variables represent tasks that each person has to complete.

id_number personal_value_statement career_inventory resume_cover linkedin    personal_budget
      <int> <chr>                    <chr>            <chr>        <chr>       <chr>          
1      1438 in progress              not started      completed    completed   in progress    
2      7362 in progress              not started      not started  completed   completed      
3      3239 in progress              not started      completed    in progress not started    
4      1285 in progress              in progress      in progress  not started not started    
5      8945 not started              not started      not started  not started not started    
6      9246 in progress              not started      not started  completed   not started

structure(list(id_number = c(1438L, 7362L, 3239L, 1285L, 8945L, 
9246L), personal_value_statement = c("in progress", "in progress", 
"in progress", "in progress", "not started", "in progress"), 
    career_inventory = c("not started", "not started", "not started", 
    "in progress", "not started", "not started"), resume_cover = c("completed", 
    "not started", "completed", "in progress", "not started", 
    "not started"), linkedin = c("completed", "completed", "in progress", 
    "not started", "not started", "completed"), personal_budget = c("in progress", 
    "completed", "not started", "not started", "not started", 
    "not started")), class = c("rowwise_df", "tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -6L), groups = structure(list(
    .rows = structure(list(1L, 2L, 3L, 4L, 5L, 6L), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame")))

I want to create a new column that counts the number of tasks each person has completed. I assume I'm going to use mutate(), but I'm not sure how to sum character strings. Essentially, I am looking for a column where the value for id_number 1438 is 2, because that person completed two tasks ("resume_cover" and "linkedin"), and so on for the rest of the id_numbers.

Any and all help is much appreciated.

Answer:

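Use rowSums() after ungrouping. The data is a rowwise_df (see the class in the dput() output), so call ungroup() first. Then compare every task column to "completed" and sum the resulting TRUE values across each row.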

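library(dplyr)

df1 <- df1 %>% 
     # drop the rowwise grouping so the count is computed in one vectorised step
     ungroup() %>% 
     # compare all columns except id_number to "completed"; each TRUE counts as 1
     mutate(Count = rowSums(across(-id_number) == "completed"))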

Output:

df1
# A tibble: 6 × 7
  id_number personal_value_statement career_inventory resume_cover linkedin    personal_budget Count
      <int> <chr>                    <chr>            <chr>        <chr>       <chr>           <dbl>
1      1438 in progress              not started      completed    completed   in progress         2
2      7362 in progress              not started      not started  completed   completed           2
3      3239 in progress              not started      completed    in progress not started         1
4      1285 in progress              in progress      in progress  not started not started         0
5      8945 not started              not started      not started  not started not started         0
6      9246 in progress              not started      not started  completed   not started         1
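
For anyone unsure how a character comparison turns into a count, here is a minimal base R sketch of the same idea. It assumes a plain data.frame rather than the rowwise tibble, and the `tasks` and `is_done` names and the three-row, three-column subset are made up for illustration; they are not the full data from the question.

```r
# Hypothetical subset of the question's data as a plain data.frame
# (three people, three of the five task columns).
tasks <- data.frame(
  id_number       = c(1438L, 7362L, 8945L),
  resume_cover    = c("completed", "not started", "not started"),
  linkedin        = c("completed", "completed", "not started"),
  personal_budget = c("in progress", "completed", "not started")
)

# Comparing the task columns (everything except id_number) to a string
# gives a logical matrix: TRUE where the cell equals "completed".
is_done <- tasks[-1] == "completed"

# rowSums() treats TRUE as 1 and FALSE as 0, so the row total is the
# number of completed tasks for that person.
tasks$Count <- rowSums(is_done)

print(tasks)
```

The key point is that the strings themselves are never summed. The comparison produces TRUE/FALSE values, and those are what get added up per row.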