I have a column called features, and its rows contain the following entries:
ABS, AM/FM Radio, Air Bags, Air Conditioning, CD Player
ABS, AM/FM Radio, Air Bags, Air Conditioning
ABS, AM/FM Radio, Air Bags
What I want to do is create a column called Number of features that gives the following result:
5
4
3
Answer:
Assuming your data frame looks like this (here the column is called col):
df <- data.frame(col = c("ABS, AM/FM Radio, Air Bags, Air Conditioning, CD Player",
"ABS, AM/FM Radio, Air Bags, Air Conditioning",
"ABS, AM/FM Radio, Air Bags"))
df
#> col
#> 1 ABS, AM/FM Radio, Air Bags, Air Conditioning, CD Player
#> 2 ABS, AM/FM Radio, Air Bags, Air Conditioning
#> 3 ABS, AM/FM Radio, Air Bags
Then you could do:
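# as.character() guards against factor columns (the data.frame() default before R 4.0)
df$items <- lengths(strsplit(as.character(df$col), ","))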
df
#> col items
#> 1 ABS, AM/FM Radio, Air Bags, Air Conditioning, CD Player 5
#> 2 ABS, AM/FM Radio, Air Bags, Air Conditioning 4
#> 3 ABS, AM/FM Radio, Air Bags 3
This works by splitting each string at the commas and counting the resulting pieces. It gives the right count as long as none of the item names themselves contain commas.
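A minimal sketch follows that uses the names from the question (features for the input column, Number of features for the result) instead of col and items. Because the new column name contains spaces, it is assigned with [[ ]]. The second approach, counting commas with gregexpr() and adding one, is an alternative added here rather than part of the answer above; the helper count_features() is hypothetical, and returning 0 for an empty string is an assumption about what you would want.

```r
# Same three rows as above, but with the column name used in the question.
df <- data.frame(
  features = c("ABS, AM/FM Radio, Air Bags, Air Conditioning, CD Player",
               "ABS, AM/FM Radio, Air Bags, Air Conditioning",
               "ABS, AM/FM Radio, Air Bags"),
  stringsAsFactors = FALSE
)

# Approach 1: split at commas and count the pieces (the answer's method).
# A column name containing spaces must be assigned with [[ ]] or backticks.
df[["Number of features"]] <- lengths(strsplit(df$features, ","))

# Approach 2 (alternative): count the commas and add one.
# gregexpr() returns -1 when there is no match, so sum(m > 0) is 0 for a
# string without commas, which then counts as a single feature.
count_features <- function(x) {
  n_commas <- vapply(gregexpr(",", x, fixed = TRUE),
                     function(m) sum(m > 0), integer(1))
  n <- n_commas + 1L
  # Assumption: an empty (or all-whitespace) string has zero features.
  # NA inputs stay NA.
  n[!is.na(x) & trimws(x) == ""] <- 0L
  n
}

df$n_alt <- count_features(df$features)
df
```

Both columns contain 5, 4 and 3 for the sample rows. The comma-counting version has the same limitation as the split: an item name that itself contains a comma inflates the count.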