I'm looking to count the number of times different words appear in a data frame, and then to remake that into a new data frame that shows the counts for each word.
For example, I have a data table like this:
Col1 | Col2 | Col3 | Col4 | Col5 | Continues... |
---|---|---|---|---|---|
Passwords1 | GHSME12 | POWDER2 | JOHNC | PLOW01 | PLANE |
Usercode20 | HUNG1 | GHSME12 | PLOW01 | GORGE09 | JOHNC |
Usercode15 | PLOW01 | GORGE09 | JOHNC | POWDER2 | SYRUP9 |
Continues... | ... | ... | ... | ... | ... |
I want to be able to count the number of times each word in the data appears for each Col1. While I could do things such as the number of items WordX = wordX, there are hundreds of passwords, making manual counting difficult, so I wonder if I have to use a for loop and a blank data frame in this situation to achieve something like this:
Passwords | Passwords1 | Usercode20 | Usercode15 | Continues... |
---|---|---|---|---|
GHSME12 | 1 | 1 | 0 | ... |
POWDER2 | 1 | 0 | 1 | ... |
JOHNC | 1 | 1 | 1 | ... |
PLOW01 | 1 | 1 | 1 | ... |
PLANE | 1 | 0 | 0 | ... |
HUNG1 | 0 | 1 | 0 | ... |
GORGE09 | 0 | 1 | 1 | ... |
SYRUP9 | 0 | 0 | 1 | ... |
I would appreciate it if someone has a good idea about tackling this. Thank you!
CodePudding user response:
table(cbind(stack(df, -Col1)['values'], df['Col1']))
Col1
values Passwords1 Usercode15 Usercode20
GHSME12 1 0 1
GORGE09 0 1 1
HUNG1 0 0 1
JOHNC 1 1 1
PLANE 1 0 0
PLOW01 1 1 1
POWDER2 1 1 0
SYRUP9 0 1 0
tidyverse:
library(tidyverse)
df %>%
pivot_longer(-Col1) %>%
pivot_wider(names_from = Col1, values_from = name,
values_fn = length, values_fill = 0)
# A tibble: 8 x 4
value Passwords1 Usercode20 Usercode15
<chr> <int> <int> <int>
1 GHSME12 1 1 0
2 POWDER2 1 0 1
3 JOHNC 1 1 1
4 PLOW01 1 1 1
5 PLANE 1 0 0
6 HUNG1 0 1 0
7 GORGE09 0 1 1
8 SYRUP9 0 0 1