Home > database >  Counting different characters in a large dataframe
Counting different characters in a large dataframe

Time:02-19

I'm looking to count the number of times different words appear in a data frame, and then to remake that into a new data frame that shows the counts for each word.

For example, I have a data table like this:

Col1 Col2 Col3 Col4 Col5 Continues...
Passwords1 GHSME12 POWDER2 JOHNC PLOW01 PLANE
Usercode20 HUNG1 GHSME12 PLOW01 GORGE09 JOHNC
Usercode15 PLOW01 GORGE09 JOHNC POWDER2 SYRUP9
Continues... ... ... ... ... ...

I want to be able to count the number of times each word in the data appears for each Col1. While I could do things such as the number of items WordX = wordX, there are hundreds of passwords, making manual counting difficult, so I wonder if I have to use a for loop and a blank data frame in this situation to achieve something like this:

Passwords Passwords1 Usercode20 Usercode15 Continues...
GHSME12 1 1 0 ...
POWDER2 1 0 1 ...
JOHNC 1 1 1 ...
PLOW01 1 1 1 ...
PLANE 1 0 0 ...
HUNG1 0 1 0 ...
GORGE09 0 1 1 ...
SYRUP9 0 0 1 ...

I would appreciate it if someone has a good idea about tackling this. Thank you!

CodePudding user response:

table(cbind(stack(df, -Col1)['values'], df['Col1']))

         Col1
values    Passwords1 Usercode15 Usercode20
  GHSME12          1          0          1
  GORGE09          0          1          1
  HUNG1            0          0          1
  JOHNC            1          1          1
  PLANE            1          0          0
  PLOW01           1          1          1
  POWDER2          1          1          0
  SYRUP9           0          1          0

tidyverse:

library(tidyverse)
df %>%
   pivot_longer(-Col1) %>%
   pivot_wider(names_from = Col1, values_from = name, 
               values_fn = length, values_fill = 0)

# A tibble: 8 x 4
  value   Passwords1 Usercode20 Usercode15
  <chr>        <int>      <int>      <int>
1 GHSME12          1          1          0
2 POWDER2          1          0          1
3 JOHNC            1          1          1
4 PLOW01           1          1          1
5 PLANE            1          0          0
6 HUNG1            0          1          0
7 GORGE09          0          1          1
8 SYRUP9           0          0          1
  • Related