Making a feature matrix from frequency of letters in a DF cell (split string into list of characters)

Time:10-01

I want to count the occurrences of each letter in the string held in each row of a DF, and store the counts in a new DF with 26 columns, one per letter.

The new DF should have the same index as the original DF.

I have looked at the list function and at list comprehensions, and I am able to split a string into a list of characters. However, I cannot work out the correct syntax for applying these to a DF column.

string = 'this is a string'
lst = []

for letter in string:
   lst.append(letter)

and also

lst = list(string)

I suspect the solution involves the apply function, perhaps with a lambda. I have searched the site and found little, so I may be searching for the wrong thing, as I am sure this has been done before!

CodePudding user response:

You can try this:

for i in range(ord('a'), ord('z') + 1):
    ch = chr(i)
    df[ch] = df['your_column_name'].apply(lambda x : x.count(ch))
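
The loop above adds the 26 columns to the original DF. Since the question asks for a separate DF that shares the original index, here is a minimal sketch of one way that might be done. The sample data and the column name 'text' are made up for illustration, and lowercasing the strings first is an assumption about how capitals should be counted.

```python
import string

import pandas as pd

# Hypothetical sample data; 'text' is an assumed column name.
df = pd.DataFrame(
    {"text": ["this is a string", "Hello World"]},
    index=["r1", "r2"],
)

letters = list(string.ascii_lowercase)

# Lowercase once so 'H' and 'h' are counted together (an assumption).
lowered = df["text"].str.lower()

# One column per letter; Series.str.count counts matches in each row.
features = pd.DataFrame(
    {ch: lowered.str.count(ch) for ch in letters},
    index=df.index,
)

print(features.loc["r1", ["i", "s", "t"]])
```

Here features has one row per row of df and 26 columns, and df itself is left unchanged. Note that Series.str.count treats its argument as a regular expression, which is harmless for plain letters but would need escaping for other characters.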