How can I count comma-separated values in my DataFrame?

I am trying to figure out how to use value_counts to count how many times each specific text value appears in a column.

Example data:

import pandas as pd

d = {'Title': ['Crash Landing on You', 'Memories of the Alhambra', 'The Heirs',
               'While You Were Sleeping', 'Something in the Rain', 'Uncontrollably Fond'],
     'Cast': ['Hyun Bin,Son Ye Jin,Seo Ji Hye', 'Hyun Bin,Park Shin Hye,Park Hoon',
              'Lee Min Ho,Park Shin Hye,Kim Woo Bin', 'Bae Suzy,Lee Jong Suk,Jung Hae In',
              'Son Ye Jin,Jung Hae In,Jang So Yeon', 'Kim Woo Bin,Bae Suzy,Im Joo Hwan']}
df = pd.DataFrame(d)

   Title                     Cast
0  Crash Landing on You      Hyun Bin,Son Ye Jin,Seo Ji Hye
1  Memories of the Alhambra  Hyun Bin,Park Shin Hye,Park Hoon
2  The Heirs                 Lee Min Ho,Park Shin Hye,Kim Woo Bin
3  While You Were Sleeping   Bae Suzy,Lee Jong Suk,Jung Hae In
4  Something in the Rain     Son Ye Jin,Jung Hae In,Jang So Yeon
5  Uncontrollably Fond       Kim Woo Bin,Bae Suzy,Im Joo Hwan

When I split the text and call value_counts, each whole list is counted as a single value:

df['Cast'] = df['Cast'].str.split(',')
df['Cast'].value_counts()

[Hyun Bin, Son Ye Jin, Seo Ji Hye]          1
[Hyun Bin, Park Shin Hye, Park Hoon]        1
[Lee Min Ho, Park Shin Hye, Kim Woo Bin]    1
[Bae Suzy, Lee Jong Suk, Jung Hae In]       1
[Son Ye Jin, Jung Hae In, Jang So Yeon]     1
[Kim Woo Bin, Bae Suzy, Im Joo Hwan]        1
Name: Cast, dtype: int64

How do I get the number of times each individual name appears in the 'Cast' column? For example:

[Park Shin Hye] 2
[Hyun Bin] 2
[Bae Suzy] 2
etc.

CodePudding user response：

You should use the .explode method to "unpack" each list into separate rows, one name per row. After that, .value_counts works as you intended in the original code:

import pandas as pd

d = {'Title': ['Crash Landing on You', 'Memories of the Alhambra', 'The Heirs', 'While You Were Sleeping', 
'Something in the Rain', 'Uncontrollably Fond'], 
'Cast' : ['Hyun Bin,Son Ye Jin,Seo Ji Hye', 'Hyun Bin,Park Shin Hye,Park Hoon', 'Lee Min Ho,Park Shin Hye,Kim Woo Bin', 
'Bae Suzy,Lee Jong Suk,Jung Hae In', 'Son Ye Jin,Jung Hae In,Jang So Yeon', 'Kim Woo Bin,Bae Suzy,Im Joo Hwan']}

df = pd.DataFrame(d)
# Series.explode takes no column name (that argument belongs to DataFrame.explode)
df['Cast'].str.split(',').explode().value_counts()
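
One thing to watch for with this approach: if the real data has spaces around the commas, the same actor can end up counted under two different strings. Here is a minimal sketch of one way to handle that, assuming a small made-up DataFrame named `messy` (hypothetical, not from the question) and stripping whitespace after the explode step:

```python
import pandas as pd

# Hypothetical data: same idea as the question, but with stray spaces around the commas.
messy = pd.DataFrame({
    'Title': ['Crash Landing on You', 'Memories of the Alhambra'],
    'Cast': ['Hyun Bin, Son Ye Jin, Seo Ji Hye', 'Hyun Bin , Park Shin Hye,Park Hoon'],
})

counts = (
    messy['Cast']
    .str.split(',')   # one list of names per row
    .explode()        # one name per row
    .str.strip()      # drop leading/trailing whitespace around each name
    .value_counts()   # tally identical names
)
print(counts)
```

Without the .str.strip() step, 'Hyun Bin' and 'Hyun Bin ' would be counted as two separate values.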

CodePudding user response：

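You are probably looking for the str.count() method, which counts how many times a given substring occurs in a string. Note that it counts one substring at a time, so it suits checking a single name rather than tallying every name at once.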

https://www.w3schools.com/python/ref_string_count.asp
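
For a pandas column, the closest equivalent is probably Series.str.count, which applies the count to every row. The sketch below is only an illustration of that idea on the question's data; the variable names (`cast`, `name`, `per_row`, `total`, `park_total`) are made up here. It counts one chosen name and also shows a caveat: matching is by substring, so a short pattern can match several different actors.

```python
import re
import pandas as pd

cast = pd.Series([
    'Hyun Bin,Son Ye Jin,Seo Ji Hye',
    'Hyun Bin,Park Shin Hye,Park Hoon',
    'Lee Min Ho,Park Shin Hye,Kim Woo Bin',
    'Bae Suzy,Lee Jong Suk,Jung Hae In',
    'Son Ye Jin,Jung Hae In,Jang So Yeon',
    'Kim Woo Bin,Bae Suzy,Im Joo Hwan',
])

name = 'Hyun Bin'
# Series.str.count treats the pattern as a regular expression,
# so escape the name in case it contains special characters.
per_row = cast.str.count(re.escape(name))
total = int(per_row.sum())
print(name, total)

# Caveat: this is substring matching, so 'Park' matches both
# 'Park Shin Hye' and 'Park Hoon'.
park_total = int(cast.str.count('Park').sum())
print('Park', park_total)
```

This works for looking up one name, but to get counts for every name at once you would still need to split the strings first.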

CodePudding user response：

I don't have much experience either, but as far as I know, one way to achieve this is: after calling str.split(','), break each cast list into multiple rows of the dataframe using explode() (see the docs), and then call value_counts() on the resulting 'Cast' column.

I believe this is far from the most optimized strategy, but it works :) This is my first answer in the community, and I'm super open to any suggestions!

The full code is as follows:

df['Cast'] = df['Cast'].str.split(',')
df = df.explode('Cast')
df['Cast'].value_counts()
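
Note that the code above overwrites df, so each title ends up repeated once per cast member. If you would rather leave the DataFrame untouched, one alternative (a sketch only, not benchmarked against explode) is to tally the names with collections.Counter from the standard library. The names `tally` and `counts` are just illustrative:

```python
from collections import Counter
import pandas as pd

df = pd.DataFrame({
    'Title': ['Crash Landing on You', 'Memories of the Alhambra', 'The Heirs',
              'While You Were Sleeping', 'Something in the Rain', 'Uncontrollably Fond'],
    'Cast': ['Hyun Bin,Son Ye Jin,Seo Ji Hye', 'Hyun Bin,Park Shin Hye,Park Hoon',
             'Lee Min Ho,Park Shin Hye,Kim Woo Bin', 'Bae Suzy,Lee Jong Suk,Jung Hae In',
             'Son Ye Jin,Jung Hae In,Jang So Yeon', 'Kim Woo Bin,Bae Suzy,Im Joo Hwan'],
})

# Walk over every row, split it on commas, and count each name.
tally = Counter(name for cast in df['Cast'] for name in cast.split(','))

# Optional: turn the tally into a Series sorted by count, similar to value_counts output.
counts = pd.Series(tally).sort_values(ascending=False)
print(counts)
```

Here df['Cast'] still holds the original comma-separated strings afterwards.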