I am trying to figure out how to get value_counts from how many times a specific text value is listed in the column.
Example data:
d = {'Title': ['Crash Landing on You', 'Memories of the Alhambra', 'The Heirs', 'While You Were Sleeping',
'Something in the Rain', 'Uncontrollably Fond'],
'Cast' : ['Hyun Bin,Son Ye Jin,Seo Ji Hye', 'Hyun Bin,Park Shin Hye,Park Hoon', 'Lee Min Ho,Park Shin Hye,Kim Woo Bin',
'Bae Suzy,Lee Jong Suk,Jung Hae In', 'Son Ye Jin,Jung Hae In,Jang So Yeon', 'Kim Woo Bin,Bae Suzy,Im Joo Hwan']}
Title Cast
0 Crash Landing on You Hyun Bin,Son Ye Jin,Seo Ji Hye
1 Memories of the Alhambra Hyun Bin,Park Shin Hye,Park Hoon
2 The Heirs Lee Min Ho,Park Shin Hye,Kim Woo Bin
3 While You Were Sleeping Bae Suzy,Lee Jong Suk,Jung Hae In
4 Something in the Rain Son Ye Jin,Jung Hae In,Jang So Yeon
5 Uncontrollably Fond Kim Woo Bin,Bae Suzy,Im Joo Hwan
When I split the text and do value counts:
df['Cast'] = df['Cast'].str.split(',')
df['Cast'].value_counts()
[Hyun Bin, Son Ye Jin, Seo Ji Hye] 1
[Hyun Bin, Park Shin Hye, Park Hoon] 1
[Lee Min Ho, Park Shin Hye, Kim Woo Bin] 1
[Bae Suzy, Lee Jong Suk, Jung Hae In] 1
[Son Ye Jin, Jung Hae In, Jang So Yeon] 1
[Kim Woo Bin, Bae Suzy, Im Joo Hwan] 1
Name: Cast, dtype: int64
How do I get the amount of times a specific text is shown in the 'Cast' column? ie:
[Park Shin Hye] 2
[Hyun Bin] 2
[Bae Suzy] 1
etc
CodePudding user response:
You should use the .explode
method to "unpack" each list in different rows. Then .value_counts
will work as intended in the original code:
import pandas as pd
d = {'Title': ['Crash Landing on You', 'Memories of the Alhambra', 'The Heirs', 'While You Were Sleeping',
'Something in the Rain', 'Uncontrollably Fond'],
'Cast' : ['Hyun Bin,Son Ye Jin,Seo Ji Hye', 'Hyun Bin,Park Shin Hye,Park Hoon', 'Lee Min Ho,Park Shin Hye,Kim Woo Bin',
'Bae Suzy,Lee Jong Suk,Jung Hae In', 'Son Ye Jin,Jung Hae In,Jang So Yeon', 'Kim Woo Bin,Bae Suzy,Im Joo Hwan']}
df = pd.DataFrame(d)
df['Cast'].str.split(',').explode('Cast').value_counts()
CodePudding user response:
You are probably looking for the str.count() method.
https://www.w3schools.com/python/ref_string_count.asp
CodePudding user response:
I don't have much experience either, but afaik one way to achieve this is that: after a call to str.split(',')
, you can break each of the cast lists into multiple rows in the dataframe using explode()
(see the docs), and then do a value_count()
on the resulting data frame.
I believe this is far from the optimized strategy, but it works :) This is my first answer in the community, I'm super open to any suggestions!
The full code is as follows:
df['Cast'] = df['Cast'].str.split(',')
df = df.explode('Cast')
df['Cast'].value_counts()