I have a dataframe with a list of years in the first column. A second column shows the number of years listed in each row.
      Years  Count_of_Years
0        []               2
1        []               2
2  ['2021']               6
3  ['2022']               6
4        []               2
The counts are clearly wrong, which made me think that each cell contains a plain string rather than a list. That seems to be the case when I check the type:
type(df['Years'][0])
str
When I convert the column to a list using to_list(), it shows:
df['Years'].to_list()
'[]',
'[]',
"['2021']",
"['2021']",
'[]',
'[]',
How do I convert it so that Count_of_Years shows the correct values?
CodePudding user response:
If the values in the Years column are already strings, I would suggest using the str.count method with a regex pattern to count the matching occurrences:
df['new_count'] = df['Years'].str.count(r'\d{4}')
      Years  Count_of_Years  new_count
0        []               2          0
1        []               2          0
2  ['2021']               6          1
3  ['2022']               6          1
4        []               2          0
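As a rough, self-contained sketch of the above: the sample frame below is made up to mirror the question's data, and the Years_parsed and parsed_count column names are only illustrative. It also shows one possible alternative that is not part of the answer above: if each string is a valid Python list literal, ast.literal_eval can turn it back into a real list, and the length of that list gives the count.

```python
import ast

import pandas as pd

# Hypothetical sample data mirroring the question:
# every cell is a string that merely looks like a list.
df = pd.DataFrame({"Years": ["[]", "[]", "['2021']", "['2022']", "[]"]})

# Approach from the answer: count runs of four digits inside each string.
df["new_count"] = df["Years"].str.count(r"\d{4}")

# Alternative (an assumption, only valid if every cell is a well-formed
# Python list literal): parse the string into a list, then take its length.
df["Years_parsed"] = df["Years"].apply(ast.literal_eval)
df["parsed_count"] = df["Years_parsed"].map(len)

print(df[["Years", "new_count", "parsed_count"]])
```

The regex approach leaves the column as strings and only fixes the count, while the parsing approach also gives you real lists to work with afterwards. Note that ast.literal_eval raises an error on cells that are not valid literals, so it may need a guard if the data is messy.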