Pandas (How to Fix): List is actually string and the value of length is misleading

Time:07-30

I have a dataframe with a list of years in the first column. A second column shows the number of years listed in each row.

    Years   Count_of_Years
0   []         2
1   []         2
2   ['2021']   6
3   ['2022']   6
4   []         2

This made me think that the content of each cell is a plain string, so the count is the length of the string rather than the number of years. That appears to be the case when I check the type:

type(df['Years'][0])

str

When I convert the column to a list using to_list(), it shows:

df['Years'].to_list()
 '[]',
 '[]',
 "['2021']",
 "['2021']",
 '[]',
 '[]', 

How do I convert it so that the Count_of_Years shows correct values?

CodePudding user response:

If the values in the Years column are already strings, I would suggest using the str.count method with a regex pattern to count the matching occurrences (here, every run of four digits):

df['new_count'] = df['Years'].str.count(r'\d{4}')

      Years  Count_of_Years  new_count
0        []               2          0
1        []               2          0
2  ['2021']               6          1
3  ['2022']               6          1
4        []               2          0
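
If actual lists are needed rather than just the count, another option is to parse each string with ast.literal_eval from the standard library and then take the length of the result. This is only a minimal sketch, assuming every cell holds a valid Python list literal; the sample frame is reconstructed from the question, and the column name parsed_count is made up for illustration:

```python
import ast

import pandas as pd

# Sample data reconstructed from the question: each cell is a string, not a list.
df = pd.DataFrame({"Years": ["[]", "[]", "['2021']", "['2022']", "[]"]})

# Parse each string into a real Python list.
# Assumption: every value is a valid list literal; literal_eval raises otherwise.
df["Years"] = df["Years"].apply(ast.literal_eval)

# Count the elements of each parsed list (parsed_count is a hypothetical name).
df["parsed_count"] = df["Years"].apply(len)

print(df)
```

Unlike the regex count, this converts the column itself, so later list operations on Years work as expected. If some cells may be missing or malformed, they would need to be handled before calling literal_eval.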