TypeError: Cannot use method 'nlargest' with dtype object

Time:05-22

I have this data:

              ID      Date_utc Upvotes Number of Comments                                     Subthread name Post Author
0     sw73ml  1645266563.0       2                NaN  I fucking love cars, but actually driving, or ...         NaN
1     sw73sa  1645266581.0       3                NaN                               It's my birthday!!!!         NaN
2     sw73va  1645266588.0       3                NaN                            My bike just got stolen         NaN
3     sw73x0  1645266593.0       4                NaN                   I feel like an outsider socially         NaN
4     sw75gk  1645266754.0      10                NaN     Hallo? Ist dis Bert und Ernie’s BDSM emporium?         NaN
...      ...           ...     ...                ...                                                ...         ...
7703  uou8wd  1652455643.0       2                NaN                       Holy crap I forgot how good…         NaN
7704  uou8yy  1652455648.0       4                NaN                       Just got told to kill myself         NaN
7705  uou8zv  1652455650.0       4                NaN                                            hey YOU         NaN
7706  uouagi  1652455771.0       1                NaN                           STEVEN UNIVERSE IS GREAT         NaN
7707  uouaks  1652455780.0       1                NaN  drinking water after chewing gum is the cold e...         NaN

I wanted to get the rows with the 10 largest values in the ['Upvotes'] column, but I got this error: TypeError: Cannot use method 'nlargest' with dtype object

filelar = df_p.nlargest(10, "Upvotes" )

Even though the items in Upvotes looked like numbers, it raised that error, so I tried this:

for i in df_p["Upvotes"].items():
        try: 
            df_p["Upvotes"] = df_p["Upvotes"].astype(float) 
            print('succsessful') 
        except: 
            pass 
            print('failed')

But it just printed failed for every single item in Upvotes. Then I printed i along with the failed message and noticed this: (7505, 'Upvotes'). At index 7505 there is the word Upvotes instead of a number, and I found about 10 of these. I think this is probably causing the problem.
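To make the diagnosis above concrete, here is a minimal sketch on a made-up miniature of the data (the four rows below are hypothetical, not the real file). It shows why the loop prints failed every time, since astype(float) always converts the whole column, and one way to list the offending rows.

```python
import pandas as pd

# Hypothetical miniature of the data: a header row repeated inside the column.
df_p = pd.DataFrame({
    "ID": ["sw73ml", "sw73sa", "ID", "sw73va"],
    "Upvotes": ["2", "3", "Upvotes", "10"],
})

# astype(float) converts the entire column in one call, so a single bad
# value makes every call fail, however many times the loop repeats it.
try:
    df_p["Upvotes"].astype(float)
    whole_column_ok = True
except ValueError:
    whole_column_ok = False

# Rows whose value is present but cannot be parsed as a number.
converted = pd.to_numeric(df_p["Upvotes"], errors="coerce")
bad_rows = df_p[converted.isna() & df_p["Upvotes"].notna()]
print(bad_rows)
```

Rows like these often come from concatenating several CSV exports that each carry their own header line, though that is only a guess about this dataset.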

So if I am right and that is the problem, is there any way to skip the items that are causing it? The way I tried it with try and except didn't work.

Thanks

CodePudding user response:

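You can try pandas.to_numeric. With errors='coerce', values that cannot be parsed as numbers (such as the stray 'Upvotes' strings) become NaN, so the column ends up with a numeric dtype that nlargest accepts: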

import pandas as pd

df_p['Upvotes'] = pd.to_numeric(df_p['Upvotes'], errors='coerce')

filelar = df_p.nlargest(10, "Upvotes")
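As a self-contained illustration of this approach, the sketch below uses a small hypothetical frame (five invented rows) in place of the real data. It also drops the rows that became NaN, which is an optional extra step and not part of the answer above, in case the repeated header rows should be removed entirely.

```python
import pandas as pd

# Hypothetical sample with one repeated header row mixed into the data.
df_p = pd.DataFrame({
    "ID": ["a", "b", "ID", "c", "d"],
    "Upvotes": ["2", "10", "Upvotes", "4", "7"],
})

# Unparseable values become NaN; the column becomes float64.
df_p["Upvotes"] = pd.to_numeric(df_p["Upvotes"], errors="coerce")

# Optional: discard the rows that could not be converted.
clean = df_p.dropna(subset=["Upvotes"])

# Top 3 here only because the sample is tiny; use 10 on the real data.
top = clean.nlargest(3, "Upvotes")
print(top)
```

If Upvotes should hold whole numbers after cleaning, clean["Upvotes"].astype(int) would work once the NaN rows are gone.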