I have this data:
ID Date_utc Upvotes Number of Comments Subthread name Post Author
0 sw73ml 1645266563.0 2 NaN I fucking love cars, but actually driving, or ... NaN
1 sw73sa 1645266581.0 3 NaN It's my birthday!!!! NaN
2 sw73va 1645266588.0 3 NaN My bike just got stolen NaN
3 sw73x0 1645266593.0 4 NaN I feel like an outsider socially NaN
4 sw75gk 1645266754.0 10 NaN Hallo? Ist dis Bert und Ernie’s BDSM emporium? NaN
... ... ... ... ... ... ...
7703 uou8wd 1652455643.0 2 NaN Holy crap I forgot how good… NaN
7704 uou8yy 1652455648.0 4 NaN Just got told to kill myself NaN
7705 uou8zv 1652455650.0 4 NaN hey YOU NaN
7706 uouagi 1652455771.0 1 NaN STEVEN UNIVERSE IS GREAT NaN
7707 uouaks 1652455780.0 1 NaN drinking water after chewing gum is the cold e... NaN
I wanted to get the 10 largest values in the ['Upvotes'] column with
filelar = df_p.nlargest(10, "Upvotes")
but I got this error: TypeError: Cannot use method 'nlargest' with dtype object
The items in Upvotes look like numbers, so I did not expect that error. I tried converting the column like this:
for i in df_p["Upvotes"].items():
    try:
        df_p["Upvotes"] = df_p["Upvotes"].astype(float)
        print('successful')
    except:
        pass
        print('failed')
but it just printed failed for every single item in Upvotes. Then I printed i along with the failed print statement, and I noticed this: (7505, 'Upvotes'). At index 7505, instead of a number, there is the word Upvotes. I found about 10 of these. I think this is probably causing the problem.
If I am right and that is the problem, is there any way to skip the items that are causing it? The way I tried it with try and except didn't work.
Thanks
CodePudding user response:
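You can try pandas.to_numeric with errors='coerce'. It converts every value that cannot be parsed as a number (such as the stray 'Upvotes' strings) to NaN, so the column gets a numeric dtype and nlargest works:
df_p['Upvotes'] = pd.to_numeric(df_p['Upvotes'], errors='coerce')
filelar = df_p.nlargest(10, "Upvotes")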
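If you would rather see which rows are the problem and drop them entirely, something like the following may help. This is only a minimal sketch: the small df_p built here is made-up sample data that imitates a header row repeated inside the data, and the names upvotes, bad_rows, clean and top are just for illustration.

```python
import pandas as pd

# Hypothetical sample data: the third row is a repeated header row,
# similar to what the question found at index 7505.
df_p = pd.DataFrame({
    "ID": ["sw73ml", "sw73sa", "ID", "sw73x0", "sw75gk"],
    "Upvotes": ["2", "3", "Upvotes", "4", "10"],
})

# Parse the column; values that are not numbers become NaN.
upvotes = pd.to_numeric(df_p["Upvotes"], errors="coerce")

# Rows whose Upvotes value could not be parsed, for inspection.
bad_rows = df_p[upvotes.isna()]
print(bad_rows)

# Store the numeric column and drop the rows that failed to parse.
clean = df_p.assign(Upvotes=upvotes).dropna(subset=["Upvotes"])

# Largest values now work because the column is numeric.
top = clean.nlargest(2, "Upvotes")
print(top)
```

Note that dropna also removes rows where Upvotes was genuinely missing in the first place, so check bad_rows before discarding anything.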