I searched this on stackoverflow and couldn't see a proper answer from some of the results I got. I am trying to learn pandas and goal is to find a particular value
import pandas as pd
df_data = {
"age": [24, 9, 20, 24],
"weight": [170, 106, 201, 216],
"dob": ["1/1990", "1/2000", "2/1994", "1/1990"]
}
df = pd.DataFrame(df_data)
now I want to pick the rows with dob >= "1/1990" and <= "1/2020". Then I have to pick the max weight and then print the corresponding age. This is what I am doing
weightMax = df[(df["dob"] >= "1/1990") & (df["dob"] <= "1/2020")]
index = weightMax["weight"].idxmax()
if I print weightMax[index] for the whole row, it throws index outofbound error. where am I going wrong here? Also, to get the corresponding age, how do I get it?
Thanks a lot for the help
CodePudding user response:
Convert your dob
column to be a proper datetime
so your comparison will work reliably:
df['dob'] = pd.to_datetime(df['dob'], format='%m/%Y')
Then you can subset that using .between
and take the .nlargest
based on the weight and squeeze
it to a Series
:
heaviest = df[df['dob'].between('1991-01-01', '2020-01-01')].nlargest(1, 'weight').squeeze()
This'll give you:
age 20
weight 201
dob 1994-02-01 00:00:00
Name: 2, dtype: object
If you want to access the original index then you can do so by using accessing heaviest.name
.
Example print
:
print('The heaviest was {0.weight}kg and was {0.age} years old. They were originally at index {0.name}.'.format(heaviest))
Gives you:
'The heaviest was 201kg and was 20 years old. They were originally at index 2.'
CodePudding user response:
Please try to use loc
weightMax.loc[index]
age 9
weight 106
dob 1/2000
Name: 1, dtype: object
CodePudding user response:
You can use filter to deal with it. You gonna write more code, but it will be a more legible code.
Implements your logical in a function and use the filter on the df_data['dob'] array.
https://www.programiz.com/python-programming/methods/built-in/filter
Edit: Oh, crap. I don't use pandas frequently, but i found this article on the documentation. It's about filtering the dataframe. It's easier than the solution above. You also can use regex.
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.filter.html
CodePudding user response:
Convert all the dates to Unix times which is stored as integer then you can easily apply any filters to dates.
import pandas as pd
import datetime
import time
df_data = {
"age": [24, 9, 20, 24],
"weight": [170, 106, 201, 216],
"dob": ["1/1990", "1/2000", "2/1994", "1/1990"]
}
df = pd.DataFrame(df_data)
def dates(x):
return time.mktime(datetime.datetime.strptime(DOB, "%m/%Y").timetuple())
df["dob_unix"]=df["dob"].map(dates)
weightMax = df[(df["dob_unix"] >= 662706000.0) & (df["dob_unix"] <= 1577854800.0)]
index = weightMax["weight"].idxmax()