Home > OS >  Pandas (python) conditional row selection
Pandas (python) conditional row selection

Time:12-06

I searched this on stackoverflow and couldn't see a proper answer from some of the results I got. I am trying to learn pandas and goal is to find a particular value

import pandas as pd

df_data = {
    "age": [24, 9, 20, 24],
    "weight": [170, 106, 201, 216],
    "dob": ["1/1990", "1/2000", "2/1994", "1/1990"]
}

df = pd.DataFrame(df_data) 

now I want to pick the rows with dob >= "1/1990" and <= "1/2020". Then I have to pick the max weight and then print the corresponding age. This is what I am doing

weightMax = df[(df["dob"] >= "1/1990") & (df["dob"] <= "1/2020")]
index = weightMax["weight"].idxmax()

if I print weightMax[index] for the whole row, it throws index outofbound error. where am I going wrong here? Also, to get the corresponding age, how do I get it?

Thanks a lot for the help

CodePudding user response:

Convert your dob column to be a proper datetime so your comparison will work reliably:

df['dob'] = pd.to_datetime(df['dob'], format='%m/%Y')

Then you can subset that using .between and take the .nlargest based on the weight and squeeze it to a Series:

heaviest = df[df['dob'].between('1991-01-01', '2020-01-01')].nlargest(1, 'weight').squeeze()

This'll give you:

age                        20
weight                    201
dob       1994-02-01 00:00:00
Name: 2, dtype: object

If you want to access the original index then you can do so by using accessing heaviest.name.

Example print:

print('The heaviest was {0.weight}kg and was {0.age} years old. They were originally at index {0.name}.'.format(heaviest))

Gives you:

'The heaviest was 201kg and was 20 years old. They were originally at index 2.'

CodePudding user response:

Please try to use loc

weightMax.loc[index]
age            9
weight       106
dob       1/2000
Name: 1, dtype: object

CodePudding user response:

You can use filter to deal with it. You gonna write more code, but it will be a more legible code.

Implements your logical in a function and use the filter on the df_data['dob'] array.

https://www.programiz.com/python-programming/methods/built-in/filter

Edit: Oh, crap. I don't use pandas frequently, but i found this article on the documentation. It's about filtering the dataframe. It's easier than the solution above. You also can use regex.

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.filter.html

CodePudding user response:

Convert all the dates to Unix times which is stored as integer then you can easily apply any filters to dates.

import pandas as pd
import datetime
import time

df_data = {
    "age": [24, 9, 20, 24],
    "weight": [170, 106, 201, 216],
    "dob": ["1/1990", "1/2000", "2/1994", "1/1990"]
}

df = pd.DataFrame(df_data) 
def dates(x):
    return time.mktime(datetime.datetime.strptime(DOB, "%m/%Y").timetuple())

df["dob_unix"]=df["dob"].map(dates)
weightMax = df[(df["dob_unix"] >= 662706000.0) & (df["dob_unix"] <= 1577854800.0)]
index = weightMax["weight"].idxmax()
  • Related