Home > Software design >  Pandas: Find closest date - without set_index - multiple conditions
Pandas: Find closest date - without set_index - multiple conditions

Time:04-30

We have the following Pandas Dataframe:

# Stackoverflow question
data = {'category':[1, 2, 3, 1, 2, 3, 1, 2, 3], 'date':['2000-01-01', '2000-01-01', '2000-01-01', '2000-01-02', '2000-01-02', '2000-01-02', '2000-01-03', '2000-01-03', '2000-01-03']}  
df = pd.DataFrame(data=data)
df['date'] = pd.to_datetime(df['date'])
df

category    date
0   1   2000-01-01
1   2   2000-01-01
2   3   2000-01-01
3   1   2000-01-02
4   2   2000-01-02
5   3   2000-01-02
6   1   2000-01-03
7   2   2000-01-03
8   3   2000-01-03

How can we query this dataframe to find the date 2000-01-02 with category 3? So we are looking for the row with index 5.

It should be accomplished without set_index('date'). The reason for this is as follows, when setting the index on the actual data rather than the example data I receive the following error:

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

CodePudding user response:

Take a subset of relevant category, subtract the target date, and get idxmin

tmp = df.loc[df.category.eq(3)]
(tmp.date - pd.to_datetime("2000-01-02")).abs().idxmin()
# 5

CodePudding user response:

To get the (first) closest index date with category 3 you could use:

m = df['category'].eq(3)
d = df['date'].sub(pd.Timestamp('2000-01-02')).abs()
d.loc[m].idxmin()

output: 5

CodePudding user response:

df[(df['category']==3) & (df['date']==pd.Timestamp(2000,1,2))]

To get the list of all indices:

df.index[(df['category']==3) & (df['date']==pd.Timestamp(2000,1,2))].tolist()

Okay :)

  • Related