Home > Mobile >  python split dataframe based on multiple criteria in 2 columns
python split dataframe based on multiple criteria in 2 columns

Time:11-10

I need to filter the df based on 2 criteria: where Name != jack and ignore all dates for jack where Date <= 2020-04-01

 # List of Tuples
df =       [ ('jack', 'Apples' , '2020-01-01') ,
             ('Riti', 'Mangos' , '2020-02-01') ,
             ('Aadi', 'Grapes' , '2020-03-01') ,
             ('jack', 'Oranges', '2020-04-01') ,
             ('Lucy', 'Mangos' , '2020-05-01') ,
             ('jack', 'Apples' , '2020-12-01')
              ]
#Create a DataFrame object
df1 = pd.DataFrame(df, columns = ['Name' , 'Product', 'Date']) 
df1

Expected result being:

    Name    Product Date
0   Riti    Mangos  2020-02-01
1   Aadi    Grapes  2020-03-01
2   Lucy    Mangos  2020-05-01
3   jack    Apples  2020-12-01

CodePudding user response:

boolean indexing of multi condition

cond1 = df1['Name'] != 'jack'
cond2 = pd.to_datetime(df1['Date']) > pd.Timestamp('2020-04-01')
df1[cond1 | cond2].reset_index(drop=True)

output:

    Name    Product Date
0   Riti    Mangos  2020-02-01
1   Aadi    Grapes  2020-03-01
2   Lucy    Mangos  2020-05-01
3   jack    Apples  2020-12-01

CodePudding user response:

You could do the below:

import pandas as pd
 # List of Tuples
df = [('jack', 'Apples' , '2020-01-01'), 
('Riti', 'Mangos' , '2020-02-01'),
('Aadi', 'Grapes' , '2020-03-01'),
('jack', 'Oranges', '2020-04-01'),
('Lucy', 'Mangos' , '2020-05-01'),
('jack', 'Apples' , '2020-12-01')
]
#Create a DataFrame object
df1 = pd.DataFrame(df, columns = ['Name', 'Product', 'Date']) 

# Filtering for condition Name = !jack
df1 = df1[df1['Name'] != 'jack']

# Convert the date to datetime64
df1['Date'] = pd.to_datetime(df1['Date'], format='%Y-%m-%d')

# Filter by specified date
filtered_df = df1.loc[(df1['Date'] <= '2020-04-01')]

# print the filtered result
print(filtered_df)
  • Related