Adding rows based on condition in Pandas

Time:08-03

I have a pandas dataframe that looks like this:

Index Year Month Fruit Count
0 2021 03 Apple 2
1 2021 03 Orange 3
2 2021 03 Mango 4
3 2021 04 Apple 1
4 2021 04 Mango 2
5 2021 05 Apple 1

Let's say that Apple, Orange and Mango are the only fruits I am interested in.

The dataframe is built so that a fruit whose count for a given year and month is 0 does not appear as a record at all. I want to add a row for each missing fruit, with Count set to 0. In short, each of the 3 fruits should show up for every year and month present in the data, even when its count is 0. So I want it to look like this:

Index Year Month Fruit Count
0 2021 03 Apple 2
1 2021 03 Orange 3
2 2021 03 Mango 4
3 2021 04 Apple 1
x 2021 04 Orange 0
4 2021 04 Mango 2
5 2021 05 Apple 1
x 2021 05 Orange 0
x 2021 05 Mango 0

Appreciate any suggestions on how to approach this.

CodePudding user response:

You can use pyjanitor's complete to fill the missing rows:

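import io

import pandas as pd
import janitor  # pyjanitor; importing it registers DataFrame.complete

data = '''Index Year    Month   Fruit   Count
0   2021    03  Apple   2
1   2021    03  Orange  3
2   2021    03  Mango   4
3   2021    04  Apple   1
4   2021    04  Mango   2
5   2021    05  Apple   1'''

# Split on any run of whitespace, so the sample parses whether it is tab- or space-delimited.
df = pd.read_csv(io.StringIO(data), sep=r'\s+', index_col='Index')

# Pair every Fruit with every observed (Year, Month); the new rows get NaN counts.
df = (
    df.complete('Fruit', ['Year', 'Month'])
      .fillna({'Count': 0})       # missing counts become 0
      .astype({'Count': int})     # restore the integer dtype lost to NaN
      .sort_values(['Year', 'Month', 'Fruit'])
)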

Output:

Year Month Fruit Count
0 2021 3 Apple 2
2 2021 3 Mango 4
1 2021 3 Orange 3
3 2021 4 Apple 1
4 2021 4 Mango 2
6 2021 4 Orange 0
5 2021 5 Apple 1
8 2021 5 Mango 0
7 2021 5 Orange 0
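
If installing pyjanitor is not an option, the same expansion can be sketched in plain pandas with a cross join followed by a left merge. This is only an illustration and not part of the answer above: the names fruits, periods, full and result are made up here, and how="cross" assumes pandas 1.2 or newer.

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [2021] * 6,
    "Month": ["03", "03", "03", "04", "04", "05"],
    "Fruit": ["Apple", "Orange", "Mango", "Apple", "Mango", "Apple"],
    "Count": [2, 3, 4, 1, 2, 1],
})

# Fruits of interest, as listed in the question (hypothetical variable name).
fruits = ["Apple", "Orange", "Mango"]

# Only the (Year, Month) pairs that actually occur in the data.
periods = df[["Year", "Month"]].drop_duplicates()

# Pair every observed period with every fruit.
full = periods.merge(pd.DataFrame({"Fruit": fruits}), how="cross")

# Bring in the real counts; combinations absent from df get NaN, then 0.
result = (
    full.merge(df, on=["Year", "Month", "Fruit"], how="left")
        .fillna({"Count": 0})
        .astype({"Count": int})
)
print(result)
```

Because the periods are taken from the data itself, months that never occur are not invented; only the fruit dimension is filled in.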

CodePudding user response:

You can convert Fruit to a Categorical, so that groupby also returns the fruit categories that have no rows for a given year and month:

import pandas as pd

Year = [2021, 2021, 2021, 2021, 2021, 2021]
Month = ["03", "03", "03", "04", "04", "05"]
Fruit = ["Apple", "Orange", "Mango", "Apple", "Mango", "Apple"]
Count = [2, 3, 4, 1, 2, 1]

df = pd.DataFrame({"Year": Year, "Month": Month, "Fruit": Fruit, "Count": Count})

# Make Fruit categorical so groupby knows the full set of fruits.
df['Fruit'] = pd.Categorical(df['Fruit'], categories=df['Fruit'].unique())
# observed=False keeps combinations with no rows in the data; their Count is NaN.
df = df.groupby(['Year', 'Month', 'Fruit'], as_index=False, observed=False).first()
df['Count'] = df.Count.fillna(0).astype(int)
df

Output:


Year    Month   Fruit   Count
0   2021    03  Apple   2
1   2021    03  Orange  3
2   2021    03  Mango   4
3   2021    04  Apple   1
4   2021    04  Orange  0
5   2021    04  Mango   2
6   2021    05  Apple   1
7   2021    05  Orange  0
8   2021    05  Mango   0
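
A variation on the same Categorical idea, offered as a sketch rather than a definitive version: here the fruit list is written out by hand (taken from the question) instead of using df['Fruit'].unique(), so a fruit that never appears in the data would still get rows, and sum() replaces first() because summing an empty group already gives 0. The names fruits and result are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [2021] * 6,
    "Month": ["03", "03", "03", "04", "04", "05"],
    "Fruit": ["Apple", "Orange", "Mango", "Apple", "Mango", "Apple"],
    "Count": [2, 3, 4, 1, 2, 1],
})

# Explicit list of fruits of interest (hypothetical variable name).
fruits = ["Apple", "Orange", "Mango"]
df["Fruit"] = pd.Categorical(df["Fruit"], categories=fruits)

# observed=False keeps unobserved categories; sum() fills those groups with 0.
result = (
    df.groupby(["Year", "Month", "Fruit"], observed=False)["Count"]
      .sum()
      .reset_index()
)
print(result)
```

One thing to keep in mind with this approach: with observed=False, groupby builds the product of every Year, every Month and every category, so with data spanning several years it can also create (Year, Month) pairs that never occurred in the original frame.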