Adding rows based on condition in Pandas-CodePudding

I have a pandas dataframe that looks like this:

Index	Year	Month	Fruit	Count
0	2021	03	Apple	2
1	2021	03	Orange	3
2	2021	03	Mango	4
3	2021	04	Apple	1
4	2021	04	Mango	2
5	2021	05	Apple	1

Let's say that Apple, Orange and Mango are the only fruits I am interested in.

The dataframe is formed such that if the count for a fruit (in a month and year) is 0, it won't show up as a record. I want to add the missing fruit row to the dataframe with count column value = 0. In short, I want each of the 3 fruits to show up for the year and month irrespective of the count being 0. So I want it to look like this:

Index	Year	Month	Fruit	Count
0	2021	03	Apple	2
1	2021	03	Orange	3
2	2021	03	Mango	4
3	2021	04	Apple	1
x	2021	04	Orange	0
4	2021	04	Mango	2
5	2021	05	Apple	1
x	2021	05	Orange	0
x	2021	05	Mango	0

Appreciate any suggestions on how to approach this.

CodePudding user response：

You can use pyjanitor's complete to fill the missing rows:

import io
import pandas as pd

import janitor
data = '''Index Year    Month   Fruit   Count
0   2021    03  Apple   2
1   2021    03  Orange  3
2   2021    03  Mango   4
3   2021    04  Apple   1
4   2021    04  Mango   2
5   2021    05  Apple   1'''

df = pd.read_csv(io.StringIO(data), sep='\t', index_col='Index')

df = df.complete('Fruit', ['Year', 'Month']).fillna(0, downcast='infer').sort_values(['Year', 'Month', 'Fruit'])

Output:

	Year	Month	Fruit	Count
0	2021	3	Apple	2
2	2021	3	Mango	4
1	2021	3	Orange	3
3	2021	4	Apple	1
4	2021	4	Mango	2
6	2021	4	Orange	0
5	2021	5	Apple	1
8	2021	5	Mango	0
7	2021	5	Orange	0

CodePudding user response：

You can use Categorical to add Fruit as a category

Year = [2021, 2021, 2021, 2021, 2021, 2021]
Month = ["03","03", "03","04","04","05"]
Fruit = ["Apple", "Orange", "Mango", "Apple", "Mango", "Apple"]
Count = [2,3,4,1,2,1]

df = pd.DataFrame({"Year":Year, "Month":Month, "Fruit":Fruit, "Count":Count})

df['Fruit'] = pd.Categorical(df['Fruit'], categories=df['Fruit'].unique())
df = df.groupby(['Year','Month','Fruit'], as_index=False).first()
df['Count'] = df.Count.fillna(0).astype(int)
df

Output:


Year    Month   Fruit   Count
0   2021    03  Apple   2
1   2021    03  Orange  3
2   2021    03  Mango   4
3   2021    04  Apple   1
4   2021    04  Orange  0
5   2021    04  Mango   2
6   2021    05  Apple   1
7   2021    05  Orange  0
8   2021    05  Mango   0