I have data like the example below.
I need to fill in the 'value' column wherever a month/name combination has no data.
The month values are the unique values of df['month']
import pandas as pd
a = [['2020-01',1,'a'], ['2020-02',2,'a']]
b = [['2020-01',1,'b'], ['2020-03',4,'b']]
a.extend(b)
df = pd.DataFrame(a, columns=['month','value','name'])
print(df)
Below is the original data.
month value name
0 2020-01 1 a
1 2020-02 2 a
2 2020-01 1 b
3 2020-03 4 b
Below is the expected result when the gaps are filled with zeros (0). Note that names a and b are each missing one month.
month value name
0 2020-01 1 a
1 2020-02 2 a
2 2020-03 0 a
3 2020-01 1 b
4 2020-02 0 b
5 2020-03 4 b
What would be the most efficient way to do this?
CodePudding user response:
Let's use pivot and stack to populate all the unique months across names:
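# pivot takes keyword arguments only in pandas 2.0+, so they are spelled out here
df.pivot(index='name', columns='month', values='value').fillna(0).stack().reset_index(name='value')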
name month value
0 a 2020-01 1.0
1 a 2020-02 2.0
2 a 2020-03 0.0
3 b 2020-01 1.0
4 b 2020-02 0.0
5 b 2020-03 4.0
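Note that the pivot route turns 'value' into floats, because the missing cells pass through NaN before being filled. As an alternative sketch (not part of the answer above), the same grid can be built with a MultiIndex and reindex, which should leave the integer values as integers. It assumes each (name, month) pair appears at most once in df; the names full_index and filled are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [['2020-01', 1, 'a'], ['2020-02', 2, 'a'],
     ['2020-01', 1, 'b'], ['2020-03', 4, 'b']],
    columns=['month', 'value', 'name'],
)

# Every (name, month) combination, built from the unique values of each column.
full_index = pd.MultiIndex.from_product(
    [df['name'].unique(), df['month'].unique()],
    names=['name', 'month'],
)

# Reindex onto the full grid; fill_value=0 supplies the missing pairs directly.
filled = (
    df.set_index(['name', 'month'])
      .reindex(full_index, fill_value=0)
      .reset_index()
)
print(filled)
```

If df can contain duplicate (name, month) rows, reindex will raise, so those would need to be aggregated first (for example with groupby and sum).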
CodePudding user response:
I did it as shown below, but I don't think it is efficient.
I would be grateful if anyone could show me more efficient code.
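months = df['month'].unique().tolist()
names = df['name'].unique().tolist()
frames = [df]
for name in names:
    df_name = df[df['name'] == name]
    months_name = df_name['month'].unique().tolist()
    # util.set_difference is not defined in this post; a plain comprehension is assumed to be equivalent
    months_2_fill = [m for m in months if m not in months_name]
    df_ = pd.DataFrame(months_2_fill, columns=['month'])
    df_['name'] = name
    df_['value'] = 0
    frames.append(df_)
# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
df = pd.concat(frames, ignore_index=True)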
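For comparison, here is a loop-free sketch of the same idea: build every name/month pair with a cross join, then left-merge the original rows onto it and fill the gaps with 0. It assumes pandas 1.2 or later (for how='cross'); the names grid and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [['2020-01', 1, 'a'], ['2020-02', 2, 'a'],
     ['2020-01', 1, 'b'], ['2020-03', 4, 'b']],
    columns=['month', 'value', 'name'],
)

# All name/month pairs: unique names cross-joined with unique months.
grid = df[['name']].drop_duplicates().merge(
    df[['month']].drop_duplicates(), how='cross'
)

# Attach the known values; pairs with no match get NaN, which is then set to 0.
result = grid.merge(df, on=['name', 'month'], how='left')
result['value'] = result['value'].fillna(0).astype(int)

# Restore the original column order.
result = result[['month', 'value', 'name']]
print(result)
```

This avoids growing the DataFrame inside a loop, which is usually the slow part of the approach above.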