i have the dataframe below, i want to make sure if value ELSE does not exist in column, i should to add it.
my dataframe source is below
date zone RATE
2021-09-01 NY 90
2021-09-01 LA 80
2021-09-01 ELSE 10
2021-09-02 NY 60
2021-09-02 LA 70
expected output
2021-09-01 NY 90
2021-09-01 LA 80
2021-09-01 ELSE 10
2021-09-02 NY 60
2021-09-02 LA 70
2021-09-02 ELSE 0
CodePudding user response:
Use DataFrame.unstack
with
DataFrame.stack
, for original order is used ordered Categorical
:
df['zone'] = pd.Categorical(df['zone'], ordered=True, categories=df['zone'].unique())
df = df.set_index(['date','zone']).unstack(fill_value=0).stack().reset_index()
print (df)
date zone RATE
0 2021-09-01 NY 90
1 2021-09-01 LA 80
2 2021-09-01 ELSE 10
3 2021-09-02 NY 60
4 2021-09-02 LA 70
5 2021-09-02 ELSE 0
Alternative:
df['zone'] = pd.Categorical(df['zone'], ordered=True, categories=df['zone'].unique())
df = df.set_index(['date','zone'])
df = df.reindex(pd.MultiIndex.from_product(df.index.levels), fill_value=0).reset_index()
print (df)
date zone RATE
0 2021-09-01 NY 90
1 2021-09-01 LA 80
2 2021-09-01 ELSE 10
3 2021-09-02 NY 60
4 2021-09-02 LA 70
5 2021-09-02 ELSE 0
Or:
from itertools import product
df1 = pd.DataFrame(product(df['date'].unique(), df['zone'].unique()),
columns=['date','zone'])
df = df1.merge(df, how='left').fillna({'RATE':0}).astype({'RATE':int})
print (df)
date zone RATE
0 2021-09-01 NY 90
1 2021-09-01 LA 80
2 2021-09-01 ELSE 10
3 2021-09-02 NY 60
4 2021-09-02 LA 70
5 2021-09-02 ELSE 0
CodePudding user response:
Do the pivot
then fillna
and stack
back
out = df.pivot(*df.columns).fillna(0).stack().reset_index(name='RATE')
Out[437]:
date zone RATE
0 2021-09-01 ELSE 10.0
1 2021-09-01 LA 80.0
2 2021-09-01 NY 90.0
3 2021-09-02 ELSE 0.0
4 2021-09-02 LA 70.0
5 2021-09-02 NY 60.0