Home > Software design >  add row value if not exist group by date
add row value if not exist group by date

Time:09-29

i have the dataframe below, i want to make sure if value ELSE does not exist in column, i should to add it.

my dataframe source is below

date         zone       RATE
2021-09-01    NY        90
2021-09-01    LA        80
2021-09-01    ELSE      10
2021-09-02    NY        60
2021-09-02    LA        70

expected output

2021-09-01    NY        90
2021-09-01    LA        80
2021-09-01    ELSE      10
2021-09-02    NY        60
2021-09-02    LA        70
2021-09-02    ELSE      0

CodePudding user response:

Use DataFrame.unstack with DataFrame.stack, for original order is used ordered Categorical:

df['zone'] = pd.Categorical(df['zone'], ordered=True, categories=df['zone'].unique())

df = df.set_index(['date','zone']).unstack(fill_value=0).stack().reset_index()
print (df)
         date  zone  RATE
0  2021-09-01    NY    90
1  2021-09-01    LA    80
2  2021-09-01  ELSE    10
3  2021-09-02    NY    60
4  2021-09-02    LA    70
5  2021-09-02  ELSE     0

Alternative:

df['zone'] = pd.Categorical(df['zone'], ordered=True, categories=df['zone'].unique())

df = df.set_index(['date','zone'])
df = df.reindex(pd.MultiIndex.from_product(df.index.levels), fill_value=0).reset_index()
print (df)

         date  zone  RATE
0  2021-09-01    NY    90
1  2021-09-01    LA    80
2  2021-09-01  ELSE    10
3  2021-09-02    NY    60
4  2021-09-02    LA    70
5  2021-09-02  ELSE     0

Or:

from  itertools import product

df1 = pd.DataFrame(product(df['date'].unique(), df['zone'].unique()), 
                   columns=['date','zone'])
df = df1.merge(df, how='left').fillna({'RATE':0}).astype({'RATE':int})
print (df)

         date  zone  RATE
0  2021-09-01    NY    90
1  2021-09-01    LA    80
2  2021-09-01  ELSE    10
3  2021-09-02    NY    60
4  2021-09-02    LA    70
5  2021-09-02  ELSE     0

CodePudding user response:

Do the pivot then fillna and stack back

out = df.pivot(*df.columns).fillna(0).stack().reset_index(name='RATE')
Out[437]: 
        date  zone  RATE
0 2021-09-01  ELSE  10.0
1 2021-09-01    LA  80.0
2 2021-09-01    NY  90.0
3 2021-09-02  ELSE   0.0
4 2021-09-02    LA  70.0
5 2021-09-02    NY  60.0
  • Related