dataframe fill in a value where there is no data

Time:12-26

I have data like the example below.

I need to fill in the 'value' column wherever a month/name combination has no data.

The full set of months is the unique values of df['month'].

import pandas as pd

a = [['2020-01',1,'a'], ['2020-02',2,'a']]
b = [['2020-01',1,'b'], ['2020-03',4,'b']]
a.extend(b)

df = pd.DataFrame(a, columns=['month','value','name'])
print(df)

Below is the original data.

     month  value name
0  2020-01      1    a
1  2020-02      2    a
2  2020-01      1    b
3  2020-03      4    b

Below is the expected result when filling the gaps with zeros (0). Note that each of the names a and b is missing one month.

     month  value name
0  2020-01      1    a
1  2020-02      2    a
2  2020-03      0    a
3  2020-01      1    b
4  2020-02      0    b
5  2020-03      4    b

What would be the most efficient way?

CodePudding user response:

Let's do pivot and stack to populate all the unique months across names:

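# Keyword arguments are required by pivot in pandas 2.0+ (positional arguments were removed)
df.pivot(index='name', columns='month', values='value').fillna(0).stack().reset_index(name='value')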

  name    month  value
0    a  2020-01    1.0
1    a  2020-02    2.0
2    a  2020-03    0.0
3    b  2020-01    1.0
4    b  2020-02    0.0
5    b  2020-03    4.0
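
Note that the pivot route turns 'value' into floats, because the gaps are NaN before fillna runs. If integer output matters, one alternative (a sketch, not the answer's own method) is to reindex against every name/month combination with fill_value=0. This assumes each (name, month) pair appears at most once in df, which pivot requires as well; the names full_index and filled are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [['2020-01', 1, 'a'], ['2020-02', 2, 'a'],
     ['2020-01', 1, 'b'], ['2020-03', 4, 'b']],
    columns=['month', 'value', 'name'],
)

# Every (name, month) combination built from the unique values of each column.
full_index = pd.MultiIndex.from_product(
    [df['name'].unique(), sorted(df['month'].unique())],
    names=['name', 'month'],
)

filled = (
    df.set_index(['name', 'month'])
      .reindex(full_index, fill_value=0)  # missing pairs get value 0, dtype stays int
      .reset_index()[['month', 'value', 'name']]
)
print(filled)
```

Because the fill value is supplied during the reindex, no NaN is ever created and the column keeps its integer dtype.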

CodePudding user response:

I did it as shown below, but I don't think it is efficient.

I would be grateful if anyone could show more efficient code.

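months = df['month'].unique().tolist()
names = df['name'].unique().tolist()
frames = [df]
for name in names:
    df_name = df[df['name'] == name]
    months_name = df_name['month'].unique().tolist()
    # Months missing for this name (plain set difference in place of util.set_difference)
    months_2_fill = sorted(set(months) - set(months_name))

    df_ = pd.DataFrame(months_2_fill, columns=['month'])
    df_['name'] = name
    df_['value'] = 0

    frames.append(df_)

# DataFrame.append was removed in pandas 2.0; concatenate once after the loop
df = pd.concat(frames, ignore_index=True)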
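
The per-name loop can probably be avoided altogether. A minimal sketch, assuming pandas 1.2 or later (for how='cross'): build the full name/month grid with a cross join, left-merge the original data onto it, and fill the gaps with 0. The names grid and filled are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [['2020-01', 1, 'a'], ['2020-02', 2, 'a'],
     ['2020-01', 1, 'b'], ['2020-03', 4, 'b']],
    columns=['month', 'value', 'name'],
)

# One row per distinct name and per distinct month.
names = df[['name']].drop_duplicates()
months = df[['month']].drop_duplicates()

# Cartesian product of names and months (how='cross' needs pandas >= 1.2).
grid = names.merge(months, how='cross')

filled = (
    grid.merge(df, on=['name', 'month'], how='left')  # NaN where data is missing
        .fillna({'value': 0})
        .astype({'value': int})                       # restore the integer dtype
        .sort_values(['name', 'month'])
        .reset_index(drop=True)[['month', 'value', 'name']]
)
print(filled)
```

Unlike pivot, the merge does not raise if a (name, month) pair appears more than once; such duplicate rows are simply kept.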
