Home > Software design >  replace the empty value in the dataframe with a list of python values
replace the empty value in the dataframe with a list of python values

Time:05-14

There is a list of shops |Shop ID| |-------| | Shop1 | | Shop2 | | Shop3 | There is a list of events that took place in the store |Shop ID| Event | Start_date | End_date | |-------|---------|------------|-----------| | Shop1 | Action | 01/01/2022 |05/01/2022 | | Shop2 | Action | 05/01/2022 |10/01/2022 | | | Action | 15/01/2022 |20/01/2022 | | Shop3 | Action | 20/01/2022 |25/01/2022 | If the Shop ID value is empty, it means that the event was held in all stores. The following table must be displayed |Shop ID| Event | Start_date | End_date | |-------|---------|------------|-----------| | Shop1 | Action | 01/01/2022 |05/01/2022 | | Shop2 | Action | 05/01/2022 |10/01/2022 | | Shop1 | Action | 15/01/2022 |20/01/2022 | | Shop2 | Action | 15/01/2022 |20/01/2022 | | Shop3 | Action | 15/01/2022 |20/01/2022 | | Shop3 | Action | 20/01/2022 |25/01/2022 |

CodePudding user response:

You can fill the empty value with list then explode

lst = ['Shop1','Shop2','Shop3','Shop4','Shop5']
df['Shop ID'] = df['Shop ID'].apply(lambda x: x if len(x) else lst)
# or if your empty means NaN
df['Shop ID'] = df['Shop ID'].apply(lambda x: x if x != x else lst)
df = df.explode(['Shop ID'])
print(df)

  Shop ID   Event  Start_date    End_date
0   Shop1  Action  01/01/2022  05/01/2022
1   Shop2  Action  05/01/2022  10/01/2022
2   Shop1  Action  15/01/2022  20/01/2022
2   Shop2  Action  15/01/2022  20/01/2022
2   Shop3  Action  15/01/2022  20/01/2022
2   Shop4  Action  15/01/2022  20/01/2022
2   Shop5  Action  15/01/2022  20/01/2022
3   Shop3  Action  20/01/2022  25/01/2022
4   Shop4  Action  25/01/2022  30/01/2022

CodePudding user response:

For a fully vectorial solution, you can use:

# unique (non-NA) IDs 
# NB. If your empty cells are empty string, convert to NaN first
IDs = df['Shop ID'].dropna().unique()

# identify NaNs
m = df['Shop ID'].isna()

# repeat NaNs
df2 = df.loc[df.index.repeat(m.mul(len(IDs)-1).add(1))]

# replace with all shop IDs
df2.loc[m[m].index, 'Shop ID'] = np.repeat(IDs, m.sum())

print(df2)

output:

  Shop ID   Event  Start_date    End_date
0   Shop1  Action  01/01/2022  05/01/2022
1   Shop2  Action  05/01/2022  10/01/2022
2   Shop1  Action  15/01/2022  20/01/2022
2   Shop2  Action  15/01/2022  20/01/2022
2   Shop3  Action  15/01/2022  20/01/2022
2   Shop4  Action  15/01/2022  20/01/2022
3   Shop3  Action  20/01/2022  25/01/2022
4   Shop4  Action  25/01/2022  30/01/2022
  • Related