Using pandas, I am trying to group by multiple columns and then add rows to the DataFrame for every person name that is not present in a group.
For example, this is my DataFrame:
V1 V2 V3 PN
1 10 20 A
2 10 21 A
3 10 20 C
I have a list of unique person names: ['A','B','C','D','E']
Expected outcome:
V1 V2 V3 PN
1 10 20 A
1 10 20 B
1 10 20 C
1 10 20 D
1 10 20 E
2 10 21 A
2 10 21 B
2 10 21 C
2 10 21 D
2 10 21 E
3 10 20 A
3 10 20 B
3 10 20 C
3 10 20 D
3 10 20 E
I was thinking about using a pandas groupby statement, but it didn't work out.
CodePudding user response:
Try this, using pd.MultiIndex with reindex to create the additional rows:
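import pandas as pd

df = pd.DataFrame({'Version 1': [1, 2, 3],
                   'Version 2': [10, 10, 10],
                   'Version 3': [20, 21, 20],
                   'Person Name': 'A A C'.split(' ')})
p_list = [*'ABCDE']

# Build every (Version 1, Person Name) pair, reindex to add the missing rows,
# then fill the resulting NaNs within each Version 1 group.
# The inserted NaNs upcast Version 2 and Version 3 to float.
result = (
    df.set_index(['Version 1', 'Person Name'])
      .reindex(pd.MultiIndex.from_product([df['Version 1'].unique(), p_list],
                                          names=['Version 1', 'Person Name']))
      .groupby(level=0, group_keys=False).apply(lambda x: x.ffill().bfill())
      .reset_index()
)
print(result)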
Output:
Version 1 Person Name Version 2 Version 3
0 1 A 10.0 20.0
1 1 B 10.0 20.0
2 1 C 10.0 20.0
3 1 D 10.0 20.0
4 1 E 10.0 20.0
5 2 A 10.0 21.0
6 2 B 10.0 21.0
7 2 C 10.0 21.0
8 2 D 10.0 21.0
9 2 E 10.0 21.0
10 3 A 10.0 20.0
11 3 B 10.0 20.0
12 3 C 10.0 20.0
13 3 D 10.0 20.0
14 3 E 10.0 20.0
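If the float columns in that output are a problem, a cross join is one possible alternative. This is only a sketch of a different technique, not the reindex approach above: it assumes pandas 1.2 or later (for how='cross'), and the names versions and people are hypothetical helpers introduced for illustration. Because no NaN values are ever created, the version columns should keep their integer dtype.

```python
import pandas as pd

df = pd.DataFrame({'Version 1': [1, 2, 3],
                   'Version 2': [10, 10, 10],
                   'Version 3': [20, 21, 20],
                   'Person Name': ['A', 'A', 'C']})
p_list = ['A', 'B', 'C', 'D', 'E']

# One row per distinct version combination (the original person column is dropped).
versions = df[['Version 1', 'Version 2', 'Version 3']].drop_duplicates()

# Hypothetical helper frame holding every person name.
people = pd.DataFrame({'Person Name': p_list})

# how='cross' pairs every version row with every person name.
result = versions.merge(people, how='cross')
print(result)
```

Note that this rebuilds the person column from p_list, so any other per-person columns in the original frame would have to be merged back in separately.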