I have two dataframes, 'inp' and 'IndiRelat'. The 'inp' are the entries of certain data.
import pandas as pd
inp = [[2, 'cvt' , -3, 5, 17, -2, -9, -0.2, 'RL'],
[2, 'cv' , 0, 0, 0, 0, 0, 0, 'LL'],
[2, 'sope' , 0, 0, 0, 0, 0, 0, 'SD'],
[2, 'wix ' ,-13,-13, 2, 1,-62, -0.5, 'WI'],
[2, 'wix-' , 0, 16, 6, 13, 0, 0.3, 'WI'],
[4, 'sope' ,-42, 0, 29, 0, 0, -13, 'SD'],
[4, 'cv' , 0, 0, 0, 0, 0, 0, 'LL'],
[4, 'cvt' , 0, 0, 0, 0, 0, -1, 'RL'],
[4, 'wix ' ,-18, -2, 19, 19, 3, -64, 'WI'],
[4, 'wix-' , 0,-30, -2, -2, 32, 0, 'WI']]
inp = pd.DataFrame(data = inp, columns = ['Key','Descr', 'C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'Indicator'])
print(inp['Key'])
print(inp['Indicator'])
IndiRelat = ['SD', 'LL', 'RL', 'SS', 'RR', 'WI', 'WI', 'WI', 'WI', 'QU', 'QU']
IndiRelat = pd.DataFrame(IndiRelat)
I have created an 'inp' DataFrame, in the there is data for each key 'inp ['Key']' which relate to an indicator 'inp['indicator']'.
The idea is to relate this with a second DataFrame 'IndiRelat, and create rows of zeros in case the 'IndiRelat' data is not in 'inp['indicator']'.
What I seek to get is something like that.
Index Descr C1 C2 C3 C4 C5 C6 Indicator
0 2 cvt -3 5 17 -2 -9 -0.2 RL
1 2 cv 0 0 0 0 0 0 LL
2 2 sope 0 0 0 0 0 0 SD
3 2 wix -13 -13 2 1 -62 -0.5 WI
4 2 wix- 0 16 6 13 0 0.3 WI
5 2 0 0 0 0 0 0 0 WI
6 2 0 0 0 0 0 0 0 WI
7 2 0 0 0 0 0 0 0 QU
8 2 0 0 0 0 0 0 0 QU
9 2 0 0 0 0 0 0 0 SS
10 2 0 0 0 0 0 0 0 RR
11 4 cvt -42 0 29 0 0 -13 RL
12 4 cv 0 0 0 0 0 0 LL
13 4 sope 0 0 0 0 0 -1 SD
14 4 wix -18 -2 19 19 3 -64 WI
15 4 wix- 0 -30 -2 -2 32 0 WI
16 4 0 0 0 0 0 0 0 WI
17 4 0 0 0 0 0 0 0 WI
18 4 0 0 0 0 0 0 0 QU
19 4 0 0 0 0 0 0 0 QU
20 4 0 0 0 0 0 0 0 SS
21 4 0 0 0 0 0 0 0 RR
I would greatly appreciate if you can help me with the idea and suggestions to get it, greetings.
CodePudding user response:
We try to do the cumcount
create the unique key then we do reindex
inp['new'] = inp.groupby(['Key','Indicator']).cumcount()
IndiRelat[1] = IndiRelat.groupby(0).cumcount()
IndiRelat.columns = ['Indicator','new']
out = inp.set_index(['Key','Indicator','new']).unstack(level=0).reindex(pd.MultiIndex.from_frame(IndiRelat),fill_value=0).stack().reset_index().sort_values('Key')
out
Out[93]:
Indicator new Key Descr C1 C2 C3 C4 C5 C6
0 SD 0 2 sope 0 0 0 0 0 0.0
18 QU 0 2 0 0 0 0 0 0 0.0
16 WI 3 2 0 0 0 0 0 0 0.0
14 WI 2 2 0 0 0 0 0 0 0.0
12 WI 1 2 wix- 0 16 6 13 0 0.3
20 QU 1 2 0 0 0 0 0 0 0.0
8 RR 0 2 0 0 0 0 0 0 0.0
10 WI 0 2 wix -13 -13 2 1 -62 -0.5
6 SS 0 2 0 0 0 0 0 0 0.0
4 RL 0 2 cvt -3 5 17 -2 -9 -0.2
2 LL 0 2 cv 0 0 0 0 0 0.0
9 RR 0 4 0 0 0 0 0 0 0.0
5 RL 0 4 cvt 0 0 0 0 0 -1.0
11 WI 0 4 wix -18 -2 19 19 3 -64.0
13 WI 1 4 wix- 0 -30 -2 -2 32 0.0
3 LL 0 4 cv 0 0 0 0 0 0.0
15 WI 2 4 0 0 0 0 0 0 0.0
17 WI 3 4 0 0 0 0 0 0 0.0
1 SD 0 4 sope -42 0 29 0 0 -13.0
19 QU 0 4 0 0 0 0 0 0 0.0
7 SS 0 4 0 0 0 0 0 0 0.0
21 QU 1 4 0 0 0 0 0 0 0.0
CodePudding user response:
An outer merge ought to do the trick. As per @BENY’s comment and answer, we need to address non-uniqueness of the key:
>>> df2 = IndiRelat.rename(columns={0: 'Indicator'})
>>> df2['dedup_indic'] = df2.groupby('Indicator').cumcount()
>>> df = inp.join(inp.groupby('Indicator').cumcount().rename('dedup_indic'))\
... .merge(df2, how='outer')
>>> df
Key Descr C1 C2 C3 C4 C5 C6 Indicator dedup_indic
0 2.0 cvt -3.0 5.0 17.0 -2.0 -9.0 -0.2 RL 0
1 2.0 cv 0.0 0.0 0.0 0.0 0.0 0.0 LL 0
2 2.0 sope 0.0 0.0 0.0 0.0 0.0 0.0 SD 0
3 2.0 wix -13.0 -13.0 2.0 1.0 -62.0 -0.5 WI 0
4 2.0 wix- 0.0 16.0 6.0 13.0 0.0 0.3 WI 1
5 4.0 sope -42.0 0.0 29.0 0.0 0.0 -13.0 SD 1
6 4.0 cv 0.0 0.0 0.0 0.0 0.0 0.0 LL 1
7 4.0 cvt 0.0 0.0 0.0 0.0 0.0 -1.0 RL 1
8 4.0 wix -18.0 -2.0 19.0 19.0 3.0 -64.0 WI 2
9 4.0 wix- 0.0 -30.0 -2.0 -2.0 32.0 0.0 WI 3
10 NaN NaN NaN NaN NaN NaN NaN NaN SS 0
11 NaN NaN NaN NaN NaN NaN NaN NaN RR 0
12 NaN NaN NaN NaN NaN NaN NaN NaN QU 0
13 NaN NaN NaN NaN NaN NaN NaN NaN QU 1
>>> df.fillna(0).drop(columns=['dedup_indic'])
Key Descr C1 C2 C3 C4 C5 C6 Indicator
0 2.0 cvt -3.0 5.0 17.0 -2.0 -9.0 -0.2 RL
1 2.0 cv 0.0 0.0 0.0 0.0 0.0 0.0 LL
2 2.0 sope 0.0 0.0 0.0 0.0 0.0 0.0 SD
3 2.0 wix -13.0 -13.0 2.0 1.0 -62.0 -0.5 WI
4 2.0 wix- 0.0 16.0 6.0 13.0 0.0 0.3 WI
5 4.0 sope -42.0 0.0 29.0 0.0 0.0 -13.0 SD
6 4.0 cv 0.0 0.0 0.0 0.0 0.0 0.0 LL
7 4.0 cvt 0.0 0.0 0.0 0.0 0.0 -1.0 RL
8 4.0 wix -18.0 -2.0 19.0 19.0 3.0 -64.0 WI
9 4.0 wix- 0.0 -30.0 -2.0 -2.0 32.0 0.0 WI
10 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 SS
11 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 RR
12 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 QU
13 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 QU