Create and fill rows of zeros with pandas-CodePudding

I have two dataframes, 'inp' and 'IndiRelat'. The 'inp' are the entries of certain data.

import pandas as pd

inp = [[2, 'cvt'  , -3,  5, 17, -2, -9, -0.2, 'RL'],
       [2, 'cv'   ,  0,  0,  0,  0,  0,    0, 'LL'],
       [2, 'sope' ,  0,  0,  0,  0,  0,    0, 'SD'],
       [2, 'wix ' ,-13,-13,  2,  1,-62, -0.5, 'WI'],
       [2, 'wix-' ,  0, 16,  6, 13,  0,  0.3, 'WI'],
       [4, 'sope' ,-42,  0, 29,  0,  0,  -13, 'SD'],
       [4, 'cv'   ,  0,  0,  0,  0,  0,    0, 'LL'],
       [4, 'cvt'  ,  0,  0,  0,  0,  0,   -1, 'RL'],
       [4, 'wix ' ,-18, -2, 19, 19,  3,  -64, 'WI'],
       [4, 'wix-' ,  0,-30, -2, -2, 32,    0, 'WI']]

inp = pd.DataFrame(data = inp, columns = ['Key','Descr', 'C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'Indicator'])

print(inp['Key'])
print(inp['Indicator'])

IndiRelat = ['SD', 'LL', 'RL', 'SS', 'RR', 'WI', 'WI', 'WI', 'WI', 'QU', 'QU']
IndiRelat = pd.DataFrame(IndiRelat)

I have created an 'inp' DataFrame, in the there is data for each key 'inp ['Key']' which relate to an indicator 'inp['indicator']'.

The idea is to relate this with a second DataFrame 'IndiRelat, and create rows of zeros in case the 'IndiRelat' data is not in 'inp['indicator']'.

What I seek to get is something like that.

    Index   Descr   C1  C2  C3  C4  C5  C6  Indicator
0   2       cvt     -3  5   17  -2  -9  -0.2    RL
1   2       cv      0   0   0   0   0   0       LL
2   2       sope    0   0   0   0   0   0       SD
3   2       wix     -13 -13 2   1   -62 -0.5    WI
4   2       wix-    0   16  6   13  0   0.3     WI
5   2       0       0   0   0   0   0   0       WI
6   2       0       0   0   0   0   0   0       WI
7   2       0       0   0   0   0   0   0       QU
8   2       0       0   0   0   0   0   0       QU
9   2       0       0   0   0   0   0   0       SS
10  2       0       0   0   0   0   0   0       RR
11  4       cvt    -42  0   29  0   0   -13     RL
12  4       cv      0   0   0   0   0   0       LL
13  4       sope    0   0   0   0   0   -1      SD
14  4       wix     -18 -2  19  19  3   -64     WI
15  4       wix-    0   -30 -2  -2  32  0       WI
16  4       0       0   0   0   0   0   0       WI
17  4       0       0   0   0   0   0   0       WI
18  4       0       0   0   0   0   0   0       QU
19  4       0       0   0   0   0   0   0       QU
20  4       0       0   0   0   0   0   0       SS
21  4       0       0   0   0   0   0   0       RR

I would greatly appreciate if you can help me with the idea and suggestions to get it, greetings.

CodePudding user response：

We try to do the cumcount create the unique key then we do reindex

inp['new'] = inp.groupby(['Key','Indicator']).cumcount()
IndiRelat[1] = IndiRelat.groupby(0).cumcount()

IndiRelat.columns = ['Indicator','new']


out = inp.set_index(['Key','Indicator','new']).unstack(level=0).reindex(pd.MultiIndex.from_frame(IndiRelat),fill_value=0).stack().reset_index().sort_values('Key')
out
Out[93]: 
   Indicator  new  Key Descr  C1  C2  C3  C4  C5    C6
0         SD    0    2  sope   0   0   0   0   0   0.0
18        QU    0    2     0   0   0   0   0   0   0.0
16        WI    3    2     0   0   0   0   0   0   0.0
14        WI    2    2     0   0   0   0   0   0   0.0
12        WI    1    2  wix-   0  16   6  13   0   0.3
20        QU    1    2     0   0   0   0   0   0   0.0
8         RR    0    2     0   0   0   0   0   0   0.0
10        WI    0    2  wix  -13 -13   2   1 -62  -0.5
6         SS    0    2     0   0   0   0   0   0   0.0
4         RL    0    2   cvt  -3   5  17  -2  -9  -0.2
2         LL    0    2    cv   0   0   0   0   0   0.0
9         RR    0    4     0   0   0   0   0   0   0.0
5         RL    0    4   cvt   0   0   0   0   0  -1.0
11        WI    0    4  wix  -18  -2  19  19   3 -64.0
13        WI    1    4  wix-   0 -30  -2  -2  32   0.0
3         LL    0    4    cv   0   0   0   0   0   0.0
15        WI    2    4     0   0   0   0   0   0   0.0
17        WI    3    4     0   0   0   0   0   0   0.0
1         SD    0    4  sope -42   0  29   0   0 -13.0
19        QU    0    4     0   0   0   0   0   0   0.0
7         SS    0    4     0   0   0   0   0   0   0.0
21        QU    1    4     0   0   0   0   0   0   0.0

CodePudding user response：

An outer merge ought to do the trick. As per @BENY’s comment and answer, we need to address non-uniqueness of the key:

>>> df2 = IndiRelat.rename(columns={0: 'Indicator'})
>>> df2['dedup_indic'] = df2.groupby('Indicator').cumcount()
>>> df = inp.join(inp.groupby('Indicator').cumcount().rename('dedup_indic'))\
...         .merge(df2, how='outer')
>>> df
    Key Descr    C1    C2    C3    C4    C5    C6 Indicator  dedup_indic
0   2.0   cvt  -3.0   5.0  17.0  -2.0  -9.0  -0.2        RL            0
1   2.0    cv   0.0   0.0   0.0   0.0   0.0   0.0        LL            0
2   2.0  sope   0.0   0.0   0.0   0.0   0.0   0.0        SD            0
3   2.0  wix  -13.0 -13.0   2.0   1.0 -62.0  -0.5        WI            0
4   2.0  wix-   0.0  16.0   6.0  13.0   0.0   0.3        WI            1
5   4.0  sope -42.0   0.0  29.0   0.0   0.0 -13.0        SD            1
6   4.0    cv   0.0   0.0   0.0   0.0   0.0   0.0        LL            1
7   4.0   cvt   0.0   0.0   0.0   0.0   0.0  -1.0        RL            1
8   4.0  wix  -18.0  -2.0  19.0  19.0   3.0 -64.0        WI            2
9   4.0  wix-   0.0 -30.0  -2.0  -2.0  32.0   0.0        WI            3
10  NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN        SS            0
11  NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN        RR            0
12  NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN        QU            0
13  NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN        QU            1
>>> df.fillna(0).drop(columns=['dedup_indic'])
    Key Descr    C1    C2    C3    C4    C5    C6 Indicator
0   2.0   cvt  -3.0   5.0  17.0  -2.0  -9.0  -0.2        RL
1   2.0    cv   0.0   0.0   0.0   0.0   0.0   0.0        LL
2   2.0  sope   0.0   0.0   0.0   0.0   0.0   0.0        SD
3   2.0  wix  -13.0 -13.0   2.0   1.0 -62.0  -0.5        WI
4   2.0  wix-   0.0  16.0   6.0  13.0   0.0   0.3        WI
5   4.0  sope -42.0   0.0  29.0   0.0   0.0 -13.0        SD
6   4.0    cv   0.0   0.0   0.0   0.0   0.0   0.0        LL
7   4.0   cvt   0.0   0.0   0.0   0.0   0.0  -1.0        RL
8   4.0  wix  -18.0  -2.0  19.0  19.0   3.0 -64.0        WI
9   4.0  wix-   0.0 -30.0  -2.0  -2.0  32.0   0.0        WI
10  0.0     0   0.0   0.0   0.0   0.0   0.0   0.0        SS
11  0.0     0   0.0   0.0   0.0   0.0   0.0   0.0        RR
12  0.0     0   0.0   0.0   0.0   0.0   0.0   0.0        QU
13  0.0     0   0.0   0.0   0.0   0.0   0.0   0.0        QU