Home > OS >  Create and fill rows of zeros with pandas
Create and fill rows of zeros with pandas

Time:10-02

I have two dataframes, 'inp' and 'IndiRelat'. The 'inp' are the entries of certain data.

import pandas as pd

inp = [[2, 'cvt'  , -3,  5, 17, -2, -9, -0.2, 'RL'],
       [2, 'cv'   ,  0,  0,  0,  0,  0,    0, 'LL'],
       [2, 'sope' ,  0,  0,  0,  0,  0,    0, 'SD'],
       [2, 'wix ' ,-13,-13,  2,  1,-62, -0.5, 'WI'],
       [2, 'wix-' ,  0, 16,  6, 13,  0,  0.3, 'WI'],
       [4, 'sope' ,-42,  0, 29,  0,  0,  -13, 'SD'],
       [4, 'cv'   ,  0,  0,  0,  0,  0,    0, 'LL'],
       [4, 'cvt'  ,  0,  0,  0,  0,  0,   -1, 'RL'],
       [4, 'wix ' ,-18, -2, 19, 19,  3,  -64, 'WI'],
       [4, 'wix-' ,  0,-30, -2, -2, 32,    0, 'WI']]

inp = pd.DataFrame(data = inp, columns = ['Key','Descr', 'C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'Indicator'])

print(inp['Key'])
print(inp['Indicator'])

IndiRelat = ['SD', 'LL', 'RL', 'SS', 'RR', 'WI', 'WI', 'WI', 'WI', 'QU', 'QU']
IndiRelat = pd.DataFrame(IndiRelat)

I have created an 'inp' DataFrame, in the there is data for each key 'inp ['Key']' which relate to an indicator 'inp['indicator']'.

The idea is to relate this with a second DataFrame 'IndiRelat, and create rows of zeros in case the 'IndiRelat' data is not in 'inp['indicator']'.

What I seek to get is something like that.

    Index   Descr   C1  C2  C3  C4  C5  C6  Indicator
0   2       cvt     -3  5   17  -2  -9  -0.2    RL
1   2       cv      0   0   0   0   0   0       LL
2   2       sope    0   0   0   0   0   0       SD
3   2       wix     -13 -13 2   1   -62 -0.5    WI
4   2       wix-    0   16  6   13  0   0.3     WI
5   2       0       0   0   0   0   0   0       WI
6   2       0       0   0   0   0   0   0       WI
7   2       0       0   0   0   0   0   0       QU
8   2       0       0   0   0   0   0   0       QU
9   2       0       0   0   0   0   0   0       SS
10  2       0       0   0   0   0   0   0       RR
11  4       cvt    -42  0   29  0   0   -13     RL
12  4       cv      0   0   0   0   0   0       LL
13  4       sope    0   0   0   0   0   -1      SD
14  4       wix     -18 -2  19  19  3   -64     WI
15  4       wix-    0   -30 -2  -2  32  0       WI
16  4       0       0   0   0   0   0   0       WI
17  4       0       0   0   0   0   0   0       WI
18  4       0       0   0   0   0   0   0       QU
19  4       0       0   0   0   0   0   0       QU
20  4       0       0   0   0   0   0   0       SS
21  4       0       0   0   0   0   0   0       RR

I would greatly appreciate if you can help me with the idea and suggestions to get it, greetings.

CodePudding user response:

We try to do the cumcount create the unique key then we do reindex

inp['new'] = inp.groupby(['Key','Indicator']).cumcount()
IndiRelat[1] = IndiRelat.groupby(0).cumcount()

IndiRelat.columns = ['Indicator','new']


out = inp.set_index(['Key','Indicator','new']).unstack(level=0).reindex(pd.MultiIndex.from_frame(IndiRelat),fill_value=0).stack().reset_index().sort_values('Key')
out
Out[93]: 
   Indicator  new  Key Descr  C1  C2  C3  C4  C5    C6
0         SD    0    2  sope   0   0   0   0   0   0.0
18        QU    0    2     0   0   0   0   0   0   0.0
16        WI    3    2     0   0   0   0   0   0   0.0
14        WI    2    2     0   0   0   0   0   0   0.0
12        WI    1    2  wix-   0  16   6  13   0   0.3
20        QU    1    2     0   0   0   0   0   0   0.0
8         RR    0    2     0   0   0   0   0   0   0.0
10        WI    0    2  wix  -13 -13   2   1 -62  -0.5
6         SS    0    2     0   0   0   0   0   0   0.0
4         RL    0    2   cvt  -3   5  17  -2  -9  -0.2
2         LL    0    2    cv   0   0   0   0   0   0.0
9         RR    0    4     0   0   0   0   0   0   0.0
5         RL    0    4   cvt   0   0   0   0   0  -1.0
11        WI    0    4  wix  -18  -2  19  19   3 -64.0
13        WI    1    4  wix-   0 -30  -2  -2  32   0.0
3         LL    0    4    cv   0   0   0   0   0   0.0
15        WI    2    4     0   0   0   0   0   0   0.0
17        WI    3    4     0   0   0   0   0   0   0.0
1         SD    0    4  sope -42   0  29   0   0 -13.0
19        QU    0    4     0   0   0   0   0   0   0.0
7         SS    0    4     0   0   0   0   0   0   0.0
21        QU    1    4     0   0   0   0   0   0   0.0

CodePudding user response:

An outer merge ought to do the trick. As per @BENY’s comment and answer, we need to address non-uniqueness of the key:

>>> df2 = IndiRelat.rename(columns={0: 'Indicator'})
>>> df2['dedup_indic'] = df2.groupby('Indicator').cumcount()
>>> df = inp.join(inp.groupby('Indicator').cumcount().rename('dedup_indic'))\
...         .merge(df2, how='outer')
>>> df
    Key Descr    C1    C2    C3    C4    C5    C6 Indicator  dedup_indic
0   2.0   cvt  -3.0   5.0  17.0  -2.0  -9.0  -0.2        RL            0
1   2.0    cv   0.0   0.0   0.0   0.0   0.0   0.0        LL            0
2   2.0  sope   0.0   0.0   0.0   0.0   0.0   0.0        SD            0
3   2.0  wix  -13.0 -13.0   2.0   1.0 -62.0  -0.5        WI            0
4   2.0  wix-   0.0  16.0   6.0  13.0   0.0   0.3        WI            1
5   4.0  sope -42.0   0.0  29.0   0.0   0.0 -13.0        SD            1
6   4.0    cv   0.0   0.0   0.0   0.0   0.0   0.0        LL            1
7   4.0   cvt   0.0   0.0   0.0   0.0   0.0  -1.0        RL            1
8   4.0  wix  -18.0  -2.0  19.0  19.0   3.0 -64.0        WI            2
9   4.0  wix-   0.0 -30.0  -2.0  -2.0  32.0   0.0        WI            3
10  NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN        SS            0
11  NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN        RR            0
12  NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN        QU            0
13  NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN        QU            1
>>> df.fillna(0).drop(columns=['dedup_indic'])
    Key Descr    C1    C2    C3    C4    C5    C6 Indicator
0   2.0   cvt  -3.0   5.0  17.0  -2.0  -9.0  -0.2        RL
1   2.0    cv   0.0   0.0   0.0   0.0   0.0   0.0        LL
2   2.0  sope   0.0   0.0   0.0   0.0   0.0   0.0        SD
3   2.0  wix  -13.0 -13.0   2.0   1.0 -62.0  -0.5        WI
4   2.0  wix-   0.0  16.0   6.0  13.0   0.0   0.3        WI
5   4.0  sope -42.0   0.0  29.0   0.0   0.0 -13.0        SD
6   4.0    cv   0.0   0.0   0.0   0.0   0.0   0.0        LL
7   4.0   cvt   0.0   0.0   0.0   0.0   0.0  -1.0        RL
8   4.0  wix  -18.0  -2.0  19.0  19.0   3.0 -64.0        WI
9   4.0  wix-   0.0 -30.0  -2.0  -2.0  32.0   0.0        WI
10  0.0     0   0.0   0.0   0.0   0.0   0.0   0.0        SS
11  0.0     0   0.0   0.0   0.0   0.0   0.0   0.0        RR
12  0.0     0   0.0   0.0   0.0   0.0   0.0   0.0        QU
13  0.0     0   0.0   0.0   0.0   0.0   0.0   0.0        QU
  • Related