Home > database >  Fillna/null Values Base on Below Line Based on Conditions in Pandas
Fillna/null Values Base on Below Line Based on Conditions in Pandas

Time:02-18

I have a huge pandas data frame just like below.

d = {'id1': ['85643', '85644','discription','8564312','8564314','85645','discription','85646','8564318','85647','85648','85649','85655','56731','34566','78931','78931'],'ID': ['G-00001', 'G-00001','G-00002','G-00002','G-00002','G-00001','G-00001','G-00001','G-00001','G-00001','G-00002','G-00002','G-00002','G-00002','G-00003','G-00003','G-00003'],'col1': [np.nan, np.nan,5,np.nan,4,np.nan,60,np.nan,0,6,3,2,4,32,3,1,23],'Goal': [np.nan, 56,78,np.nan,89,73,np.nan ,np.nan ,np.nan, np.nan, np.nan, 34,np.nan, 7, 84,np.nan,5 ],'col2': [793, np.nan,8,np.nan,43,np.nan,610,np.nan,0,16,23,72,48,3,28,5,3],'col3': [500, np.nan,89,np.nan,44,np.nan,60,np.nan,5,6,3,2,4,13,12,14,98],'Date': ['2021-06-13', '2021-06-13','2021-06-14','2021-06-13','2021-06-14','2021-06-15','2021-06-15','2021-06-13','2021-06-16','2021-06-13','2021-06-13','2021-06-13','2021-06-16','2021-05-23','2021-05-13','2021-03-26','2021-05-13']}
df = pd.DataFrame(data=d)
df

    id1         ID      col1        Goal        col2        col3      Date
0   85643       G-00001 NaN         NaN         793.000     500.000   2021-06-13
1   85644       G-00001 NaN         56.0000     NaN         NaN       2021-06-13
2   discription G-00002 5.0000      78.0000     8.0000      89.0000   2021-06-14
3   8564312     G-00002 NaN         NaN         NaN         NaN       2021-06-13
4   8564314     G-00002 4.0000      89.0000     43.0000     44.0000   2021-06-14
5   85645       G-00001 NaN         73.0000     NaN         NaN       2021-06-15
6   discription G-00001 60.0000     NaN         610.000     60.0000   2021-06-15
7   85646       G-00001 NaN         NaN         NaN         NaN       2021-06-13
8   8564318     G-00001 0.0000      NaN         0.0000      5.0000    2021-06-16
9   85647       G-00001 6.0000      NaN         16.0000     6.0000    2021-06-13
10  85648       G-00002 3.0000      NaN         23.0000     3.0000    2021-06-13
11  85649       G-00002 2.0000      34.0000     72.0000     2.0000    2021-06-13
12  85655       G-00002 4.0000      NaN         48.0000     4.0000    2021-06-16
13  56731       G-00002 32.0000     7.0000      3.0000      13.0000   2021-05-23
14  34566       G-00003 3.0000      84.0000     28.0000     12.0000   2021-05-13
15  78931       G-00003 1.0000      NaN         5.0000      14.0000   2021-03-26
16  78931       G-00003 23.0000     5.0000      3.0000      98.0000   2021-05-13

So I want to fill the columns "col1", "col2" and "col3" when meet the below criteria. When "id1" column has word "discription", I want to fill na values in upper line for "col1", "col2" and "col3" using the values in "id1" "discription" line.

For example index 2 has word "discription" and i want to fill na values in index 1 for "col1", "col2" and "col3" using the values in "col1", "col2" and "col3" in "discription" line. All the other null values I don't want to fill in for "col1", "col2" and "col3".

So the final dataframe should be look like below.

    id1         ID      col1        Goal        col2        col3      Date
0   85643       G-00001 NaN         NaN         793.000     500.000   2021-06-13
1   85644       G-00001 5.0000      56.0000     8.0000      89.0000   2021-06-13
2   discription G-00002 5.0000      78.0000     8.0000      89.0000   2021-06-14
3   8564312     G-00002 NaN         NaN         NaN         NaN       2021-06-13
4   8564314     G-00002 4.0000      89.0000     43.0000     44.0000   2021-06-14
5   85645       G-00001 60.0000     73.0000     610.000     60.0000   2021-06-15
6   discription G-00001 60.0000     NaN         610.000     60.0000   2021-06-15
7   85646       G-00001 NaN         NaN         NaN         NaN       2021-06-13
8   8564318     G-00001 0.0000      NaN         0.0000      5.0000    2021-06-16
9   85647       G-00001 6.0000      NaN         16.0000     6.0000    2021-06-13
10  85648       G-00002 3.0000      NaN         23.0000     3.0000    2021-06-13
11  85649       G-00002 2.0000      34.0000     72.0000     2.0000    2021-06-13
12  85655       G-00002 4.0000      NaN         48.0000     4.0000    2021-06-16
13  56731       G-00002 32.0000     7.0000      3.0000      13.0000   2021-05-23
14  34566       G-00003 3.0000      84.0000     28.0000     12.0000   2021-05-13
15  78931       G-00003 1.0000      NaN         5.0000      14.0000   2021-03-26
16  78931       G-00003 23.0000     5.0000      3.0000      98.0000   2021-05-13

Is it possible to do this in python? Any suggestion would be appreciated. Thanks in advance!

CodePudding user response:

Mask the rows which do not have discription in id1, then shift the masked dataframe one unit upwards and fill the nan values in required columns

c = ['col1', 'col2', 'col3']

mask = df['id1'].eq('discription')
df[c] = df[c].fillna(df[c].where(mask).shift(-1))

            id1       ID  col1  Goal   col2   col3        Date
0         85643  G-00001   NaN   NaN  793.0  500.0  2021-06-13
1         85644  G-00001   5.0  56.0    8.0   89.0  2021-06-13
2   discription  G-00002   5.0  78.0    8.0   89.0  2021-06-14
3       8564312  G-00002   NaN   NaN    NaN    NaN  2021-06-13
4       8564314  G-00002   4.0  89.0   43.0   44.0  2021-06-14
5         85645  G-00001  60.0  73.0  610.0   60.0  2021-06-15
6   discription  G-00001  60.0   NaN  610.0   60.0  2021-06-15
7         85646  G-00001   NaN   NaN    NaN    NaN  2021-06-13
8       8564318  G-00001   0.0   NaN    0.0    5.0  2021-06-16
9         85647  G-00001   6.0   NaN   16.0    6.0  2021-06-13
10        85648  G-00002   3.0   NaN   23.0    3.0  2021-06-13
11        85649  G-00002   2.0  34.0   72.0    2.0  2021-06-13
12        85655  G-00002   4.0   NaN   48.0    4.0  2021-06-16
13        56731  G-00002  32.0   7.0    3.0   13.0  2021-05-23
14        34566  G-00003   3.0  84.0   28.0   12.0  2021-05-13
15        78931  G-00003   1.0   NaN    5.0   14.0  2021-03-26
16        78931  G-00003  23.0   5.0    3.0   98.0  2021-05-13
  • Related