I have a huge pandas data frame just like below.
d = {'id1': ['85643', '85644','discription','8564312','8564314','85645','discription','85646','8564318','85647','85648','85649','85655','56731','34566','78931','78931'],'ID': ['G-00001', 'G-00001','G-00002','G-00002','G-00002','G-00001','G-00001','G-00001','G-00001','G-00001','G-00002','G-00002','G-00002','G-00002','G-00003','G-00003','G-00003'],'col1': [np.nan, np.nan,5,np.nan,4,np.nan,60,np.nan,0,6,3,2,4,32,3,1,23],'Goal': [np.nan, 56,78,np.nan,89,73,np.nan ,np.nan ,np.nan, np.nan, np.nan, 34,np.nan, 7, 84,np.nan,5 ],'col2': [793, np.nan,8,np.nan,43,np.nan,610,np.nan,0,16,23,72,48,3,28,5,3],'col3': [500, np.nan,89,np.nan,44,np.nan,60,np.nan,5,6,3,2,4,13,12,14,98],'Date': ['2021-06-13', '2021-06-13','2021-06-14','2021-06-13','2021-06-14','2021-06-15','2021-06-15','2021-06-13','2021-06-16','2021-06-13','2021-06-13','2021-06-13','2021-06-16','2021-05-23','2021-05-13','2021-03-26','2021-05-13']}
df = pd.DataFrame(data=d)
df
id1 ID col1 Goal col2 col3 Date
0 85643 G-00001 NaN NaN 793.000 500.000 2021-06-13
1 85644 G-00001 NaN 56.0000 NaN NaN 2021-06-13
2 discription G-00002 5.0000 78.0000 8.0000 89.0000 2021-06-14
3 8564312 G-00002 NaN NaN NaN NaN 2021-06-13
4 8564314 G-00002 4.0000 89.0000 43.0000 44.0000 2021-06-14
5 85645 G-00001 NaN 73.0000 NaN NaN 2021-06-15
6 discription G-00001 60.0000 NaN 610.000 60.0000 2021-06-15
7 85646 G-00001 NaN NaN NaN NaN 2021-06-13
8 8564318 G-00001 0.0000 NaN 0.0000 5.0000 2021-06-16
9 85647 G-00001 6.0000 NaN 16.0000 6.0000 2021-06-13
10 85648 G-00002 3.0000 NaN 23.0000 3.0000 2021-06-13
11 85649 G-00002 2.0000 34.0000 72.0000 2.0000 2021-06-13
12 85655 G-00002 4.0000 NaN 48.0000 4.0000 2021-06-16
13 56731 G-00002 32.0000 7.0000 3.0000 13.0000 2021-05-23
14 34566 G-00003 3.0000 84.0000 28.0000 12.0000 2021-05-13
15 78931 G-00003 1.0000 NaN 5.0000 14.0000 2021-03-26
16 78931 G-00003 23.0000 5.0000 3.0000 98.0000 2021-05-13
So I want to fill the columns "col1", "col2" and "col3" when meet the below criteria. When "id1" column has word "discription", I want to fill na values in upper line for "col1", "col2" and "col3" using the values in "id1" "discription" line.
For example index 2 has word "discription" and i want to fill na values in index 1 for "col1", "col2" and "col3" using the values in "col1", "col2" and "col3" in "discription" line. All the other null values I don't want to fill in for "col1", "col2" and "col3".
So the final dataframe should be look like below.
id1 ID col1 Goal col2 col3 Date
0 85643 G-00001 NaN NaN 793.000 500.000 2021-06-13
1 85644 G-00001 5.0000 56.0000 8.0000 89.0000 2021-06-13
2 discription G-00002 5.0000 78.0000 8.0000 89.0000 2021-06-14
3 8564312 G-00002 NaN NaN NaN NaN 2021-06-13
4 8564314 G-00002 4.0000 89.0000 43.0000 44.0000 2021-06-14
5 85645 G-00001 60.0000 73.0000 610.000 60.0000 2021-06-15
6 discription G-00001 60.0000 NaN 610.000 60.0000 2021-06-15
7 85646 G-00001 NaN NaN NaN NaN 2021-06-13
8 8564318 G-00001 0.0000 NaN 0.0000 5.0000 2021-06-16
9 85647 G-00001 6.0000 NaN 16.0000 6.0000 2021-06-13
10 85648 G-00002 3.0000 NaN 23.0000 3.0000 2021-06-13
11 85649 G-00002 2.0000 34.0000 72.0000 2.0000 2021-06-13
12 85655 G-00002 4.0000 NaN 48.0000 4.0000 2021-06-16
13 56731 G-00002 32.0000 7.0000 3.0000 13.0000 2021-05-23
14 34566 G-00003 3.0000 84.0000 28.0000 12.0000 2021-05-13
15 78931 G-00003 1.0000 NaN 5.0000 14.0000 2021-03-26
16 78931 G-00003 23.0000 5.0000 3.0000 98.0000 2021-05-13
Is it possible to do this in python? Any suggestion would be appreciated. Thanks in advance!
CodePudding user response:
Mask the rows which do not have discription
in id1
, then shift
the masked dataframe one unit upwards and fill the nan values in required columns
c = ['col1', 'col2', 'col3']
mask = df['id1'].eq('discription')
df[c] = df[c].fillna(df[c].where(mask).shift(-1))
id1 ID col1 Goal col2 col3 Date
0 85643 G-00001 NaN NaN 793.0 500.0 2021-06-13
1 85644 G-00001 5.0 56.0 8.0 89.0 2021-06-13
2 discription G-00002 5.0 78.0 8.0 89.0 2021-06-14
3 8564312 G-00002 NaN NaN NaN NaN 2021-06-13
4 8564314 G-00002 4.0 89.0 43.0 44.0 2021-06-14
5 85645 G-00001 60.0 73.0 610.0 60.0 2021-06-15
6 discription G-00001 60.0 NaN 610.0 60.0 2021-06-15
7 85646 G-00001 NaN NaN NaN NaN 2021-06-13
8 8564318 G-00001 0.0 NaN 0.0 5.0 2021-06-16
9 85647 G-00001 6.0 NaN 16.0 6.0 2021-06-13
10 85648 G-00002 3.0 NaN 23.0 3.0 2021-06-13
11 85649 G-00002 2.0 34.0 72.0 2.0 2021-06-13
12 85655 G-00002 4.0 NaN 48.0 4.0 2021-06-16
13 56731 G-00002 32.0 7.0 3.0 13.0 2021-05-23
14 34566 G-00003 3.0 84.0 28.0 12.0 2021-05-13
15 78931 G-00003 1.0 NaN 5.0 14.0 2021-03-26
16 78931 G-00003 23.0 5.0 3.0 98.0 2021-05-13