Home > other >  Fill in empty space by merging columns using Pandas Python
Fill in empty space by merging columns using Pandas Python

Time:10-06

I am a Python noob. I have an unstructured text file that I'm trying to capture to a dataframe and export to excel. I need to merge 38 to 36, 45 to 43, and 79 to 78 filling in the empty space with the data on the merging column.

Dummy Dataset

0 5 36 38 43 45 78 79
1 A 01JUN2022 1.2 B 1.2
2 C 01JUN2022 1.4 D 1.4
3 E 01JUN2022 1.5 F 1.6
4 G 01JUN2022 1.7 H 1.7
5 I 01JUN2022 1.4 J 1.8
6 K 01JUN2022 1.7 L 1.3
1 A 01JUN2022 1.2 B 1.2
2 C 01JUN2022 1.4 D 1.4
3 E 01JUN2022 1.5 F 1.6
4 G 01JUN2022 1.7 H 1.7
5 I 01JUN2022 1.4 J 1.8
6 K 01JUN2022 1.7 L 1.3

Required output

0 5 36 43 79
1 A 01JUN2022 1.2 B 1.2
2 C 01JUN2022 1.4 D 1.4
3 E 01JUN2022 1.5 F 1.6
4 G 01JUN2022 1.7 H 1.7
5 I 01JUN2022 1.4 J 1.8
6 K 01JUN2022 1.7 L 1.3
1 A 01JUN2022 1.2 B 1.2
2 C 01JUN2022 1.4 D 1.4
3 E 01JUN2022 1.5 F 1.6
4 G 01JUN2022 1.7 H 1.7
5 I 01JUN2022 1.4 J 1.8
6 K 01JUN2022 1.7 L 1.3

CodePudding user response:

Would start by converting '' to NaN as follows

df = df.replace(r'^\s*$', np.nan, regex=True)

Then one can use pandas.Series.combine_first

df['36'] = df['36'].combine_first(df['38'])
df['43'] = df['43'].combine_first(df['45'])
df['79'] = df['79'].combine_first(df['78'])

[Out]:
    id  0          5   36   38 43   45   78   79
0    1  A  01JUN2022  1.2  1.2  B    B  NaN  1.2
1    2  C  01JUN2022  1.4  1.4  D    D  NaN  1.4
2    3  E  01JUN2022  1.5  NaN  F  NaN  1.6  1.6
3    4  G  01JUN2022  1.7  NaN  H  NaN  1.7  1.7
4    5  I  01JUN2022  1.4  NaN  J  NaN  1.8  1.8
5    6  K  01JUN2022  1.7  NaN  L  NaN  1.3  1.3
6    1  A  01JUN2022  1.2  1.2  B    B  NaN  1.2
7    2  C  01JUN2022  1.4  1.4  D  NaN  1.4  1.4
8    3  E  01JUN2022  1.5  1.5  F  NaN  1.6  1.6
9    4  G  01JUN2022  1.7  NaN  H  NaN  1.7  1.7
10   5  I  01JUN2022  1.4  NaN  J    J  NaN  1.8
11   6  K  01JUN2022  1.7  NaN  L  NaN  NaN  1.3

Finally, one can drop the columns that one doesn't want or select the one's to display as follows

df = df[['0', '5', '36', '43', '79']]

[Out]:

    0          5   36 43   79
0   A  01JUN2022  1.2  B  1.2
1   C  01JUN2022  1.4  D  1.4
2   E  01JUN2022  1.5  F  1.6
3   G  01JUN2022  1.7  H  1.7
4   I  01JUN2022  1.4  J  1.8
5   K  01JUN2022  1.7  L  1.3
6   A  01JUN2022  1.2  B  1.2
7   C  01JUN2022  1.4  D  1.4
8   E  01JUN2022  1.5  F  1.6
9   G  01JUN2022  1.7  H  1.7
10  I  01JUN2022  1.4  J  1.8
11  K  01JUN2022  1.7  L  1.3

and this gives the desired output.


Notes:

  • Related