Fill in empty space by merging columns using Pandas Python-CodePudding

I am a Python noob. I have an unstructured text file that I'm trying to capture to a dataframe and export to excel. I need to merge 38 to 36, 45 to 43, and 79 to 78 filling in the empty space with the data on the merging column.

Dummy Dataset

	0	5	36	38	43	45	78	79
1	A	01JUN2022		1.2		B		1.2
2	C	01JUN2022		1.4		D		1.4
3	E	01JUN2022	1.5		F		1.6
4	G	01JUN2022	1.7		H		1.7
5	I	01JUN2022	1.4		J		1.8
6	K	01JUN2022	1.7		L		1.3
1	A	01JUN2022		1.2		B		1.2
2	C	01JUN2022		1.4	D		1.4
3	E	01JUN2022		1.5	F		1.6
4	G	01JUN2022	1.7		H		1.7
5	I	01JUN2022	1.4			J		1.8
6	K	01JUN2022	1.7		L			1.3

Required output

	0	5	36	43	79
1	A	01JUN2022	1.2	B	1.2
2	C	01JUN2022	1.4	D	1.4
3	E	01JUN2022	1.5	F	1.6
4	G	01JUN2022	1.7	H	1.7
5	I	01JUN2022	1.4	J	1.8
6	K	01JUN2022	1.7	L	1.3
1	A	01JUN2022	1.2	B	1.2
2	C	01JUN2022	1.4	D	1.4
3	E	01JUN2022	1.5	F	1.6
4	G	01JUN2022	1.7	H	1.7
5	I	01JUN2022	1.4	J	1.8
6	K	01JUN2022	1.7	L	1.3

CodePudding user response：

Would start by converting '' to NaN as follows

df = df.replace(r'^\s*$', np.nan, regex=True)

Then one can use pandas.Series.combine_first

df['36'] = df['36'].combine_first(df['38'])
df['43'] = df['43'].combine_first(df['45'])
df['79'] = df['79'].combine_first(df['78'])

[Out]:
    id  0          5   36   38 43   45   78   79
0    1  A  01JUN2022  1.2  1.2  B    B  NaN  1.2
1    2  C  01JUN2022  1.4  1.4  D    D  NaN  1.4
2    3  E  01JUN2022  1.5  NaN  F  NaN  1.6  1.6
3    4  G  01JUN2022  1.7  NaN  H  NaN  1.7  1.7
4    5  I  01JUN2022  1.4  NaN  J  NaN  1.8  1.8
5    6  K  01JUN2022  1.7  NaN  L  NaN  1.3  1.3
6    1  A  01JUN2022  1.2  1.2  B    B  NaN  1.2
7    2  C  01JUN2022  1.4  1.4  D  NaN  1.4  1.4
8    3  E  01JUN2022  1.5  1.5  F  NaN  1.6  1.6
9    4  G  01JUN2022  1.7  NaN  H  NaN  1.7  1.7
10   5  I  01JUN2022  1.4  NaN  J    J  NaN  1.8
11   6  K  01JUN2022  1.7  NaN  L  NaN  NaN  1.3

Finally, one can drop the columns that one doesn't want or select the one's to display as follows

df = df[['0', '5', '36', '43', '79']]

[Out]:

    0          5   36 43   79
0   A  01JUN2022  1.2  B  1.2
1   C  01JUN2022  1.4  D  1.4
2   E  01JUN2022  1.5  F  1.6
3   G  01JUN2022  1.7  H  1.7
4   I  01JUN2022  1.4  J  1.8
5   K  01JUN2022  1.7  L  1.3
6   A  01JUN2022  1.2  B  1.2
7   C  01JUN2022  1.4  D  1.4
8   E  01JUN2022  1.5  F  1.6
9   G  01JUN2022  1.7  H  1.7
10  I  01JUN2022  1.4  J  1.8
11  K  01JUN2022  1.7  L  1.3

and this gives the desired output.

Notes:

There are additional ways to remove NaN and combine the columns. For that, see: How to remove nan value while combining two column in Panda Data frame?
There are also additional ways to delete columns in Python. If one wants to learn more about it, see: Delete a column from a Pandas DataFrame