Remove empty cells and move content to the left Python-CodePudding

I have an Excel file which contains the following structure with no header or index - I've added just the column header from Excel without existing in my table:

A	B	C	D	E	...	J	K
ABC	XLS	1231341231	123123asdad1923	NaN	NaN	...	123123asdad1923
DEF	XLS	1231231231	1231823asda9123	NaN	askda213	...	123123asdad1923
XYZ	XLS	1231231233	2138820394832sd	NaN	NaN	...	asdasdq2ew12332

I would need that all the cells that contains NaN to be removed and while NaN, move the cells to the left.

Output should look like:

A	B	C	D	E	...	J
ABC	XLS	1231341231	123123asdad1923	123123asdad1923
DEF	XLS	1231231231	1231823asda9123	askda213	...	123123asdad1923
XYZ	XLS	1231231233	2138820394832sd	asdasdq2ew12332

I have found this similar question dropped on Stackoverflow, but the answer is not doing any change:

import pandas as pd

df = pd.read_excel('test.xlsx')
df = df.apply(lambda x: pd.Series(x.dropna().values))
df = df.fillna('')

df.to_excel("test_modified.xlsx", index=False, header=False)

Any ideas how can achieve this?

Thank you.

CodePudding user response：

One solution I thought of would be to for each row, sort the row with a custom sorting function that returned 0 for NaN and 1 for non-NaN, effectively moving (but without changing the order of) all non-NaN values to be consecutive, and leaving the NaNs at the end:

new_df = df.apply(lambda row: pd.Series(sorted(row.tolist(), key=lambda x: np.isnan(x) if isinstance(x, float) else 0), index=row.index), axis=1)

Output:

>>> new_df
     A    B           C                D                E                J   K
0  ABC  XLS  1231341231  123123asdad1923  123123asdad1923              NaN NaN
1  DEF  XLS  1231231231  1231823asda9123         askda213  123123asdad1923 NaN
2  XYZ  XLS  1231231233  2138820394832sd  asdasdq2ew12332              NaN NaN

CodePudding user response：

Ok, then I propose to transpose the dataset, process it, and transpose it back.

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        "A": ["ABC", "DEF", "XYZ"],
        "B": ["XLS", "XLS", "XLS"],
        "C": ["1231341231", "1231231231", "1231231233"],
        "D": ["123123asdad1923", "1231823asda9123", "2138820394832sd"],
        "E": [np.nan, np.nan, np.nan],
        "F": [np.nan, "askda213", np.nan],
        "I": ["blabla", "blabla", np.nan],
        "K": ["123123asdad1923", "123123asdad1923", "asdasdq2ew12332"],
    }
)

print(df)
# transpose
df_transposed = df.T
# move NaN at end of column
df_transposed = df_transposed.apply(lambda x: pd.Series(x.dropna().values))
# replace Nan by empty strings
df_transposed.replace(np.nan, "", inplace=True)
# transpose back
df = df_transposed.T
print()
print(df)

will output:

     A    B           C                D   E         F       I                K
0  ABC  XLS  1231341231  123123asdad1923 NaN       NaN  blabla  123123asdad1923
1  DEF  XLS  1231231231  1231823asda9123 NaN  askda213  blabla  123123asdad1923
2  XYZ  XLS  1231231233  2138820394832sd NaN       NaN     NaN  asdasdq2ew12332

     0    1           2                3                4                5                6
0  ABC  XLS  1231341231  123123asdad1923           blabla  123123asdad1923                 
1  DEF  XLS  1231231231  1231823asda9123         askda213           blabla  123123asdad1923
2  XYZ  XLS  1231231233  2138820394832sd  asdasdq2ew12332