I have an Excel file which contains the following structure with no header or index - I've added just the column header from Excel without existing in my table:
A | B | C | D | E | ... | J | K |
---|---|---|---|---|---|---|---|
ABC | XLS | 1231341231 | 123123asdad1923 | NaN | NaN | ... | 123123asdad1923 |
DEF | XLS | 1231231231 | 1231823asda9123 | NaN | askda213 | ... | 123123asdad1923 |
XYZ | XLS | 1231231233 | 2138820394832sd | NaN | NaN | ... | asdasdq2ew12332 |
I would need that all the cells that contains NaN
to be removed and while NaN, move the cells to the left.
Output should look like:
A | B | C | D | E | ... | J | K |
---|---|---|---|---|---|---|---|
ABC | XLS | 1231341231 | 123123asdad1923 | 123123asdad1923 | |||
DEF | XLS | 1231231231 | 1231823asda9123 | askda213 | ... | 123123asdad1923 | |
XYZ | XLS | 1231231233 | 2138820394832sd | asdasdq2ew12332 |
I have found this similar question dropped on Stackoverflow, but the answer is not doing any change:
import pandas as pd
df = pd.read_excel('test.xlsx')
df = df.apply(lambda x: pd.Series(x.dropna().values))
df = df.fillna('')
df.to_excel("test_modified.xlsx", index=False, header=False)
Any ideas how can achieve this?
Thank you.
CodePudding user response:
One solution I thought of would be to for each row, sort the row with a custom sorting function that returned 0 for NaN and 1 for non-NaN, effectively moving (but without changing the order of) all non-NaN values to be consecutive, and leaving the NaNs at the end:
new_df = df.apply(lambda row: pd.Series(sorted(row.tolist(), key=lambda x: np.isnan(x) if isinstance(x, float) else 0), index=row.index), axis=1)
Output:
>>> new_df
A B C D E J K
0 ABC XLS 1231341231 123123asdad1923 123123asdad1923 NaN NaN
1 DEF XLS 1231231231 1231823asda9123 askda213 123123asdad1923 NaN
2 XYZ XLS 1231231233 2138820394832sd asdasdq2ew12332 NaN NaN
CodePudding user response:
Ok, then I propose to transpose the dataset, process it, and transpose it back.
import pandas as pd
import numpy as np
df = pd.DataFrame(
{
"A": ["ABC", "DEF", "XYZ"],
"B": ["XLS", "XLS", "XLS"],
"C": ["1231341231", "1231231231", "1231231233"],
"D": ["123123asdad1923", "1231823asda9123", "2138820394832sd"],
"E": [np.nan, np.nan, np.nan],
"F": [np.nan, "askda213", np.nan],
"I": ["blabla", "blabla", np.nan],
"K": ["123123asdad1923", "123123asdad1923", "asdasdq2ew12332"],
}
)
print(df)
# transpose
df_transposed = df.T
# move NaN at end of column
df_transposed = df_transposed.apply(lambda x: pd.Series(x.dropna().values))
# replace Nan by empty strings
df_transposed.replace(np.nan, "", inplace=True)
# transpose back
df = df_transposed.T
print()
print(df)
will output:
A B C D E F I K
0 ABC XLS 1231341231 123123asdad1923 NaN NaN blabla 123123asdad1923
1 DEF XLS 1231231231 1231823asda9123 NaN askda213 blabla 123123asdad1923
2 XYZ XLS 1231231233 2138820394832sd NaN NaN NaN asdasdq2ew12332
0 1 2 3 4 5 6
0 ABC XLS 1231341231 123123asdad1923 blabla 123123asdad1923
1 DEF XLS 1231231231 1231823asda9123 askda213 blabla 123123asdad1923
2 XYZ XLS 1231231233 2138820394832sd asdasdq2ew12332