I have a sparse dataframe and would like to get all non empty values as a single column. See the image that I made up to illustrate the problem. I somehow managed to solve it using the python code below. However, I feel there migh be some better | simpler | efficient way to solve it
import pandas as pd
list1 = ["x1","x2","?","?","?","?"]
list2 = ["?","?","y1","y2","?","?"]
list3 = ["?","?","?","?","z1","z2"]
df_sparse = pd.DataFrame({"A":list1,"B":list2,"C":list3})
values_vect = []
for col in df_sparse.columns:
values = [ i for i in list(df_sparse[col]) if i !="?"]
values_vect.extend(values)
df_sparse["D"] = pd.DataFrame(values_vect,columns=["D"])
display(df_sparse)
CodePudding user response:
df_sparse["D"] = df_sparse.replace("?", np.nan).ffill(axis="columns").iloc[:, -1]
- replace "?"s with NaNs
- forward fill the values along columns so that non-NaN values will slide to the rightmost positions
- query the rightmost column, that's where the values are
to get
>>> df_sparse
A B C D
0 x1 ? ? x1
1 x2 ? ? x2
2 ? y1 ? y1
3 ? y2 ? y2
4 ? ? z1 z1
5 ? ? z2 z2
CodePudding user response:
Using masking, stack
and groupby.last
:
df_sparse['D'] = (df_sparse
.where(df_sparse.ne('?'))
.stack()
.groupby(level=0).last()
)
print(df_sparse)
Output:
A B C D
0 x1 ? ? x1
1 x2 ? ? x2
2 ? y1 ? y1
3 ? y2 ? y2
4 ? ? z1 z1
5 ? ? z2 z2