Home > Blockchain >  Get non empty values of dataframe as a single column
Get non empty values of dataframe as a single column

Time:01-10

I have a sparse dataframe and would like to get all non empty values as a single column. See the image that I made up to illustrate the problem. I somehow managed to solve it using the python code below. However, I feel there migh be some better | simpler | efficient way to solve it enter image description here

import pandas as pd

list1 = ["x1","x2","?","?","?","?"]
list2 = ["?","?","y1","y2","?","?"]
list3 = ["?","?","?","?","z1","z2"]

df_sparse = pd.DataFrame({"A":list1,"B":list2,"C":list3})
values_vect = []
for col in df_sparse.columns:
  values = [ i for i in list(df_sparse[col]) if i !="?"]
  values_vect.extend(values)
df_sparse["D"] = pd.DataFrame(values_vect,columns=["D"])
display(df_sparse)

CodePudding user response:

df_sparse["D"] = df_sparse.replace("?", np.nan).ffill(axis="columns").iloc[:, -1]
  • replace "?"s with NaNs
  • forward fill the values along columns so that non-NaN values will slide to the rightmost positions
  • query the rightmost column, that's where the values are

to get

>>> df_sparse

    A   B   C   D
0  x1   ?   ?  x1
1  x2   ?   ?  x2
2   ?  y1   ?  y1
3   ?  y2   ?  y2
4   ?   ?  z1  z1
5   ?   ?  z2  z2

CodePudding user response:

Using masking, stack and groupby.last:

df_sparse['D'] = (df_sparse
 .where(df_sparse.ne('?'))
 .stack()
 .groupby(level=0).last()
 )

print(df_sparse)

Output:


    A   B   C   D
0  x1   ?   ?  x1
1  x2   ?   ?  x2
2   ?  y1   ?  y1
3   ?  y2   ?  y2
4   ?   ?  z1  z1
5   ?   ?  z2  z2
  • Related