Home > Blockchain >  Pandas combine row values into list for every non-null value of another column
Pandas combine row values into list for every non-null value of another column

Time:12-24

I would like to combine all row values into a list, whenever a non-null string is found in another column.

For example if I have this pandas dataframe:

df = pd.DataFrame({'X': [1,2,3,4,5,6,7,8],
                   'Y': [10,20,30,40,50,60,70,80],
                   'Z': [np.nan, np.nan, "A", np.nan, "A", "B", np.nan, np.nan]})
    X   Y   Z
0   1   10  NaN
1   2   20  NaN
2   3   30  A
3   4   40  NaN
4   5   50  A
5   6   60  B
6   7   70  NaN
7   8   80  NaN

I would like to combine all previous row values from columns X and Y into lists, whenever column Z has a non-null string, like this:

df = pd.DataFrame({'X': [[1,2,3],[4,5],[6]],
                   'Y': [[10,20,30],[40,50],[60]],
                   'Z': ["A","A", "B"]})

           X               Y    Z
0   [1, 2, 3]   [10, 20, 30]    A
1      [4, 5]       [40, 50]    A
2         [6]           [60]    B

So what I managed to do is "solve" it by using for loops. I would hope there is a better way to do it with pandas but I can't seem to find it.

My for loop solution:

Get "Z" ids without NaNs:

z_idx_withoutNaN = df[~df["Z"].isnull() == True].index.tolist()

[2, 4, 5]

Loop over ids and create lists with "X" and "Y" values:

x_list = []
y_list = []

for i, index in enumerate(z_idx_withoutNaN):
    if i == 0:
        x_list =  [df.iloc[:index 1]["X"].values.tolist()]
        y_list =  [df.iloc[:index 1]["Y"].values.tolist()] 
    else:
        x_list.append(df.iloc[previous_index:index 1]["X"].values.tolist())
        y_list.append(df.iloc[previous_index:index 1]["Y"].values.tolist())
    
    previous_index = index   1

Finally, create df:

pd.DataFrame({"X": x_list,
              "Y": y_list,
              "Z": df[~df["Z"].isnull()]["Z"].values.tolist()})


            X              Y    Z
0   [1, 2, 3]   [10, 20, 30]    A
1      [4, 5]       [40, 50]    A
2         [6]           [60]    B

CodePudding user response:

Let us do

out = (df.groupby(df['Z'].iloc[::-1].notna().cumsum()).
       agg({'X':list,'Y':list,'Z':'first'}).
       dropna().
       sort_index(ascending=False))
Out[23]: 
           X             Y  Z
Z                            
3  [1, 2, 3]  [10, 20, 30]  A
2     [4, 5]      [40, 50]  A
1        [6]          [60]  B

CodePudding user response:

Here is one option:

(df.groupby(
  df.Z.shift().notnull().cumsum()
).agg(list)
 .assign(Z = lambda x: x.Z.str[-1])[
    lambda x: x.Z.notnull()
])

           X             Y  Z
Z                            
0  [1, 2, 3]  [10, 20, 30]  A
1     [4, 5]      [40, 50]  A
2        [6]          [60]  B
  • Related