I would like to combine all row values into a list, whenever a non-null string is found in another column.
For example if I have this pandas dataframe:
df = pd.DataFrame({'X': [1,2,3,4,5,6,7,8],
'Y': [10,20,30,40,50,60,70,80],
'Z': [np.nan, np.nan, "A", np.nan, "A", "B", np.nan, np.nan]})
X Y Z
0 1 10 NaN
1 2 20 NaN
2 3 30 A
3 4 40 NaN
4 5 50 A
5 6 60 B
6 7 70 NaN
7 8 80 NaN
I would like to combine all previous row values from columns X and Y into lists, whenever column Z has a non-null string, like this:
df = pd.DataFrame({'X': [[1,2,3],[4,5],[6]],
'Y': [[10,20,30],[40,50],[60]],
'Z': ["A","A", "B"]})
X Y Z
0 [1, 2, 3] [10, 20, 30] A
1 [4, 5] [40, 50] A
2 [6] [60] B
So what I managed to do is "solve" it by using for loops. I would hope there is a better way to do it with pandas but I can't seem to find it.
My for loop solution:
Get "Z" ids without NaNs:
z_idx_withoutNaN = df[~df["Z"].isnull() == True].index.tolist()
[2, 4, 5]
Loop over ids and create lists with "X" and "Y" values:
x_list = []
y_list = []
for i, index in enumerate(z_idx_withoutNaN):
if i == 0:
x_list = [df.iloc[:index 1]["X"].values.tolist()]
y_list = [df.iloc[:index 1]["Y"].values.tolist()]
else:
x_list.append(df.iloc[previous_index:index 1]["X"].values.tolist())
y_list.append(df.iloc[previous_index:index 1]["Y"].values.tolist())
previous_index = index 1
Finally, create df:
pd.DataFrame({"X": x_list,
"Y": y_list,
"Z": df[~df["Z"].isnull()]["Z"].values.tolist()})
X Y Z
0 [1, 2, 3] [10, 20, 30] A
1 [4, 5] [40, 50] A
2 [6] [60] B
CodePudding user response:
Let us do
out = (df.groupby(df['Z'].iloc[::-1].notna().cumsum()).
agg({'X':list,'Y':list,'Z':'first'}).
dropna().
sort_index(ascending=False))
Out[23]:
X Y Z
Z
3 [1, 2, 3] [10, 20, 30] A
2 [4, 5] [40, 50] A
1 [6] [60] B
CodePudding user response:
Here is one option:
(df.groupby(
df.Z.shift().notnull().cumsum()
).agg(list)
.assign(Z = lambda x: x.Z.str[-1])[
lambda x: x.Z.notnull()
])
X Y Z
Z
0 [1, 2, 3] [10, 20, 30] A
1 [4, 5] [40, 50] A
2 [6] [60] B