Using .diff() on Pandas DataFrame to subtract *list* elements

I have a Pandas DataFrame that looks like this

     col1                                    col2    col3
0       1   [ListItem1.1,ListItem1.2,ListItem1.3]  value1
1       1   [ListItem2.1,ListItem2.2,ListItem2.3]  value2
2       1   [ListItem3.1,ListItem3.2,ListItem3.3]  value3
3       1   [ListItem4.1,ListItem4.2,ListItem4.3]  value4
4       1   [ListItem5.1,ListItem5.2,ListItem5.3]  value5

and I need to produce a DataFrame that looks like this

     col1                                                                        col2             col3
0       1   [ListItem2.1-ListItem1.1,ListItem2.2-ListItem1.2,ListItem2.3-ListItem1.3]  value2 - value1
1       1   [ListItem3.1-ListItem2.1,ListItem3.2-ListItem2.2,ListItem3.3-ListItem2.3]  value3 - value2
2       1   [ListItem4.1-ListItem3.1,ListItem4.2-ListItem3.2,ListItem4.3-ListItem3.3]  value4 - value3
3       1   [ListItem5.1-ListItem4.1,ListItem5.2-ListItem4.2,ListItem5.3-ListItem4.3]  value5 - value4

.diff() would normally work great, but how can I tell it that the value for col2 is of type list and therefore it should be broken up, subtracted, then recombined? I don't want to iterate over the rows, since it would be significantly slower.

Really appreciate your advice.

CodePudding user response：

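Since all the lists have the same length, you can expand the column into a DataFrame, take the difference, and convert the result back into a column of lists. Note that diff(-1) subtracts the following row from the current one, so the signs are the opposite of those in the question and the last row becomes NaN; use .diff().shift(-1) instead to get next row minus current row: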

df['col2'] = pd.DataFrame(df['col2'].tolist()).diff(-1).to_numpy().tolist()
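
To make the expand, diff, recombine idea concrete, here is a minimal sketch on made-up numeric data (the values and the variable names expanded, current_minus_next, next_minus_current and result are only for illustration). It assumes every list has the same length and shows both sign conventions:

```python
import pandas as pd

# Made-up sample data; every list in col2 has the same length.
df = pd.DataFrame({
    "col1": [1, 1, 1],
    "col2": [[1, 2, 3], [4, 5, 6], [7, 8, 10]],
    "col3": [3, 2, 1],
})

# Expand each list into its own set of columns.
expanded = pd.DataFrame(df["col2"].tolist(), index=df.index)

# diff(-1): current row minus the following row (last row is NaN).
current_minus_next = expanded.diff(-1)

# diff().shift(-1): following row minus current row, as asked in the question.
next_minus_current = expanded.diff().shift(-1)

# Recombine into lists and drop the last row, which has no successor.
result = df.iloc[:-1].copy()
result["col2"] = next_minus_current.iloc[:-1].to_numpy().tolist()
result["col3"] = df["col3"].diff().shift(-1).iloc[:-1]
print(result)
```

The differences come back as floats because the intermediate frame contains NaN before the last row is dropped.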

CodePudding user response：

You can convert your lists to numpy arrays to be able to use diff:

import numpy as np
import pandas as pd

data = {'col1': {0: 1, 1: 1, 2: 1},
        'col2': {0: [1, 2, 3], 1: [4, 5, 6], 2: [7, 8, 9]},
        'col3': {0: 3, 1: 2, 2: 1}}
df = pd.DataFrame(data)

out = df.assign(col2=df['col2'].apply(np.array)).diff()
print(out)

# Output:
   col1       col2  col3
0   NaN        NaN   NaN
1   0.0  [3, 3, 3]  -1.0
2   0.0  [3, 3, 3]  -1.0
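
If you would rather not rely on subtraction over an object column, a possible alternative is to stack the lists into a regular 2-D NumPy array and call np.diff along the row axis. This is only a sketch under the assumption that all lists have the same length; the names stacked, deltas and out are illustrative. It keeps col1 unchanged and returns one row fewer, matching the layout in the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 1, 1],
    "col2": [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    "col3": [3, 2, 1],
})

# Stack the equal-length lists into an array of shape (n_rows, list_len).
stacked = np.array(df["col2"].tolist())

# np.diff along axis 0 gives row i+1 minus row i, with n_rows - 1 rows.
deltas = np.diff(stacked, axis=0)

out = pd.DataFrame({
    "col1": df["col1"].iloc[:-1].to_numpy(),  # kept as-is, not differenced
    "col2": deltas.tolist(),                  # back to one list per row
    "col3": np.diff(df["col3"].to_numpy()),
})
print(out)
```

Because no NaN row is introduced, the integer dtype of the original values is preserved.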