Using .diff() on Pandas DataFrame to subtract *list* elements

Time:12-07

I have a pandas DataFrame that looks like this:

     col1                                    col2    col3
0       1   [ListItem1.1,ListItem1.2,ListItem1.3]  value1
1       1   [ListItem2.1,ListItem2.2,ListItem2.3]  value2
2       1   [ListItem3.1,ListItem3.2,ListItem3.3]  value3
3       1   [ListItem4.1,ListItem4.2,ListItem4.3]  value4
4       1   [ListItem5.1,ListItem5.2,ListItem5.3]  value5

I need to produce a DataFrame that looks like this:

     col1                                                                        col2             col3
0       1   [ListItem2.1-ListItem1.1,ListItem2.2-ListItem1.2,ListItem2.3-ListItem1.3]  value2 - value1
1       1   [ListItem3.1-ListItem2.1,ListItem3.2-ListItem2.2,ListItem3.3-ListItem2.3]  value3 - value2
2       1   [ListItem4.1-ListItem3.1,ListItem4.2-ListItem3.2,ListItem4.3-ListItem3.3]  value4 - value3
3       1   [ListItem5.1-ListItem4.1,ListItem5.2-ListItem4.2,ListItem5.3-ListItem4.3]  value5 - value4

.diff() would normally work great, but how can I tell it that the values in col2 are lists and should therefore be split up, subtracted element-wise, then recombined? I don't want to iterate over the rows, since that would be significantly slower.

Really appreciate your advice.

CodePudding user response:

The lists all have the same length, so you can convert the column to a DataFrame, take the difference, and convert the result back into the original column:

df['col2'] = pd.DataFrame(df['col2'].tolist()).diff(-1).to_numpy().tolist()
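A minimal sketch of the same idea on made-up sample data (the values below are assumptions, not from the question). Note that diff(-1) computes the current row minus the next row, so the sketch negates the result to get "next minus current", which is the direction shown in the question's desired output:

```python
import pandas as pd

# Hypothetical sample data: every list in col2 has the same length.
df = pd.DataFrame({
    "col1": [1, 1, 1],
    "col2": [[1, 2, 3], [4, 6, 8], [10, 20, 30]],
    "col3": [3, 2, 1],
})

# Expand the lists into one column per list position.
wide = pd.DataFrame(df["col2"].tolist(), index=df.index)

# diff(-1) is "current row minus next row"; negate it to get
# "next row minus current row".
delta = -wide.diff(-1)

# The last row has no successor (all NaN), so drop it, then pack
# each remaining row back into a plain Python list.
result = delta.iloc[:-1].astype(int).to_numpy().tolist()
print(result)
```

The cast to int is only safe here because the NaN row is dropped first; with float data it can be left out.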

CodePudding user response:

You can convert your lists to NumPy arrays so that diff can subtract them element-wise:

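import numpy as np
import pandas as pd

data = {'col1': {0: 1, 1: 1, 2: 1},
        'col2': {0: [1, 2, 3], 1: [4, 5, 6], 2: [7, 8, 9]},
        'col3': {0: 3, 1: 2, 2: 1}}
df = pd.DataFrame(data)

# Turn each list into an array so that subtraction works element-wise
out = df.assign(col2=df['col2'].apply(np.array)).diff()
print(out)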

# Output:
   col1       col2  col3
0   NaN        NaN   NaN
1   0.0  [3, 3, 3]  -1.0
2   0.0  [3, 3, 3]  -1.0
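If the leading NaN row is unwanted, one possible variation is to stack the lists into a 2D array and use np.diff, which avoids both row iteration and object-dtype arithmetic. This is only a sketch, assuming all lists have equal length; the names stacked and list_deltas are illustrative:

```python
import numpy as np
import pandas as pd

data = {"col1": [1, 1, 1],
        "col2": [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
        "col3": [3, 2, 1]}
df = pd.DataFrame(data)

# Stack the equal-length lists into a 2D array (rows x list positions).
stacked = np.array(df["col2"].tolist())

# np.diff along axis 0 yields row[i+1] - row[i] for each adjacent pair.
list_deltas = np.diff(stacked, axis=0).tolist()

# Difference the scalar columns as usual, drop the leading NaN row,
# and renumber the index from 0.
out = df[["col1", "col3"]].diff().iloc[1:].reset_index(drop=True)

# Put the list differences back between col1 and col3.
out.insert(1, "col2", pd.Series(list_deltas, index=out.index))
print(out)
```

The result has one row fewer than the input, indexed from 0, which matches the layout asked for in the question.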