I have a Pandas DataFrame that looks like this:
   col1  col2                                   col3
0     1  [ListItem1.1,ListItem1.2,ListItem1.3]  value1
1     1  [ListItem2.1,ListItem2.2,ListItem2.3]  value2
2     1  [ListItem3.1,ListItem3.2,ListItem3.3]  value3
3     1  [ListItem4.1,ListItem4.2,ListItem4.3]  value4
4     1  [ListItem5.1,ListItem5.2,ListItem5.3]  value5
I need to produce a DataFrame that looks like this:
   col1  col2                                                                       col3
0     1  [ListItem2.1-ListItem1.1,ListItem2.2-ListItem1.2,ListItem2.3-ListItem1.3]  value2 - value1
1     1  [ListItem3.1-ListItem2.1,ListItem3.2-ListItem2.2,ListItem3.3-ListItem2.3]  value3 - value2
2     1  [ListItem4.1-ListItem3.1,ListItem4.2-ListItem3.2,ListItem4.3-ListItem3.3]  value4 - value3
3     1  [ListItem5.1-ListItem4.1,ListItem5.2-ListItem4.2,ListItem5.3-ListItem4.3]  value5 - value4
.diff() would normally work great, but how can I tell it that the values in col2 are lists,
so each one should be broken up, subtracted element-wise, then recombined? I don't want to iterate over the rows, since that would be significantly slower.
Really appreciate your advice.
CodePudding user response:
All the lists have the same length, so you can expand them into a DataFrame, take the row-wise difference, and convert the result back into the original column:
df['col2'] = pd.DataFrame(df['col2'].tolist()).diff(-1).to_numpy().tolist()
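Note that diff(-1) computes the current row minus the next one, which is the opposite sign of the layout requested in the question. A minimal sketch of one way to get "next minus current" instead, assuming numeric lists of equal length; the sample values are hypothetical, and dropping the last row and leaving col1 untouched are assumptions read off the question's target table:

```python
import pandas as pd

# Hypothetical sample data: numeric lists of equal length in col2.
df = pd.DataFrame({
    "col1": [1, 1, 1],
    "col2": [[1, 2, 3], [4, 6, 8], [7, 7, 7]],
    "col3": [10, 20, 40],
})

# Expand each list into its own columns, one row per original row.
expanded = pd.DataFrame(df["col2"].tolist(), index=df.index)

# diff() yields row[i] - row[i-1]; shift(-1) moves that result up one row,
# so row i ends up holding row[i+1] - row[i].
list_diff = expanded.diff().shift(-1).iloc[:-1]

# The last row has no successor, so it is dropped (assumption from the question).
result = df.iloc[:-1].copy()
result["col2"] = pd.Series(list_diff.to_numpy().tolist(), index=result.index)
result["col3"] = df["col3"].diff().shift(-1).iloc[:-1]

print(result)
```

The differences come back as floats because diff() introduces NaN before the shift; cast them afterwards if integers are needed.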
CodePudding user response:
You can convert your lists to NumPy arrays to be able to use diff:
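import numpy as np
import pandas as pd

data = {'col1': {0: 1, 1: 1, 2: 1},
        'col2': {0: [1, 2, 3], 1: [4, 5, 6], 2: [7, 8, 9]},
        'col3': {0: 3, 1: 2, 2: 1}}
df = pd.DataFrame(data)
# Turn each list into a NumPy array so that subtraction is element-wise
out = df.assign(col2=df['col2'].apply(np.array)).diff()
print(out)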
# Output:
   col1       col2  col3
0   NaN        NaN   NaN
1   0.0  [3, 3, 3]  -1.0
2   0.0  [3, 3, 3]  -1.0
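With apply(np.array) the column still has object dtype, so the subtraction runs per element. As an alternative sketch (not the method above), the lists can be stacked into one 2-D array and differenced with np.diff. This assumes equal-length numeric lists, and it keeps col1 unchanged and drops the unmatched row, as in the question's target table:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 1, 1],
    "col2": [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    "col3": [3, 2, 1],
})

# Stack the equal-length lists into one (n_rows, list_len) array.
stacked = np.array(df["col2"].tolist())

# np.diff along axis 0 gives row[i+1] - row[i], i.e. n_rows - 1 results.
deltas = np.diff(stacked, axis=0)

out = pd.DataFrame({
    "col1": df["col1"].iloc[:-1].to_numpy(),   # kept as-is, not differenced
    "col2": pd.Series(deltas.tolist()),        # back to one list per row
    "col3": np.diff(df["col3"].to_numpy()),
})

print(out)
```

Because no NaN row is created, the integer dtype of the differences is preserved.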