I have the following data frame:
import pandas as pd

df_test = pd.DataFrame({"f": ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
                        "d": ['x', 'x', 'y', 'y', 'x', 'x', 'y', 'y'],
                        "low": [0, 5, 2, 4, 5, 10, 4, 8],
                        "up": [5, 10, 4, 6, 10, 15, 8, 12],
                        "z": [1, 3, 6, 2, 3, 7, 5, 10]})
What I first need to do is collect the columns 'low', 'up' and 'z' into lists for each group of 'f' and 'd'. This is what I did:
dff = df_test.groupby(['f','d'])[['low', 'up', 'z']].agg(list).reset_index()
Now I want to extract the last value from the lists in column 'up' and add it to the lists in column 'low'. But this is unfortunately not working:
dff['last'] = (dff['up'].apply(lambda x: x[-1])).tolist()
dff['new'] = dff['low'].append(dff['last'])
I get an error message "ValueError: cannot reindex from a duplicate axis". The column 'new' should have these values: [0,5,10], [2,4,6], [5,10,15], [4,8,12]
Any help is very much appreciated!
CodePudding user response:
Try:
dff["new"] = dff.apply(lambda x: [*x["low"], x["up"].pop()], axis=1)
print(dff)
Prints:
   f  d      low    up        z          new
0  a  x   [0, 5]   [5]   [1, 3]   [0, 5, 10]
1  a  y   [2, 4]   [4]   [6, 2]    [2, 4, 6]
2  b  x  [5, 10]  [10]   [3, 7]  [5, 10, 15]
3  b  y   [4, 8]   [8]  [5, 10]   [4, 8, 12]
Note that pop() removes the last element from the lists in 'up'. If you want to keep the last element in the 'up' column:
dff["new"] = dff.apply(lambda x: [*x["low"], x["up"][-1]], axis=1)
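For context, the attempt in the question fails because Series.append (removed in pandas 2.0) stacks the rows of two Series rather than extending each list, so the result has eight rows with repeated index labels and cannot be aligned to the four-row frame. Here is a minimal sketch of that, followed by the non-mutating fix. It uses pd.concat as a stand-in for the removed method, and the names last and stacked are only illustrative:

```python
import pandas as pd

df_test = pd.DataFrame({"f": ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
                        "d": ['x', 'x', 'y', 'y', 'x', 'x', 'y', 'y'],
                        "low": [0, 5, 2, 4, 5, 10, 4, 8],
                        "up": [5, 10, 4, 6, 10, 15, 8, 12],
                        "z": [1, 3, 6, 2, 3, 7, 5, 10]})
dff = df_test.groupby(['f', 'd'])[['low', 'up', 'z']].agg(list).reset_index()

# Stand-in for the question's Series.append call:
# the rows are stacked, the lists themselves are not extended.
last = pd.Series([u[-1] for u in dff['up']], index=dff.index)
stacked = pd.concat([dff['low'], last])
print(len(stacked), stacked.index.has_duplicates)

# Non-mutating fix: build a new list per row and leave 'up' untouched.
dff['new'] = dff.apply(lambda x: [*x['low'], x['up'][-1]], axis=1)
print(dff['new'].tolist())
```

Because the lambda only reads x['up'][-1], the 'up' column still holds both elements afterwards.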
CodePudding user response:
Take advantage of the mutability of lists: a plain Python loop modifies them in place and should be more efficient than apply.
To copy the element:
for l, u in zip(dff['low'], dff['up']):
    l.append(u[-1])
Output:
   f  d          low        up        z
0  a  x   [0, 5, 10]   [5, 10]   [1, 3]
1  a  y    [2, 4, 6]    [4, 6]   [6, 2]
2  b  x  [5, 10, 15]  [10, 15]   [3, 7]
3  b  y   [4, 8, 12]   [8, 12]  [5, 10]
To move the element:
for l, u in zip(dff['low'], dff['up']):
    l.append(u.pop(-1))
Output:
   f  d          low    up        z
0  a  x   [0, 5, 10]   [5]   [1, 3]
1  a  y    [2, 4, 6]   [4]   [6, 2]
2  b  x  [5, 10, 15]  [10]   [3, 7]
3  b  y   [4, 8, 12]   [8]  [5, 10]
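One thing to keep in mind with the in-place loops: DataFrame.copy() copies the column arrays but not the Python objects stored in them, so a copy taken beforehand still points at the same lists. A small sketch with a hypothetical two-row frame (the name backup is only for illustration):

```python
import pandas as pd

# Hypothetical frame that already holds lists, as dff does after agg(list).
dff = pd.DataFrame({"low": [[0, 5], [2, 4]], "up": [[5, 10], [4, 6]]})
backup = dff.copy()  # new column arrays, but the list objects inside are shared

# In-place loop: no assignment back to the frame is needed.
for l, u in zip(dff['low'], dff['up']):
    l.append(u[-1])

print(dff['low'].tolist())
print(backup['low'].tolist())  # shows the appended values too
```

For the same reason, running the loop a second time appends the value again, so the new-column variants are safer if the code may be re-executed.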
For a new column, slice the last element as a one-item list and concatenate:
dff['new'] = dff['low'] + dff['up'].str[-1:]
Or a list comprehension (should be slower):
dff['new'] = [l + [u[-1]] for l, u in zip(dff['low'], dff['up'])]
Output:
   f  d      low        up        z          new
0  a  x   [0, 5]   [5, 10]   [1, 3]   [0, 5, 10]
1  a  y   [2, 4]    [4, 6]   [6, 2]    [2, 4, 6]
2  b  x  [5, 10]  [10, 15]   [3, 7]  [5, 10, 15]
3  b  y   [4, 8]   [8, 12]  [5, 10]   [4, 8, 12]
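To see why the slicing variant works: .str[-1:] applies the slice to each list and returns a one-item list (or an empty list when the source list is empty), and adding two object columns concatenates the lists row by row. A minimal sketch on a hypothetical frame that includes an empty row (the name tail is only illustrative):

```python
import pandas as pd

# Hypothetical frame; the third row has empty lists to show the edge case.
dff = pd.DataFrame({"low": [[0, 5], [2, 4], []],
                    "up": [[5, 10], [4, 6], []]})

# Slice each list: [5, 10] -> [10], [] -> [] (no IndexError on empty lists).
tail = dff['up'].str[-1:]

# Element-wise list concatenation creates new lists; 'low' and 'up' are unchanged.
dff['new'] = dff['low'] + tail
print(dff)
```

The list comprehension with u[-1] would raise an IndexError on the empty row, so the slice is the more forgiving choice if some groups can be empty.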
CodePudding user response:
Another possible solution:
dff['new'] = dff['low'] + pd.Series([[x[-1]] for x in dff['up']])
Output:
   f  d      low        up        z          new
0  a  x   [0, 5]   [5, 10]   [1, 3]   [0, 5, 10]
1  a  y   [2, 4]    [4, 6]   [6, 2]    [2, 4, 6]
2  b  x  [5, 10]  [10, 15]   [3, 7]  [5, 10, 15]
3  b  y   [4, 8]   [8, 12]  [5, 10]   [4, 8, 12]
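Two details matter if the data is less regular than in the question: the last element is at index -1 (index 1 only coincides with it for two-item lists), and a freshly built Series gets a default 0..n-1 index, which lines up with dff only because reset_index() was called. A hedged sketch on a hypothetical frame with uneven lists and a non-default index:

```python
import pandas as pd

# Hypothetical frame: lists of different lengths and index labels 10 and 20.
demo = pd.DataFrame({"low": [[1, 2], [3]],
                     "up": [[4, 5, 6], [7]]},
                    index=[10, 20])

# Use -1 for the last element and reuse demo's index so the rows align.
last_as_list = pd.Series([[x[-1]] for x in demo['up']], index=demo.index)
demo['new'] = demo['low'] + last_as_list
print(demo)
```

Without index=demo.index the two Series would be aligned on labels 0 and 1 versus 10 and 20, and the addition would not pair the rows as intended.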