Here's the pandas dataframe that I'm using to learn how to do this:
import pandas as pd
test_list = pd.DataFrame()
test_list["Item"] = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K"]
test_list["Number"] = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11"]
test_list["Combined Numbers"]= ""
Based on that dataframe above, I intend to combine up to 3 numbers, separated by commas.
Following that, I intend to repeat this combined value I now have, for each of the test_list["Item"]
and test_list["Number"]
involved.
I've been scratching my head figuring it out so far. So far I've seen examples of groupby()
function for situations like combining information based on a given criteria, like a duplicate value from a column. I'm learning to explore if I don't have anything to refer to, how can I work this out instead?
Here's my intended goal:
Item | Number | Combined Numbers |
---|---|---|
A | 1 | 1, 2, 3 |
B | 2 | 1, 2, 3 |
C | 3 | 1, 2, 3 |
D | 4 | 4, 5, 6 |
E | 5 | 4, 5, 6 |
F | 6 | 4, 5, 6 |
G | 7 | 7, 8, 9 |
H | 8 | 7, 8, 9 |
I | 9 | 7, 8, 9 |
J | 10 | 10, 11 |
K | 11 | 10, 11 |
Thank you
CodePudding user response:
In you case as you don't have this column with the groups, you can generate a range of the length of your dataframe (with np.arange
) and do the floor division by 3 (//3
). Use groupby.transform
to keep the original shape of your data and do the join
operation on the column.
test_list["Combined Numbers"] = (
test_list.groupby(np.arange(len(test_list))//3)
['Number'].transform(', '.join)
)
print(test_list)
# Item Number Combined Numbers
# 0 A 1 1, 2, 3
# 1 B 2 1, 2, 3
# 2 C 3 1, 2, 3
# 3 D 4 4, 5, 6
# 4 E 5 4, 5, 6
# 5 F 6 4, 5, 6
# 6 G 7 7, 8, 9
# 7 H 8 7, 8, 9
# 8 I 9 7, 8, 9
# 9 J 10 10, 11
# 10 K 11 10, 11