I'm trying to remove duplicates values in ID column, count the duplicates in the ID column and create a new column called Count, and concatenate the Axis column
THIS IS MY CURRENT DATAFRAME:
ID Axis
1 1 2 3 4
1 0 1 2 3
1 4 5 2 4
2 7 8 9 10
2 1 2 3 4
3 6 7 8 9
4 1 2 3 4
4 0 1 2 3
Desired output
ID count Axis
1 3 [1 2 3 4 , 0 1 2 3 , 4 5 2 4]
2 2 [ 7 8 9 10 , 1 2 3 4]
3 1 [6 7 8 9 ]
4 2 [1 2 3 4 , 0 1 2 3]
I know I'm supposed to use aggregate function, but I'm not getting it. If someone can guide me, I would really appreciate it
CodePudding user response:
Use:
df2 = df.groupby('ID').agg(lambda x: list(x))
df2['count'] = df2['Axis'].apply(lambda x: len(x))
print(df2)
which gives:
Axis count
ID
1 [[1, 2, 3, 4], [0, 1, 2, 3], [4, 5, 2, 4]] 3
2 [[7, 8, 9, 10], [1, 2, 3, 4]] 2
3 [[6, 7, 8, 9]] 1
4 [[1, 2, 3, 4], [0, 1, 2, 3]] 2
for the DataFrame
ID Axis
0 1 [1, 2, 3, 4]
1 1 [0, 1, 2, 3]
2 1 [4, 5, 2, 4]
3 2 [7, 8, 9, 10]
4 2 [1, 2, 3, 4]
5 3 [6, 7, 8, 9]
6 4 [1, 2, 3, 4]
7 4 [0, 1, 2, 3]
CodePudding user response:
out = df.groupby('ID')['Axis'].agg(['count', ('Axis', lambda x: list(x))])
out
ID count Axis
0 1 3 [1 2 3 4 , 0 1 2 3 , 4 5 2 4 ]
1 2 2 [7 8 9 10 , 1 2 3 4 ]
2 3 1 [6 7 8 9 ]
3 4 2 [1 2 3 4 , 0 1 2 3 ]
CodePudding user response:
Your dataframe can be obtained by:
df = pd.DataFrame(data=np.array([[1,"1 2 3 4"]
,[1, "0 1 2 3"]
,[1, "4 5 2 4"]
,[2, "1 2 3 4"]
,[ 2,"7 8 9 10"]
,[ 2, "1 2 3 4" ]
,[ 3, "6 7 8 9" ]
,[ 4, "1 2 3 4" ]
,[ 4, "0 1 2 3" ]
]),columns=['ID', 'Axis']).set_index('ID')
The following solution will get you the desired result:
df1 = pd.DataFrame()
df1["count"] = df.groupby("ID").count()
df1["Axis"] = df.groupby("ID").agg({"Axis": lambda x: list(x.unique())})
The result is:
ID count Axis
1 3 [1 2 3 4, 0 1 2 3, 4 5 2 4]
2 3 [1 2 3 4, 7 8 9 10]
3 1 [6 7 8 9]
4 2 [1 2 3 4, 0 1 2 3]