Find duplicate rows in one column and print duplicated rows to a new dataframe table as a group usin

Time:12-14

My objective is to extract each group of duplicated values in column A into its own new dataframe, and ultimately to write each new dataframe to a CSV file.

my current dataframe:

column A column B
A 2
A 2
A 3
B 2
B 3
B 4
C 2
C 2
D 2
D 2
D 3

desired output:

column A column B
A 2
A 2
A 3
column A column B
B 2
B 3
column A column B
C 2
C 2
column A column B
D 2
D 2
D 3

CodePudding user response:

You can loop over the unique values of column A and select the rows that match each value.

Code:

[df[df['ColA']==i] for i in set(df.ColA.values)]

Output:

[  ColA  ColB
 0    A     2
 1    A     2
 2    A     3,
   ColA  ColB
 6    C     2
 7    C     2,
   ColA  ColB
 3    B     2
 4    B     3
 5    B     4,
    ColA  ColB
 8     D     2
 9     D     2
 10    D     3]
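
A set has no guaranteed order, which is why the groups above come out as A, C, B, D. The following is a minimal sketch, assuming the question's data and the `ColA`/`ColB` column names used in this answer, that keeps the groups in order of first appearance by using `Series.unique()` instead of `set`. The `groups` dictionary is a name introduced here for illustration.

```python
import pandas as pd

# Sample data from the question, with the column names used in this answer.
df = pd.DataFrame({
    "ColA": list("AAABBBCCDDD"),
    "ColB": [2, 2, 3, 2, 3, 4, 2, 2, 2, 2, 3],
})

# unique() returns the distinct values in order of first appearance,
# so the sub-dataframes are built in the same order as the original rows.
groups = {key: df[df["ColA"] == key] for key in df["ColA"].unique()}

for key, sub_df in groups.items():
    print(key)
    print(sub_df)
```

Keeping the key next to each sub-dataframe also makes it easier to name output files later.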

CodePudding user response:

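# Group the rows by the value in 'column A'.
g = df.groupby('column A')
# Reference values: the 'column B' values that occur in group 'A'.
dup_chk = df.loc[df['column A'].eq('A'), 'column B']
# In each group, keep only the rows whose 'column B' value is a reference value.
out = [g.get_group(k)[lambda d: d['column B'].isin(dup_chk)] for k in g.groups]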

out (a list of dataframes):

[  column A  column B
 0        A         2
 1        A         2
 2        A         3,
   column A  column B
 3        B         2
 4        B         3,
   column A  column B
 6        C         2
 7        C         2,
    column A  column B
 8         D         2
 9         D         2
 10        D         3]
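
The approach in this answer keeps, within every group, only the rows whose column B value also occurs in group A, which is why row 5 (B, 4) is missing from the result. Here is a self-contained sketch for checking this, assuming the question's data. The hard-coded reference group `'A'` comes from the answer itself, and the variable names are illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "column A": list("AAABBBCCDDD"),
    "column B": [2, 2, 3, 2, 3, 4, 2, 2, 2, 2, 3],
})

g = df.groupby("column A")
# 'column B' values seen in group 'A' (the reference group used in the answer).
dup_chk = df.loc[df["column A"].eq("A"), "column B"]
# Filter each group down to rows whose 'column B' value is in the reference values.
out = [g.get_group(k)[lambda d: d["column B"].isin(dup_chk)] for k in g.groups]

for sub_df in out:
    print(sub_df)
```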

CodePudding user response:

Use the groupby function to group the rows that share a value in column A, then use a for loop to iterate over the groups:

grouped_df = df.groupby('column A')
# Iterating over a groupby object yields (key, sub-dataframe) pairs.
for name, group in grouped_df:
    print(group)
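
The question also asks for each group to end up in its own CSV file. This is a minimal sketch of that last step, assuming the question's data. The temporary output directory and the `group_<key>.csv` file naming are choices made here for illustration, not something given in the question.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "column A": list("AAABBBCCDDD"),
    "column B": [2, 2, 3, 2, 3, 4, 2, 2, 2, 2, 3],
})

# Hypothetical output location: a fresh temporary directory.
out_dir = Path(tempfile.mkdtemp())

written = []
for name, group in df.groupby("column A"):
    # One file per group, named after the group key.
    path = out_dir / f"group_{name}.csv"
    group.to_csv(path, index=False)
    written.append(path)

print(written)
```

Passing `index=False` leaves the original row index out of the files. Drop that argument if the row numbers should be kept.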