Python: count occurrences of each value in a column

I'm trying to create a new column with the count of each attribute (Number), but it gives me the total row count instead, as shown below:

import pandas as pd

POM = pd.read_excel('POM.xlsx', sheet_name='Sheet1')
POM['Merged'] = POM['ACD ID'].astype(str) + POM['Date'].astype(str)
POM['Count'] = POM['Merged'].count()
print(POM.head(2))
POM.to_excel('Test.xlsx', index=False)

Result

        Date  ACD ID           Merged  Count
0 2022-03-08   14004  140042022-03-08    986
1 2022-03-09   14004  140042022-03-09    986
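Why every row shows 986: Series.count() returns a single scalar (the number of non-null values in the whole column), and assigning a scalar to a column repeats it on every row. A minimal sketch, using a small made-up frame as a stand-in for POM.xlsx:

```python
import pandas as pd

# Hypothetical stand-in for POM.xlsx (not the asker's real data)
pom = pd.DataFrame({"Date": ["2022-03-08", "2022-03-08", "2022-03-09"],
                    "ACD ID": [14004, 14004, 14004]})
pom["Merged"] = pom["ACD ID"].astype(str) + pom["Date"].astype(str)

# Series.count() returns one number: how many non-null values the column holds
total = pom["Merged"].count()
print(total)  # 3

# Assigning that scalar to a column broadcasts it to every row
pom["Count"] = total
print(pom["Count"].tolist())  # [3, 3, 3]
```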

How can I count how many times each code repeats in the Merged column? This is the output I want:

Date                 ACD ID  Merged           Count
2022-03-08 00:00:00  14004   140042022-03-08  10
2022-03-09 00:00:00  14004   140042022-03-09  49
2022-03-10 00:00:00  14004   140042022-03-10  62

Answer:

Try groupby() and then transform("count"):

import pandas

df = pandas.DataFrame({"date": ["2022", "2023", "2023", "2024"],
                       "id": ["a", "a", "a", "c"]})
df["merged"] = df["date"] + "_" + df["id"]
print(df)
>    date id  merged
  0  2022  a  2022_a
  1  2023  a  2023_a
  2  2023  a  2023_a
  3  2024  c  2024_c

df["counts"] = df.groupby(["merged"])["merged"].transform("count")
print(df)
>    date id  merged  counts
  0  2022  a  2022_a       1
  1  2023  a  2023_a       2
  2  2023  a  2023_a       2
  3  2024  c  2024_c       1

Check out this thread.
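Applied to the column names from the question, the same groupby/transform idea might look like the sketch below. The rows are hypothetical stand-ins for POM.xlsx; only the column names 'Date' and 'ACD ID' come from the question.

```python
import pandas as pd

# Hypothetical rows standing in for POM.xlsx; column names taken from the question
POM = pd.DataFrame({"Date": pd.to_datetime(["2022-03-08", "2022-03-08", "2022-03-09"]),
                    "ACD ID": [14004, 14004, 14004]})

# Build the key as in the question, joining the two parts with +
POM["Merged"] = POM["ACD ID"].astype(str) + POM["Date"].astype(str)

# transform("count") returns one value per original row: the size of that row's group
POM["Count"] = POM.groupby("Merged")["Merged"].transform("count")
print(POM)
```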

Answer:

You could do something like this:

df['Count'] = df['Merged'].copy().replace(df['Merged'].value_counts().to_dict())

This counts the values in Merged with value_counts(), converts the result to a {value: count} dict, and then uses that dict to replace each value in a copy of Merged with its count. (replace() already returns a new Series by default, so the copy() is optional.)
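The two steps can be seen separately in the sketch below. The last line swaps in Series.map, which is an alternative to replace() and not what this answer uses: map() can take the value_counts() Series directly as a lookup table, so neither the dict conversion nor the copy is needed.

```python
import pandas as pd

df = pd.DataFrame({"merged": ["2022_a", "2023_a", "2023_a", "2024_c"]})

# Step 1: value_counts() gives a Series indexed by each distinct value
counts = df["merged"].value_counts()
print(counts.to_dict())  # maps each distinct value to how often it appears

# Step 2: look up each row's value in that table (alternative to replace)
df["Count"] = df["merged"].map(counts)
print(df["Count"].tolist())  # [1, 2, 2, 1]
```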

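Compared with an aggregating groupby such as groupby().count(), which returns one row per distinct value, the value_counts() approach keeps the original index and shape of the DataFrame while filling in each row's count. (groupby().transform("count") from the other answer also preserves the shape.)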
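A quick way to check the shape difference between an aggregating groupby and transform; the non-default index here is an arbitrary choice, used only to make index preservation visible:

```python
import pandas as pd

# Arbitrary non-default index so we can see whether it survives
df = pd.DataFrame({"merged": ["2022_a", "2023_a", "2023_a", "2024_c"]},
                  index=[10, 20, 30, 40])

# Aggregating groupby: one row per distinct value, indexed by "merged"
agg = df.groupby("merged")["merged"].count()

# transform: same length and same index as df
via_transform = df.groupby("merged")["merged"].transform("count")

print(len(agg), len(via_transform))  # 3 4
print(via_transform.index.tolist())  # [10, 20, 30, 40]
```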

Example:

import pandas as pd

df = pd.DataFrame({"date": ["2022", "2023", "2023", "2024"],
                   "id": ["a", "a", "a", "c"]})
df["merged"] = df["date"] + "_" + df["id"]
df['Count'] = df['merged'].copy().replace(df['merged'].value_counts().to_dict())
print(df)

   date id  merged  Count
0  2022  a  2022_a      1
1  2023  a  2023_a      2
2  2023  a  2023_a      2
3  2024  c  2024_c      1
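If the merged column exists only to act as a key, one alternative (not part of either answer above) is to group on both columns at once and skip the string concatenation. A sketch reusing the example frame:

```python
import pandas as pd

df = pd.DataFrame({"date": ["2022", "2023", "2023", "2024"],
                   "id": ["a", "a", "a", "c"]})

# Group on both key columns directly instead of building a merged string key
df["Count"] = df.groupby(["date", "id"])["id"].transform("count")
print(df["Count"].tolist())  # [1, 2, 2, 1]
```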