I'm trying to create a new column with the count of each attribute (Number), but it gives me the total row count instead, as shown below:
import pandas as pd

POM = pd.read_excel('POM.xlsx', sheet_name='Sheet1')
# Build a key by concatenating the ID and the date as strings
POM['Merged'] = POM['ACD ID'].astype(str) + POM['Date'].astype(str)
POM['Count'] = POM['Merged'].count()
print(POM.head(2))
POM.to_excel('Test.xlsx', index=False)
Result:
         Date  ACD ID           Merged  Count
0  2022-03-08   14004  140042022-03-08    986
1  2022-03-09   14004  140042022-03-09    986
How can I count how many times each unique code is repeated in the Merged column? Desired output:
Date                 ACD ID  Merged           Count
2022-03-08 00:00:00  14004   140042022-03-08  10
2022-03-09 00:00:00  14004   140042022-03-09  49
2022-03-10 00:00:00  14004   140042022-03-10  62
CodePudding user response:
Try groupby() and then transform("count"):
import pandas

df = pandas.DataFrame({"date": ["2022", "2023", "2023", "2024"],
                       "id": ["a", "a", "a", "c"]})
# Concatenate date and id into a single key
df["merged"] = df["date"] + "_" + df["id"]
print(df)
>    date id  merged
  0  2022  a  2022_a
  1  2023  a  2023_a
  2  2023  a  2023_a
  3  2024  c  2024_c
# transform("count") returns one value per original row, so it can be assigned as a column
df["counts"] = df.groupby(["merged"])["merged"].transform("count")
print(df)
>    date id  merged  counts
  0  2022  a  2022_a       1
  1  2023  a  2023_a       2
  2  2023  a  2023_a       2
  3  2024  c  2024_c       1
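Applied to the column names from the question, the same idea might look like the sketch below. The sample rows are made up for illustration (the real data comes from POM.xlsx), and grouping directly on the two columns instead of on a concatenated Merged key is my own assumption about what is wanted:

```python
import pandas as pd

# Hypothetical sample rows standing in for the spreadsheet in the question
pom = pd.DataFrame({
    "Date": pd.to_datetime(["2022-03-08", "2022-03-08", "2022-03-09"]),
    "ACD ID": [14004, 14004, 14004],
})

# Count rows per (ACD ID, Date) pair; transform returns one value per
# original row, so the result lines up with the existing index
pom["Count"] = pom.groupby(["ACD ID", "Date"])["Date"].transform("count")
print(pom)
```

Grouping on both columns avoids building the string key at all, but grouping on Merged as in the answer above should give the same counts.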
CodePudding user response:
You could do something like this:
df['Count'] = df['Merged'].copy().replace(df['Merged'].value_counts().to_dict())
This creates a copy of Merged, counts how often each value occurs in Merged, converts those counts to a dict of {value: count}, and then uses that dictionary to replace each value in the copied column with its count.
Compared with a plain groupby aggregation (without transform), this retains the original index and shape of the dataframe while still filling in the count for each value.
Example:
import pandas as pd

df = pd.DataFrame({"date": ["2022", "2023", "2023", "2024"],
                   "id": ["a", "a", "a", "c"]})
df["merged"] = df["date"] + "_" + df["id"]
# Replace each key with the number of times it occurs
df['Count'] = df['merged'].copy().replace(df['merged'].value_counts().to_dict())
print(df)
   date id  merged  Count
0  2022  a  2022_a      1
1  2023  a  2023_a      2
2  2023  a  2023_a      2
3  2024  c  2024_c      1
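A possible variation on the same idea, offered as a sketch and not as part of the answer above: Series.map accepts the value_counts() result directly, so the copy and the dict conversion can be skipped. The variable name counts is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"date": ["2022", "2023", "2023", "2024"],
                   "id": ["a", "a", "a", "c"]})
df["merged"] = df["date"] + "_" + df["id"]

# value_counts() returns a Series indexed by the unique keys
counts = df["merged"].value_counts()

# map() looks up each row's key in that Series and returns its count
df["Count"] = df["merged"].map(counts)
print(df)
```

Because map builds a new Series, the merged column itself is left unchanged.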