Consider:
Name Location code ID Details Column5 Code
Kirsh HD12 76 Age:25 127.3 Yes
Kirsh HD12 76 Age:25 12758.56 Yes
Kirsh HD12 76 Age:25 No data found No
John HD12 87 Age:25; 76.45 Yes
John HD12 87 Age:25; No data found No
John HD12 87 Age:25; No
John HD12 87 Age:25; 16578.56 Yes
Ward HD12 87 Age:25; No
Ward HD12 87 Age:25; No
I want this as:
Name Location code ID Details Column5 Code
Kirsh HD12 76 Age:25 12962.31 Yes
John HD12 87 Age:25; 16655.01 No
Ward HD12 87 Age:25; No
CodePudding user response:
import pandas as pd
data = {'Name': ['Kirsh', 'Kirsh', 'Kirsh', 'John', 'John', 'John', 'John', 'Ward', 'Ward'],
'Location code': ['HD12', 'HD12', 'HD12', 'HD12', 'HD12', 'HD12', 'HD12', 'HD12', 'HD12'],
'ID': [76, 76, 76, 87, 87, 87, 87, 87, 87],
'Details': ['Age:25', 'Age:25', 'Age:25', 'Age:25;', 'Age:25;', 'Age:25;', 'Age:25;', 'Age:25;', 'Age:25;'],
'Column5': [127.3, 12758.56, 'No data found', 76.45, 'No data found', 'No', 16578.56, 'No', 'No'],
'Code': ['Yes', 'Yes', 'No', 'Yes', 'No', 'No', 'Yes', 'No', 'No']
}
df = pd.DataFrame(data)
# Sum the numerical values
summation = df.loc[(df['Column5'] != 'No data found') &
(df['Code'] != 'No'), 'Column5'].sum()
# Create a new column with the summation
df['Column5_Sum'] = summation
# Group the data by name and location code
grouped_data = df.groupby(['Name', 'Location code', 'ID', 'Details']).agg(
{'Column5': 'first', 'Code': 'first', 'Column5_Sum': 'first'}).reset_index()
# Drop the temporary column
grouped_data.drop(columns='Column5_Sum', inplace=True)
print(grouped_data)
CodePudding user response:
Here is the solution. Please replace those "No data found" with either 0 of nan:
import pandas as pd
import numpy as np
data = {
"Name": ["Kirsh", "Kirsh", "Kirsh", "John", "John", "John", "John", "Ward", "Ward"],
"Location": ["HD12", "HD12", "HD12","HD12", "HD12", "HD12", "HD12","HD12", "HD12"],
"Code": [76, 76, 76, 87, 87, 87,87, 87,87],
"Id": ["Age:25", "Age:25", "Age:25", "Age:25", "Age:25", "Age:25", "Age:25", "Age:25", "Age:25"],
"Details":[127.3, 12758.56, np.nan, 76.45, np.nan, np.nan, 16578.56, np.nan, np.nan],
"Column5": ["Yes", "Yes", "No", "Yes", "No", "No", "Yes", "No", "No"]
}
df = pd.DataFrame(data)
df['Details'] = df.groupby(['Name'])['Details'].transform('sum')
gk = df.groupby("Name")
gk.first()
Please see https://www.geeksforgeeks.org/python-pandas-dataframe-groupby/ for groupby information
Also, Please see the comment of @John Stud, he already answered the best solution.