I have the table below. I would like to group it by SiteID and concatenate the Name and Count values into a new field, using pandas/Python.
<!DOCTYPE html>
<html>
<style>
table, th, td {
border:1px solid black;
}
</style>
<body>
<table style="width:100%">
<tr>
<th>SiteID</th>
<th>Name</th>
<th>Count</th>
</tr>
<tr>
<td>A</td>
<td>Conserve</td>
<td>3</td>
</tr>
<tr>
<td>A</td>
<td>Listed</td>
<td>5</td>
</tr>
<tr>
<td>B</td>
<td>Listed</td>
<td>5</td>
</tr>
</table>
</body>
</html>
I would like the new table to look like this:
<!DOCTYPE html>
<html>
<style>
table, th, td {
border:1px solid black;
}
</style>
<body>
<table style="width:100%">
<tr>
<th>SiteID</th>
<th>Output</th>
</tr>
<tr>
<td>A</td>
<td>There are Conserve : 3, Listed : 5 </td>
</tr>
<tr>
<td>B</td>
<td>There are Listed : 5</td>
</tr>
</table>
</body>
</html>
I'm not sure what code to use. I have used groupby and tried this:
df = df.groupby("SiteID")["Name"].agg(";".join).reset_index()
but I would like to put the result in a new field containing a concatenated string, as shown above.
CodePudding user response:
You can use a custom groupby.agg:
out = (
    # build "Name: Count" for each row, then join the pieces per SiteID
    (df['Name'] + ': ' + df['Count'].astype(str))
    .groupby(df['SiteID']).agg(', '.join)
    .reset_index(name='Output')
)
output:
SiteID Output
0 A Conserve: 3, Listed: 5
1 B Listed: 5
If you need the leading "There are":
out['Output'] = 'There are ' + out['Output']
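For a version that runs end to end, here is a minimal sketch of the same approach. The DataFrame construction is an assumption (the question does not show how df is loaded); the values are copied from the question's table.

```python
import pandas as pd

# Sample data copied from the question's table.
# How df is actually loaded is an assumption made for this sketch.
df = pd.DataFrame({
    "SiteID": ["A", "A", "B"],
    "Name": ["Conserve", "Listed", "Listed"],
    "Count": [3, 5, 5],
})

# Build "Name: Count" for each row, then join the pieces within each SiteID.
out = (
    (df["Name"] + ": " + df["Count"].astype(str))
    .groupby(df["SiteID"]).agg(", ".join)
    .reset_index(name="Output")
)

# Prepend the fixed phrase requested in the question.
out["Output"] = "There are " + out["Output"]

print(out)
```

Count has to be cast with astype(str) first, since adding a string column to an integer column raises a TypeError.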
CodePudding user response:
Here is how you can achieve this:
res = (
    df
    .assign(Output=df[['Name', 'Count']]
            .astype(str)
            .apply(': '.join, axis=1))
    .groupby('SiteID', as_index=False)['Output']
    .apply(lambda x: f"There are {', '.join(x)}")
)
print(res)
SiteID Output
0 A There are Conserve: 3, Listed: 5
1 B There are Listed: 5