Consider this example:
import polars as pl
df = pl.DataFrame({
'ID': ['0', '1', '2', '3', '4', '5','6', '7', '8', '9', '10'],
'Name' : ['A','','','','B','','C','','','D', ''],
'Element' : ['', '4', '4', '0', '', '4', '', '0', '9', '', '6']
})
The 'Name' is linked to an 'ID'. This ID is used as a value in the 'Element' column. How do I map the correct 'Name' to the elements? Also I want to group the elements by 'Name' ('Name_list'), count them and sort by counted values ('E_count').
The resulting df would be:
Name_list Element E_count
-------------------------
'B' '4' 3
'A' '0' 2
'C' '6' 1
'D' '9' 1
Feedback very much appreciated; even a Pandas solution.
CodePudding user response:
If I understood your problem correctly, then you could use pandas and do the following:
countdf = pd.merge(df,df[['ID','Name']],left_on='Element',right_on='ID',how='inner')
countdf = pd.DataFrame(countdf.groupby('Name_y')['Element'].count())
result = pd.merge(countdf,df[['Name','ID']],left_on='Name_y',right_on='Name',how='left')
result[['Name','ID','Element']]
CodePudding user response:
Here's a Polars solution. We'll use a join
to link the ID
and Element
columns (after some filtering and summarizing).
(
df.select(["Name", "ID"])
.filter(pl.col("Name") != "")
.join(
df.select(["Element"]).groupby("Element").agg(
pl.count().alias("E_count")),
left_on="ID",
right_on="Element",
how="left",
)
.sort('E_count', reverse=True)
.rename({"Name":"Name_list", "ID":"Element"})
)
Note: this differs from the solution listed in your answer. The Name D
is associated with ID 9
(not 10).
shape: (4, 3)
┌───────────┬─────────┬─────────┐
│ Name_list ┆ Element ┆ E_count │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ u32 │
╞═══════════╪═════════╪═════════╡
│ B ┆ 4 ┆ 3 │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ A ┆ 0 ┆ 2 │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ C ┆ 6 ┆ 1 │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ D ┆ 9 ┆ 1 │
└───────────┴─────────┴─────────┘
You can also use the polars.Series.value_counts method, which looks somewhat cleaner:
(
df.select(["Name", "ID"])
.filter(pl.col("Name") != "")
.join(
df.get_column("Element").value_counts(),
left_on="ID",
right_on="Element",
how="left",
)
.sort("counts", reverse=True)
.rename({"Name": "Name_list", "ID": "Element", "counts": "E_count"})
)
CodePudding user response:
using pandas We can use map to map the values and using the where condition to keep from making name as null. lastly, its a groupby
df['Name'] = df['Name'].where(cond=df['Element']=="",
other=df[df['Element']!=""]['Element'].map(lambda x: df[df['ID'] == x]['Name'].tolist()[0]),
axis=0)
df[df['Element'] != ""].groupby(['Name','Element']).count().reset_index()
Name Element ID
0 A 0 2
1 B 4 3
2 C 6 1
3 D 9 1