Here is my toy dataframe example that I am trying to pivot:
import pandas as pd
df = pd.DataFrame({'id': [0, 0, 0, 0, 0, 1, 1,1], 'key':['role', 'role', 'role', 'dep', 'country', 'role', 'dep', 'country'], 'val': ['admin', 'local_usr', 'fin_dep_ds', 'fin', 'US', 'kuku', 'security', 'DE']})
df.pivot_table(index="id", columns="val", aggfunc="size", ).reset_index()
But the output I get is:
val id DE US admin fin fin_dep_ds kuku local_usr security
0 0 NaN 1.0 1.0 1.0 1.0 NaN 1.0 NaN
1 1 1.0 NaN NaN NaN NaN 1.0 NaN 1.0
What I want is to transform it to be:
id admin local_usr fin_dep_ds kuku country dep
0 1 1 1 0 US fin
1 0 0 0 1 DE security
Please advise how can I pivot my df to get this result, it seems to me I need to split the df to 2 parts and join - the part with multiple values per key and the 1:1 key value pairs.
CodePudding user response:
You pivot it twice, based on whether the key was repeated:
# Find keys that are repeated more than once for any `id`
idx = df.groupby(["key", "id"]).size().groupby(level=0).max().loc[lambda x: x > 1].index
# We will pivot those keys differently
cond = df["key"].isin(idx)
result = pd.concat([
df[cond].pivot_table(index="id", columns="val", aggfunc="size", fill_value=0),
df[~cond].pivot_table(index="id", columns="key", aggfunc="first").droplevel(0, axis=1)
], axis=1).reset_index()
CodePudding user response:
Try:
data = {}
for _, row in df.iterrows():
if row["key"] in {"dep", "country"}:
data.setdefault(row["id"], {})[row["key"]] = row["val"]
else:
data.setdefault(row["id"], {})[row["val"]] = 1
data[row["id"]]["id"] = row["id"]
df = pd.DataFrame(data.values()).fillna(0).set_index("id")
df = df[sorted(df.columns, key=lambda k: k in {"dep", "country"})]
print(df)
Prints:
admin local_usr fin_dep_ds kuku dep country
id
0 1.0 1.0 1.0 0.0 fin US
1 0.0 0.0 0.0 1.0 security DE