Home > Software engineering >  Pivot dataframe with 1 to many key value pairs along with 1:1 key value pairs?
Pivot dataframe with 1 to many key value pairs along with 1:1 key value pairs?

Time:04-18

Here is my toy dataframe example that I am trying to pivot:

import pandas as pd

df = pd.DataFrame({'id': [0, 0, 0, 0, 0, 1, 1,1], 'key':['role', 'role', 'role', 'dep', 'country', 'role', 'dep', 'country'], 'val': ['admin', 'local_usr', 'fin_dep_ds', 'fin', 'US', 'kuku', 'security', 'DE']})
df.pivot_table(index="id", columns="val", aggfunc="size", ).reset_index()

But the output I get is:

    val id  DE  US  admin   fin fin_dep_ds  kuku    local_usr   security
0       0   NaN 1.0 1.0     1.0 1.0         NaN     1.0       NaN
1       1   1.0 NaN NaN     NaN NaN         1.0     NaN       1.0

What I want is to transform it to be:

id admin local_usr fin_dep_ds kuku country dep
0  1     1         1          0    US      fin
1  0     0         0          1    DE      security

Please advise how can I pivot my df to get this result, it seems to me I need to split the df to 2 parts and join - the part with multiple values per key and the 1:1 key value pairs.

CodePudding user response:

You pivot it twice, based on whether the key was repeated:

# Find keys that are repeated more than once for any `id`
idx = df.groupby(["key", "id"]).size().groupby(level=0).max().loc[lambda x: x > 1].index

# We will pivot those keys differently
cond = df["key"].isin(idx)

result = pd.concat([
    df[cond].pivot_table(index="id", columns="val", aggfunc="size", fill_value=0),
    df[~cond].pivot_table(index="id", columns="key", aggfunc="first").droplevel(0, axis=1)
], axis=1).reset_index()

CodePudding user response:

Try:

data = {}
for _, row in df.iterrows():
    if row["key"] in {"dep", "country"}:
        data.setdefault(row["id"], {})[row["key"]] = row["val"]
    else:
        data.setdefault(row["id"], {})[row["val"]] = 1
    data[row["id"]]["id"] = row["id"]

df = pd.DataFrame(data.values()).fillna(0).set_index("id")
df = df[sorted(df.columns, key=lambda k: k in {"dep", "country"})]
print(df)

Prints:

    admin  local_usr  fin_dep_ds  kuku       dep country
id                                                      
0     1.0        1.0         1.0   0.0       fin      US
1     0.0        0.0         0.0   1.0  security      DE
  • Related