I have a dataframe that contains a person
, year
, and a bunch of flag
containing columns, like below:
# sample dataframe
data = [["John Doe", 2018, True, False, True], ["Jane Doe", 2019, True, False, False]]
df = pd.DataFrame(data, columns=["person", "year", "flag_1", "flag_2", "flag_3"])
df
person year flag_1 flag_2 flag_3
0 John Doe 2018 True False True
1 Jane Doe 2019 True False False
I would like my final output to be a groupby
where the person
and year
are retained, and a new column containing a dictionary of all the flag results is stored.
Something kind of like, but not quite like, this:
First: reshape
reshaped_df = pd.melt(
df, id_vars=["person", "year"], value_vars=["flag_1", "flag_2", "flag_3"]
)
person year variable value
0 John Doe 2018 flag_1 True
1 Jane Doe 2019 flag_1 True
2 John Doe 2018 flag_2 False
3 Jane Doe 2019 flag_2 False
4 John Doe 2018 flag_3 True
Second: Create dictionary
reshaped_df.set_index(["person", "year", "variable"]).T.to_dict("list")
{('John Doe', 2018, 'flag_1'): [True],
('Jane Doe', 2019, 'flag_1'): [True],
('John Doe', 2018, 'flag_2'): [False],
('Jane Doe', 2019, 'flag_2'): [False],
('John Doe', 2018, 'flag_3'): [True],
('Jane Doe', 2019, 'flag_3'): [False]}
except I want my output to look like this:
person year flag_dict
0 John Doe 2018 {'flag_1': True, 'flag_2': False, 'flag_3': True}
1 Jane Doe 2019 {'flag_1': True, 'flag_2': False, 'flag_3': False}
Is this possible? If so how can it be done? Thank you!
CodePudding user response:
It is possible to reshape the dataframe and create a new column containing a dictionary of all the flag results as described. Here's a way to do it:
# Reshape the dataframe
reshaped_df = pd.melt(
df, id_vars=["person", "year"], value_vars=["flag_1", "flag_2", "flag_3"]
)
# Create a dictionary for each group of person, year
reshaped_df = reshaped_df.groupby(["person", "year"]).apply(lambda x: x.set_index("variable")["value"].to_dict()).reset_index()
# Rename the last column
reshaped_df.rename(columns={0: "flag_dict"}, inplace=True)
print(reshaped_df)
You should now have a dataframe that looks like this:
person year flag_dict
0 John Doe 2018 {'flag_1': True, 'flag_2': False, 'flag_3': True}
1 Jane Doe 2019 {'flag_1': True, 'flag_2': False, 'flag_3': False
}
Hope this helps!
CodePudding user response:
No need for a complex reshape, simply use to_dict
cols = list (df.filter(like='flag_'))
out = (
df.drop(columns=cols)
.assign(flag_dict=df[cols].to_dict('records'))
)
Output:
person year flag_dict
0 John Doe 2018 {'flag_1': True, 'flag_2': False, 'flag_3': True}
1 Jane Doe 2019 {'flag_1': True, 'flag_2': False, 'flag_3': False}