How can I add missing column values to a dataframe based on a list of possible values?
The list of values looks like:
type = ['type1','type2','type3','type4']
Here's code for df:
import pandas as pd

data = {
    '1': ['fall', 'type2', 'MATH 1234', 'Yes'],
    '2': ['fall', 'type1', 'MATH 1234', 'Yes']
}
columns = ['term', 'type', 'course', 'offered']
df = pd.DataFrame.from_dict(
    data=data, orient='index'
)
df.columns = columns
The dataframe looks like:
   term   type     course  offered
1  fall  type2  MATH 1234      Yes
2  fall  type1  MATH 1234      Yes
Desired output:
   term   type     course  offered
1  fall  type2  MATH 1234      Yes
2  fall  type1  MATH 1234      Yes
3  fall  type3  MATH 1234       NO
4  fall  type4  MATH 1234       NO
type3 and type4 are not offered for MATH 1234 in the fall, so offered should be "NO" for those rows.
Any ideas on how to handle this?
CodePudding user response:
If I understand you correctly, you want to create all combinations of term, type and course, and mark a combination as not offered if it cannot be found in df.
If so, try this:
terms = df["term"].unique()
types = ["type1", "type2", "type3", "type4"]
courses = df["course"].unique()
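result = (
    pd.MultiIndex.from_product([terms, types, courses], names=["term", "type", "course"])
    .to_frame(index=False)  # plain columns instead of keeping the MultiIndex as the index
    .merge(df, on=["term", "type", "course"], how="left")
    .fillna("NO")  # combinations missing from df have NaN in offered; "NO" matches the desired output
)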
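One caveat with the snippet above: fillna replaces every NaN in the merged frame, including any that were already in df. If that matters, a variation is to reindex on the full set of combinations, which only fills the rows it adds. This is a minimal sketch rather than a drop-in replacement: it assumes each (term, type, course) combination appears at most once in df, and the names keys, full_index and result_reindexed are mine.

```python
import pandas as pd

# Same data as in the question, built directly for a self-contained example.
df = pd.DataFrame(
    {
        "term": ["fall", "fall"],
        "type": ["type2", "type1"],
        "course": ["MATH 1234", "MATH 1234"],
        "offered": ["Yes", "Yes"],
    }
)
types = ["type1", "type2", "type3", "type4"]
keys = ["term", "type", "course"]

# Every (term, type, course) combination, whether or not it exists in df.
full_index = pd.MultiIndex.from_product(
    [df["term"].unique(), types, df["course"].unique()], names=keys
)

# reindex adds the missing combinations and fills only those new rows;
# it requires the key combinations in df to be unique.
result_reindexed = (
    df.set_index(keys)
    .reindex(full_index, fill_value="NO")
    .reset_index()
)
print(result_reindexed)
```

The rows come back in the order of full_index, so type1 to type4 here, with a fresh 0-based index.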
CodePudding user response:
The goal here is to get the missing rows for the combination of (term, course) with the new values for type. One option is with complete from pyjanitor to expose the missing rows:
# pip install pyjanitor
import pandas as pd
import janitor
df.complete(['term', 'course'], {'type': type}, fill_value = 'NO')
   term   type     course  offered
0  fall  type1  MATH 1234      Yes
1  fall  type2  MATH 1234      Yes
2  fall  type3  MATH 1234       NO
3  fall  type4  MATH 1234       NO
complete takes a variable number of arguments; the dictionary allows us to pass in the new values for type.
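If you would rather not add a dependency, the same idea (keep only the (term, course) pairs that already occur, then expand each one over every type) can be sketched in plain pandas. This is my approximation of the behaviour, not pyjanitor's implementation. The spring / STAT 2000 row is made-up data added only to show that no (fall, STAT 2000) row gets created, and the names pairs, grid and completed are hypothetical. It assumes pandas 1.2 or later for how="cross".

```python
import pandas as pd

# Question's data plus one hypothetical extra row (spring / STAT 2000).
df = pd.DataFrame(
    {
        "term": ["fall", "fall", "spring"],
        "type": ["type2", "type1", "type1"],
        "course": ["MATH 1234", "MATH 1234", "STAT 2000"],
        "offered": ["Yes", "Yes", "Yes"],
    }
)
types = ["type1", "type2", "type3", "type4"]

# Only the (term, course) pairs that actually occur in df.
pairs = df[["term", "course"]].drop_duplicates()

# Cross join: each existing pair is combined with every possible type.
grid = pairs.merge(pd.DataFrame({"type": types}), how="cross")

# Bring back the known rows, then fill only the offered column.
completed = grid.merge(df, on=["term", "type", "course"], how="left")
completed["offered"] = completed["offered"].fillna("NO")
completed = completed[["term", "type", "course", "offered"]]
print(completed)
```

This gives 8 rows (2 pairs times 4 types), whereas a full product of all terms, types and courses would give 16 and include pairs such as (fall, STAT 2000) that never appear in the data.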