Populating values not explicitly stated in dataframe

How can I add missing column values to a dataframe based on a list of possible values?
The list of values looks like:

type = ['type1','type2','type3','type4']

Here's the code for df:

import pandas as pd

data = {
    '1': ['fall',  'type2', 'MATH 1234', 'Yes'],
    '2': ['fall',  'type1', 'MATH 1234', 'Yes']
}
columns=['term', 'type', 'course', 'offered']

df = pd.DataFrame.from_dict(
    data=data, orient='index'
)
df.columns = columns

The dataframe looks like:

   term   type     course offered
1  fall  type2  MATH 1234     Yes
2  fall  type1  MATH 1234     Yes

Desired output:

   term   type     course offered
1  fall  type2  MATH 1234     Yes
2  fall  type1  MATH 1234     Yes
3  fall  type3  MATH 1234      NO
4  fall  type4  MATH 1234      NO

type3 and type4 are not offered for MATH 1234 in the fall, so offered should be "NO" for those rows.

Any ideas how to handle this?

CodePudding user response:

If I understand you correctly, you want to create all combinations of term, type, and course, and mark offered as No for any combination that is not found in df.

If so, try this:

terms = df["term"].unique()
types = ["type1", "type2", "type3", "type4"]
courses = df["course"].unique()

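result = (
    pd.MultiIndex.from_product([terms, types, courses], names=["term", "type", "course"])
    .to_frame(index=False)  # index expects a bool; False returns plain columns
    .merge(df, on=["term", "type", "course"], how="left")  # keep every combination
    .fillna({"offered": "NO"})  # combinations missing from df are not offered
)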
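
For reference, here is a minimal self-contained sketch of the same idea run against the sample data from the question. The names grid and keys are only illustrative, and the fill is restricted to the offered column on the assumption that no other column should be touched.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "term": ["fall", "fall"],
        "type": ["type2", "type1"],
        "course": ["MATH 1234", "MATH 1234"],
        "offered": ["Yes", "Yes"],
    },
    index=["1", "2"],
)
types = ["type1", "type2", "type3", "type4"]
keys = ["term", "type", "course"]

# Every combination of the observed terms, all possible types, and the observed courses.
grid = pd.MultiIndex.from_product(
    [df["term"].unique(), types, df["course"].unique()], names=keys
).to_frame(index=False)

# A left merge keeps every row of the grid; unmatched rows get NaN in 'offered'.
result = grid.merge(df, on=keys, how="left")
result["offered"] = result["offered"].fillna("NO")
print(result)
```

Note that the result is ordered by the grid (type1 through type4), not by the original row order of df.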

CodePudding user response:

The goal here is to get the missing rows for each existing combination of (term, course), using the new values for type. One option is to use complete from pyjanitor to expose the missing rows:

# pip install pyjanitor
import pandas as pd
import janitor

df.complete(['term', 'course'], {'type': type}, fill_value = 'NO')

   term   type     course offered
0  fall  type1  MATH 1234     Yes
1  fall  type2  MATH 1234     Yes
2  fall  type3  MATH 1234      NO
3  fall  type4  MATH 1234      NO

complete takes a variable number of arguments; the dictionary lets us pass in the new values for type.
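
If installing pyjanitor is not an option, a similar result can be sketched in plain pandas with a cross merge (this assumes pandas 1.2 or later for how="cross"). This is a swapped-in technique, not what complete does internally. The spring row below is hypothetical and was added only to show that (term, course) pairs that never occur together are not created.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["fall", "type2", "MATH 1234", "Yes"],
        ["fall", "type1", "MATH 1234", "Yes"],
        ["spring", "type1", "STAT 2000", "Yes"],  # hypothetical row, not in the question
    ],
    columns=["term", "type", "course", "offered"],
)
types = ["type1", "type2", "type3", "type4"]

# Only the (term, course) pairs that actually appear in the data.
pairs = df[["term", "course"]].drop_duplicates()

# Cross each existing pair with every possible type.
grid = pairs.merge(pd.DataFrame({"type": types}), how="cross")

# Bring back the known rows and mark everything else as not offered.
result = grid.merge(df, on=["term", "course", "type"], how="left")
result["offered"] = result["offered"].fillna("NO")
result = result[["term", "type", "course", "offered"]]
print(result)
```

With this data the result has 8 rows (2 existing pairs times 4 types), whereas a full product of unique terms, types, and courses would produce 16.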