Home > Mobile >  Mapping dictionary to pandas dataframe with lists
Mapping dictionary to pandas dataframe with lists

Time:07-16

I have a dictionary where I would like to map it to a large pandas dataframe. The issue is the column I would like to use to map is wrapped in double quotes and sometimes there are one or more items in that column.

original

dict_id = {
   'College1': ['1256511'],
   'College2': ['1200582'],
   'College3': ['1256618'], 
   'College10': ['1256621']
}


   id1  id2            college_name
0   01   01    "College1, College2"
1   01   02  "College10, College12"
2   01   03             "College19"

desired

id1 id2    college_name              id_college
01   01     "College1, College2"    1256511, 1200582
01   02     "College10, College12"  1256621
01   03     "College19"

CodePudding user response:

Your data is better formatted imo after explode, but I put it all back to how it was at the end~

df.college_name = df.college_name.str[1:-1].str.split(', ')
df = df.explode('college_name')

df2 = pd.DataFrame.from_dict(dict_id, 'index', columns=['id_college'], dtype=str)

df = df.merge(df2, left_on='college_name', right_index=True, how='left')

df = df.fillna('').groupby(['id1', 'id2'], as_index=False).agg(', '.join)

df.college_name = '"'   df.college_name   '"'

print(df)

Output:

  id1 id2            college_name        id_college
0  01  01    "College1, College2"  1256511, 1200582
1  01  02  "College10, College12"         1256621,
2  01  03             "College19"

CodePudding user response:

let DF1 be your dictionary of college names and id's, and DF2 the massive dataframe with college name sometimes being a comma delimited list of college names

you're going to want to set the new column in DF2 according to a function that generates a series based on your DF1, and the DF2 collegeNames column

def genIds(df, df_col):
    id_list = []
    for collegeName in df_col:
        id_to_add = ""
        if ',' in collegeName:
            temp_list = []
            for cName in collegeName.split(','):
                if cName in df.keys(): 
                    # if this is an actual pandas df do 
                    # if cName in df['college_names']
                    temp_list.append(df[cName])
            id_to_add = ",".join(temp_list) if len(id_list)>0 else ""
        else:
            id_to_add = df[collegeName] if collegeName in df.keys() else ""
        id_list.append(id_to_add)
    return id_list
               

df2['ids'] = genIds(df1, df2['college_name'].values)
  • Related