Home > Back-end >  Mapping column from one dataframe to the column of a different one and replacing it with values?
Mapping column from one dataframe to the column of a different one and replacing it with values?

Time:09-17

I try to solve the following problem:
I have these two dataframes, df1:

label term
A01.12 "Culture de légumes, de melons, de racines et de tubercules"
A01.14 Culture de la canne à sucre

and df2:

company organization_activity
coca-cola "Culture de légumes, de melons, de racines et de tubercules"
rum-factory Culture de la canne à sucre

I now would like to map the "term" of df1 to the "organization_activity" of df2 and create a new column with the name "codes" in df2 with the respective label in df1.

The resulting dataframe should look like this:

organization_activity codes
"Culture de légumes, de melons, de racines et de tubercules" A01.12
Culture de la canne à sucre A01.14

This partially works by using:

df2['codes'] = df2['organization_activity'].map(df1.set_index('term')['label'].to_dict())

Oddly the new column is created and filled but for a lot of terms the mapping does not work and a Nan value is inserted even though the "organization_activity" I am mapping exists in the "term" column. Does anyone have an idea on why this could happen? The formatting for the values in "term" and "organization_activity" is equal and there doesn't seem to be a difference in the format of values that are mappable and ones that are not. Is there maybe a different way to do this?
The dataframe I showed is just an abstract so there are multiple companies that can have the same "organization_activity".

CodePudding user response:

Normally, that is what you call a merge. Assuming you have the following dataframes:

df1 = pd.DataFrame({'Label': ['A01.12', 'A01.14'],
                    'term': ['Culture de légumes, de melons, de racines et de tubercules',
                             'Culture de la canne à sucre']})


df2 = pd.DataFrame({'organization_activity': [
                    'Culture de légumes, de melons, de racines et de tubercules',
                    'Culture de la canne à sucre']})

>>> df1
    Label                                               term
0  A01.12  Culture de légumes, de melons, de racines et d...
1  A01.14                        Culture de la canne à sucre

>>> df2
                               organization_activity
0  Culture de légumes, de melons, de racines et d...
1                        Culture de la canne à sucre

This is what a merge would give you after some cleaning:

>>> df3 = df2.merge(df1, left_on="organization_activity", right_on="term")
>>> df3.drop("term", axis=1).rename(columns={"Label": "Code"})
                               organization_activity    Code
0  Culture de légumes, de melons, de racines et d...  A01.12
1                        Culture de la canne à sucre  A01.14

I also assume that there are some more columns in df2, too, which should also correctly be mapped to Labels/Codes with a merge.

CodePudding user response:

Consider below dataframes from the question:

df1 = pd.DataFrame({'label': ['A01.12', 'A01.14'],
                    'term': ['"Culture de légumes, de melons, de racines et de tubercules"', 'Culture de la canne à sucre']})
df2 = pd.DataFrame({'company': ['coca-cola', 'rum-factory'],
                    'organization_activity': ['"Culture de légumes, de melons, de racines et de tubercules"', ' Culture de la  canne à sucre']})

print(df1)
    label                                                          term
0  A01.12  "Culture de légumes, de melons, de racines et de tubercules"
1  A01.14                                   Culture de la canne à sucre

print(df2)
       company                                         organization_activity
0    coca-cola  "Culture de légumes, de melons, de racines et de tubercules"
1  rum-factory                                  Culture de la  canne à sucre

Actually there are some white spaces in the above dataframe which are barely visible. I suggest you do some data cleaning first. For example, to replace double spaces by single space, and to remove white spaces in front and behind the stings.

for df in [df1, df2]:
    for col in df.columns:
        df[col] = df[col].str.strip().str.replace('  ', ' ')
        
df2['codes'] = df2['organization_activity'].map(df1.set_index('term')['label'].to_dict())
print(df2)

Output:

       company                                         organization_activity   codes
0    coca-cola  "Culture de légumes, de melons, de racines et de tubercules"  A01.12
1  rum-factory                                   Culture de la canne à sucre  A01.14

Alternatively you could use pd.merge() to merge the 2 dataframes:

df3 = pd.merge(df2, df1, left_on='organization_activity', right_on='term').drop(columns=['term'])
df3.columns = ['company', 'organization_activity', 'codes']
print(df3)

       company                                         organization_activity   codes
0    coca-cola  "Culture de légumes, de melons, de racines et de tubercules"  A01.12
1  rum-factory                                   Culture de la canne à sucre  A01.14
  • Related