Home > Mobile >  Rename columns based on relatation between other columns
Rename columns based on relatation between other columns

Time:06-07

I'm struggling on finding a solution for this situation.

I have two pandas dataframes one have columns named with their ids and the other one has the correlation between the ids, real name and sector:

df1 =

id1 id2 id3 randomcol id4 id5 id6 id7 ...
1 2 3 a 4 5 6 7 ...
7 6 5 b 4 3 2 1 ...
1 2 3 c 4 5 6 7 ...

And the correlation between ids, real names and sectors:

df2 =

real names id sector
air 1 5
earth 2 10
fire 3 15
water 4 20
wind 5 25

I want to rename the df1 columns with their real names from df2.

I can do df1.rename(columns={"id1": "air", "id2": "earth", "id3": "fire",...}, errors="raise")

But it have around 20 different sectors so it would be a bit frustating.

Is there a better way of doing this?

Thanks in advance.

CodePudding user response:

You can create a mapping dict (Series) from df2 and rename columns of df1. The trick here is to convert numeric ids (1, 2, 3, ...) to their equivalent string (id1, id2, id3, ...):

MAPPING = df2.assign(id='id' df2['id'].astype(str)).set_index('id')['real names']
df1 = df1.rename(columns=MAPPING)

Output:

>>> df1
   air  earth  fire randomcol  water  wind  id6  id7
0    1      2     3         a      4     5    6    7
1    7      6     5         b      4     3    2    1
2    1      2     3         c      4     5    6    7

>>> MAPPING
id
id1      air
id2    earth
id3     fire
id4    water
id5     wind
Name: real names, dtype: object

CodePudding user response:

You can make a mapping given df2["id"] and df2["real names"] and then replace df.columns:

map = {f"id{i}": name for i, name in zip(df2["id"], df2["real names"])}
df1.columns = df1.columns.to_series().replace(map)

Output:

   air  earth  fire  randomcol  water  wind  id6  id7
0  1    2      3     a          4      5     6    7
1  7    6      5     b          4      3     2    1
2  1    2      3     c          4      5     6    7
  • Related