I'm struggling on finding a solution for this situation.
I have two pandas dataframes one have columns named with their ids and the other one has the correlation between the ids, real name and sector:
df1 =
id1 | id2 | id3 | randomcol | id4 | id5 | id6 | id7 | ... |
---|---|---|---|---|---|---|---|---|
1 | 2 | 3 | a | 4 | 5 | 6 | 7 | ... |
7 | 6 | 5 | b | 4 | 3 | 2 | 1 | ... |
1 | 2 | 3 | c | 4 | 5 | 6 | 7 | ... |
And the correlation between ids, real names and sectors:
df2 =
real names | id | sector |
---|---|---|
air | 1 | 5 |
earth | 2 | 10 |
fire | 3 | 15 |
water | 4 | 20 |
wind | 5 | 25 |
I want to rename the df1 columns with their real names from df2.
I can do df1.rename(columns={"id1": "air", "id2": "earth", "id3": "fire",...}, errors="raise")
But it have around 20 different sectors so it would be a bit frustating.
Is there a better way of doing this?
Thanks in advance.
CodePudding user response:
You can create a mapping dict (Series) from df2
and rename columns of df1
. The trick here is to convert numeric ids (1, 2, 3, ...) to their equivalent string (id1, id2, id3, ...):
MAPPING = df2.assign(id='id' df2['id'].astype(str)).set_index('id')['real names']
df1 = df1.rename(columns=MAPPING)
Output:
>>> df1
air earth fire randomcol water wind id6 id7
0 1 2 3 a 4 5 6 7
1 7 6 5 b 4 3 2 1
2 1 2 3 c 4 5 6 7
>>> MAPPING
id
id1 air
id2 earth
id3 fire
id4 water
id5 wind
Name: real names, dtype: object
CodePudding user response:
You can make a mapping given df2["id"]
and df2["real names"]
and then replace df.columns
:
map = {f"id{i}": name for i, name in zip(df2["id"], df2["real names"])}
df1.columns = df1.columns.to_series().replace(map)
Output:
air earth fire randomcol water wind id6 id7
0 1 2 3 a 4 5 6 7
1 7 6 5 b 4 3 2 1
2 1 2 3 c 4 5 6 7