I have two dataframes with similar columns:
df1 = (a, b, c, d)
df2 = (a, b, c, d)
I want concat
or merge
some columns of them like below in df3
df3 = (a_1, a_2, b_1, b_2)
How can I put them beside as they are (without any change), and how can I merge them on a similar key like d
? I tried to add them to a list and concat them but don't know how to give them a new name. I don't want multi-level column names.
for ii, tdf in enumerate(mydfs):
tdf = tdf.sort_values(by="fid", ascending=False)
for _col in ["fid", "pred_text1"]:
new_col = _col str(ii)
dfs.append(tdf[_col])
ii = 1
df = pd.concat(dfs, axis=1)
CodePudding user response:
Without having a look at your dataframe, it would not be easy, but I am generating a dataframe to give you samples and insight into how the code works:
import pandas as pd
import re
df1 = pd.DataFrame({"a":[1,2,4], "b":[2,4,5], "c":[5,6,7], "d":[1,2,3]})
df2 = pd.DataFrame({"a":[6,7,5], "b":[3,4,8], "c":[6,3,9], "d":[1,2,3]})
mergedDf = df1.merge(df2, how="left", on="d").rename(columns=lambda x: re.sub("(. )\_x", r"\1_1", x)).rename(columns=lambda x: re.sub("(. )\_y", r"\1_2", x))
mergedDf
which results in:
a_1 | b_1 | c_1 | d | a_2 | b_2 | c_2 | |
---|---|---|---|---|---|---|---|
0 | 1 | 2 | 5 | 1 | 6 | 3 | 6 |
1 | 2 | 4 | 6 | 2 | 7 | 4 | 3 |
2 | 4 | 5 | 7 | 3 | 5 | 8 | 9 |
If you are interested in dropping other columns you can use code below:
mergedDf.iloc[:, ~mergedDf.columns.str.startswith("c")]
which results in:
a_1 | b_1 | d | a_2 | b_2 | |
---|---|---|---|---|---|
0 | 1 | 2 | 1 | 6 | 3 |
1 | 2 | 4 | 2 | 7 | 4 |
2 | 4 | 5 | 3 | 5 | 8 |