I have two dataframe df1 =
C | L | O | D | LN | M | x | y | z | w |
---|---|---|---|---|---|---|---|---|---|
FR | 312 | 73 | 2021-04-09 | 1 | W | 1 | 0 | 0 | 0 |
FR | 312 | 73 | 2021-04-09 | 1 | W | 0 | 1 | 0 | 0 |
FR | 312 | 73 | 2021-04-09 | 1 | W | 0 | 0 | 1 | 0 |
FR | 312 | 73 | 2021-04-09 | 3 | W | 1 | 0 | 0 | 0 |
FR | 312 | 73 | 2021-04-09 | 3 | W | 0 | 1 | 0 | 0 |
FR | 312 | 73 | 2021-04-09 | 3 | W | 0 | 0 | 1 | 0 |
FR | 312 | 73 | 2021-04-09 | 4 | W | 1 | 0 | 0 | 0 |
FR | 312 | 73 | 2021-04-09 | 4 | W | 0 | 1 | 0 | 0 |
FR | 312 | 73 | 2021-04-09 | 4 | W | 0 | 0 | 1 | 0 |
FR | 312 | 73 | 2021-04-09 | 5 | W | 1 | 0 | 0 | 0 |
FR | 312 | 73 | 2021-04-09 | 5 | W | 0 | 1 | 0 | 0 |
df2 =
C | L | O | D | LN | M | x | y | z | w |
---|---|---|---|---|---|---|---|---|---|
FR | 312 | 73 | 2021-04-09 | 1 | L | 1 | 0 | 0 | 1 |
I would like to concatenate or merge those 2 df on C | L | O | D | LN
output desired =
C | L | O | D | LN | M | x | y | z | w |
---|---|---|---|---|---|---|---|---|---|
FR | 312 | 73 | 2021-04-09 | 1 | W | 1 | 0 | 0 | 0 |
FR | 312 | 73 | 2021-04-09 | 1 | W | 0 | 1 | 0 | 0 |
FR | 312 | 73 | 2021-04-09 | 1 | W | 0 | 0 | 1 | 0 |
FR | 312 | 73 | 2021-04-09 | 1 | L | 0 | 0 | 0 | 1 |
thanks for your help
CodePudding user response:
You could set up the indexes such that they align with the data you want to match:
df1.set_index(['C','L','O','D','LN'], inplace=True)
df2.set_index(['C','L','O','D','LN'], inplace=True)
then you could merge both dfs into a single df containing all records
df_concat = pd.concat([df1, df2], axis=0)
All thats left is to find the indices that exist in both dataframes, and use it on the concatenated df:
i2keep = set(df1.index.tolist()) and set(df2.index.tolist())
df_concat.loc[i2keep,:].reset_index()
yields
C L O D LN M x y z w
0 FR 312 73 4/9/2021 1 W 1 0 0 0
1 FR 312 73 4/9/2021 1 W 0 1 0 0
2 FR 312 73 4/9/2021 1 W 0 0 1 0
3 FR 312 73 4/9/2021 1 L 1 0 0 1
CodePudding user response:
As I was trying to point out in the comments, a possible solution is:
- duplicate
df2
, and drop columns that are not wanted (for merging) - use inner merge on
df2_dupli
anddf1
- possibly remove
Nan
rows, not sure if this step is required - combine
df2
anddf_merge
(concat) - profit
# initialize
df1 = pd.DataFrame(data={'A': [1, 2, 3], 'B': ["abc", "abc", "cde"], 'C': [0, 1, 0]})
df2 = pd.DataFrame(data={'A': [1, 2], 'B': ["abc", "abc"], 'C': [5, 5]})
# duplicate and drop
df2_dupli = df2.copy()
df2_dupli = df2_dupli.drop(columns=['C'])
# inner merge
df_merge = pd.merge(df1, df2_dupli, how='inner', left_on=['A','B'], right_on = ['A','B'])
# combine
df_out = pd.concat([df_merge, df2])
>>> df1
A B C
0 1 abc 0
1 2 abc 1
2 3 cde 0
>>> df2
A B C
0 1 abc 5
1 2 abc 5
>>> df_out
A B C
0 1 abc 0
1 2 abc 1
0 1 abc 5
1 2 abc 5
df_out
now contains all rows from df2
, and rows from df1
that have the same values in specified columns as df2
.