I need to concat two CSVs with a different number of rows. I receive a warning when I do so and the results in print(df1.head())
return an empty dataframe when I know by visual inspection that there are matching rows.
Could the warning affect the results reflected in head()? Or is there another issue I need to resolve?
# columns to filter on
columns_to_match = ['column1', 'column2', 'column3', 'column4', 'column5']
# find matching rows based off columns in filter
filter = df1[columns_to_match].eq(df2[columns_to_match]).all(axis=1)
# concat filtered results
df1 = pd.concat([df1[columns_to_match], df2[columns_to_match]])
# print matching results
print(df1.head())
I receive the following error:
UserWarning: Boolean Series key will be reindexed to match DataFrame index.
df1 = pd.concat([df1[filter], df2[filter]])
Thank you.
CodePudding user response:
Firstly, you have multiple uses for filter
, which is confusing and error-prone. I believe that the warning arises because filter = df1[filter].eq(df2[filter]).all(axis=1)
is a series aligned to df1
. However you then try to subset df2
using the same mask series.
One way to fix it would be to have e.g. filter2 = df2[cols].eq(df1[cols]).all(axis=1)
and then do
df1 = pd.concat([df1[filter], df2[filter2]])
(note that I am defining cols = ['column1', 'column2', 'column3', 'column4', 'column5']
here.
Edit - this should work:
# find matching rows based off columns in filter
filter1 = df1[columns_to_match].eq(df2[columns_to_match]).all(axis=1)
filter2 = df2[columns_to_match].eq(df1[columns_to_match]).all(axis=1)
# concat filtered results
df1 = pd.concat([df1[filter1], df2[filter2]])
CodePudding user response:
Thank you to @ags29 and @not_speshal. Both of your suggestions led me to this solution.
# columns to filter on
columns_to_match = ['column1', 'column2', 'column3', 'column4', 'column5']
# merge filtered results)
df1 = pd.merge(df1, df2, on=columns_to_match)
print(df1.head())