I am trying to concatenate two pandas DataFrames and am running into an InvalidIndexError. Here's some mock data:
import pandas as pd
df1 = pd.DataFrame({'col1': [1, 2, 3],
                    'col2': [4, 5, 6]
                    })
df2 = pd.DataFrame({'col1': [7, 8, 9],
                    'col2': ['10', '11', '12'],
                    'col3': ['13', '14', '15']
                    })
# Concat and keep only cols from df1
df3 = pd.concat([df1, df2], ignore_index=True).reindex(df1.columns, axis='columns')
Expected output:
df3
col1 col2
1 4
2 5
3 6
7 10
8 11
9 12
Full Traceback:
/Applications/Anaconda/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
3440
3441 if not self._index_as_unique:
-> 3442 raise InvalidIndexError(self._requires_unique_msg)
3443
3444 if not self._should_compare(target) and not is_interval_dtype(self.dtype):
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
CodePudding user response:
Your sample data works correctly for me.
I changed the data to reproduce the error. The cause is duplicated column names:
df1 = pd.DataFrame({'col1': [1, 2, 3],
                    'col2': [4, 5, 6]
                    }).rename(columns={'col2': 'col1'})
print (df1)
col1 col1 <- col1 is duplicated
0 1 4
1 2 5
2 3 6
df2 = pd.DataFrame({'col1': [7, 8, 9],
                    'col2': ['10', '11', '12'],
                    'col3': ['13', '14', '15']
                    })
# Concat and keep only cols from df1
df3 = pd.concat([df1, df2], ignore_index=True).reindex(df1.columns, axis='columns')
print (df3)
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
You can find the duplicated column names in each DataFrame like this:
print (df1.columns[df1.columns.duplicated(keep=False)])
Index(['col1', 'col1'], dtype='object')
print (df2.columns[df2.columns.duplicated(keep=False)])
Index([], dtype='object')
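The same check can be wrapped in a small guard that runs before pd.concat. This is only a minimal sketch: the helper name find_duplicated_columns is hypothetical and not part of pandas, and it relies only on Index.is_unique and Index.duplicated.

```python
import pandas as pd

def find_duplicated_columns(frames):
    """Map each frame's position to the column labels that occur more than once.

    Hypothetical helper for illustration; not a pandas function.
    """
    report = {}
    for pos, frame in enumerate(frames):
        if not frame.columns.is_unique:
            # duplicated() flags every repeat after the first occurrence
            dupes = frame.columns[frame.columns.duplicated()].unique().tolist()
            report[pos] = dupes
    return report

# df1 has two columns that are both named 'col1'; df2 has unique names
df1 = pd.DataFrame([[1, 4], [2, 5], [3, 6]], columns=['col1', 'col1'])
df2 = pd.DataFrame({'col1': [7, 8, 9], 'col2': ['10', '11', '12']})

print(find_duplicated_columns([df1, df2]))
```

An empty dict means every frame has unique column names, so the reindex step should not fail for this reason.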
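The solution is to deduplicate them. The example below uses a private pandas helper, so it may not be available in every pandas version: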
print (pd.io.parsers.ParserBase({'names':df1.columns})._maybe_dedup_names(df1.columns))
['col1', 'col1.1']
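If that private helper is not available, the renaming can be done by hand. The following is a sketch under stated assumptions: dedup_columns is a hypothetical helper (not a pandas API) that mimics the col1, col1.1 naming shown above, and it does not guard against a clash with an existing label such as 'col1.1'.

```python
import pandas as pd

def dedup_columns(columns):
    """Return labels made unique by appending .1, .2, ... to repeated names.

    Hypothetical helper for illustration; assumes no existing label
    already looks like 'name.1'.
    """
    seen = {}
    result = []
    for name in columns:
        count = seen.get(name, 0)
        result.append(f"{name}.{count}" if count else name)
        seen[name] = count + 1
    return result

# Same situation as above: df1 carries the label 'col1' twice
df1 = pd.DataFrame([[1, 4], [2, 5], [3, 6]], columns=['col1', 'col1'])
df2 = pd.DataFrame({'col1': [7, 8, 9],
                    'col2': ['10', '11', '12'],
                    'col3': ['13', '14', '15']})

# Make the labels unique first, then concat and keep only df1's columns
df1.columns = dedup_columns(df1.columns)
df3 = pd.concat([df1, df2], ignore_index=True).reindex(df1.columns, axis='columns')
print(df3)
```

Note that after deduplication the second column is named col1.1, so it no longer lines up with col2 in df2 and its rows from df2 are filled with NaN. If the duplicate name came from a mistake, renaming the column back to its intended name before the concat is probably the better fix.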