I have a dataframe which has duplicate column names:
Accepted Accepted Accepted Reject Accepted Reject
ABC IJK JKL XYJ LMN UIO
BCD PQR EFG YVG GHIJ PLK
...and want to convert it into two dataframes; one only of "Accepted" columns and other for "Reject" Columns:
df1:
Accepted Accepted Accepted Accepted
ABC IJK JKL LMN
BCD PQR EFG GHIJ
df2:
Reject Reject
XYJ UIO
YVG PLK
Tried:
df1=df["Accepted"]
df2=df["Reject"]
... but this only gives the first column matching this name.
CodePudding user response:
If select one column with same name are selected all columns with same name in DataFrame:
df1 = df['Accepted']
df2 = df['Reject']
Then is possible deduplicate columns:
df1.columns = [f'{x}_{i}' for i, x in enumerate(df1.columns, 1)]
df2.columns = [f'{x}_{i}' for i, x in enumerate(df2.columns, 1)]
EDIT: If get only first column name it means there are not duplicated columns names, so is possible use DataFrame.filter
:
df1 = df.filter(like='Accepted')
df2 = df.filter(like='Reject')
CodePudding user response:
You can also use this
df1 = df.loc[:,df.columns.isin(['accepted'])]
OR
df2 = df.loc[:,df.columns.isin(['rejected'])]
CodePudding user response:
It's not recommended to have duplicate column names, but anyway:
df1 = df.loc[:, df.columns.get_loc('Accepted')]
Accepted Accepted Accepted Accepted
0 ABC IJK JKL LMN
1 BCD PQR EFG GHIJ
df2 = df.loc[:, df.columns.get_loc('Reject')]
CodePudding user response:
IMO give the same name to columns is not a good practice. use unique names for every column. it doesn't matter if you put a number or add a unique character into the name. just make it unique
Solution from me, rename the column first
df.columns = ['Accepted1','Accepted2','Accepted3','Reject1','Accepted4','Reject2']
df1 = df[['Accepted1','Accepted2','Accepted3','Accepted4']]
df2 = df[['Reject1','Reject2']]