Return position of columns with the same name in pandas

I would like to get the positions of the columns that share the same name (in this case, column A).

DataFrame a:

A        B        A       C
text1    text3    text5   text7
text2    text4    text6   text8

I can get the position of one column A, but how do I get the position of the second one? I have multiple DataFrames with different numbers of columns, and the positions of A are not the same across them. Thank you.

for col in a.columns:
    if col == 'A':
        indx1 = a.columns.get_loc(col)

        # if this is the second column A:
        #     indx2 = a.columns.get_loc(col)
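A note on why get_loc is awkward here: when a label is duplicated, Index.get_loc does not return a single integer. For a non-monotonic index such as ["A", "B", "A", "C"] it returns a boolean mask (for a monotonic index it returns a slice instead, and for a unique label an int), so its return type depends on the column layout. A minimal sketch, assuming the example DataFrame from the question, of turning that mask into positions with np.flatnonzero:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(
    [["text1", "text3", "text5", "text7"], ["text2", "text4", "text6", "text8"]],
    columns=["A", "B", "A", "C"],
)

# With a duplicated, non-monotonic label, get_loc returns a boolean mask
mask = a.columns.get_loc("A")
print(mask)  # [ True False  True False]

# np.flatnonzero converts the mask into integer positions
positions = np.flatnonzero(mask)

# The first and second column "A" (assumes at least two exist)
indx1, indx2 = positions[0], positions[1]
print(indx1, indx2)  # 0 2
```

Because get_loc can also hand back a slice or an int on other layouts, the mask-based answers below (comparing df.columns directly) behave more uniformly across DataFrames.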

Answer:

You can do this easily with np.where(): df.columns == "A" gives a boolean array, and np.where(...)[0] returns the positions where it is True.

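import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=[["text1", "text3", "text5", "text7"], ["text2", "text4", "text6", "text8"]],
    columns=["A", "B", "A", "C"],
)
# Boolean mask of the columns named "A"; element [0] of np.where's result holds their positions
np.where(df.columns == "A")[0]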

Output:

array([0, 2], dtype=int64)
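Since the question mentions several DataFrames where A sits in different places, the same expression can be wrapped in a small helper and applied to each frame. A sketch, where column_positions and the sample frames are hypothetical names made up for illustration:

```python
import numpy as np
import pandas as pd


def column_positions(df: pd.DataFrame, name) -> list[int]:
    """Return the integer positions of every column labelled `name` (hypothetical helper)."""
    # df.columns == name is a boolean array; np.where(...)[0] holds the True indices.
    # .tolist() converts the NumPy integers into plain Python ints.
    return np.where(df.columns == name)[0].tolist()


# Hypothetical frames with different widths and different places for "A"
frames = [
    pd.DataFrame([[1, 2, 3, 4]], columns=["A", "B", "A", "C"]),
    pd.DataFrame([[1, 2, 3]], columns=["B", "A", "A"]),
    pd.DataFrame([[1, 2]], columns=["C", "D"]),
]

for frame in frames:
    print(column_positions(frame, "A"))
# [0, 2]
# [1, 2]
# []
```

A frame without any column A simply yields an empty list, so the caller can check the length before indexing.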

Answer:

res = []
# Record the position of every column named 'A'
for index, col in enumerate(a.columns):
    if col == 'A':
        res.append(index)

print(res)

This gives you the positions of all columns named 'A'.
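To get the indx1 and indx2 the question asks for, you can take the first two entries of that list. A sketch using the question's DataFrame, with the loop written as a list comprehension:

```python
import pandas as pd

a = pd.DataFrame(
    [["text1", "text3", "text5", "text7"], ["text2", "text4", "text6", "text8"]],
    columns=["A", "B", "A", "C"],
)

# Same logic as the loop above, as a list comprehension
res = [index for index, col in enumerate(a.columns) if col == "A"]

# First and second column "A"; unpacking raises ValueError
# if there are fewer than two columns named "A"
indx1, indx2 = res[:2]
print(indx1, indx2)  # 0 2
```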

Answer:

As a one-liner, this returns the positions of all columns whose names are repeated:

indexes = [i for i, col in enumerate(df.columns) if col in df.loc[:, df.columns.value_counts() > 1].columns]

Here it returns [0, 2] because column A is repeated.
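The one-liner returns a flat list, so when more than one name is repeated you cannot tell which positions belong to which name. A plain-Python sketch that groups the positions by name first (the extra duplicated "B" column is made up for illustration):

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5]],
    columns=["A", "B", "A", "C", "B"],
)

# Collect the positions of each column name
positions_by_name = defaultdict(list)
for i, col in enumerate(df.columns):
    positions_by_name[col].append(i)

# Keep only the names that occur more than once
duplicates = {name: pos for name, pos in positions_by_name.items() if len(pos) > 1}
print(duplicates)  # {'A': [0, 2], 'B': [1, 4]}
```

This also avoids rebuilding the filtered DataFrame for every column, which the one-liner does on each iteration.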

Answer:

To find the columns named 'A':

np.where(df.columns == 'A')[0]

result:

array([0, 2], dtype=int64)
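Once you have the positions, df.iloc lets you address each duplicated column on its own, whereas df["A"] returns all columns named A together as a DataFrame. A short sketch using the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["text1", "text3", "text5", "text7"], ["text2", "text4", "text6", "text8"]],
    columns=["A", "B", "A", "C"],
)

positions = np.where(df.columns == "A")[0]

# Label-based access returns both "A" columns at once (a 2x2 DataFrame here)
both_a = df["A"]

# Position-based access picks out each "A" column separately
first_a = df.iloc[:, positions[0]]
second_a = df.iloc[:, positions[1]]
print(second_a.tolist())  # ['text5', 'text6']
```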

To find all columns whose names are duplicated:

np.where(df.columns.duplicated(keep=False))[0]

result:

array([0, 2], dtype=int64)
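The keep argument of Index.duplicated controls which occurrences are flagged. keep=False flags every copy (as above); the default keep="first" leaves the first occurrence unflagged, which directly gives the positions of the second and later copies. A sketch on a made-up index with three A columns:

```python
import numpy as np
import pandas as pd

cols = pd.Index(["A", "B", "A", "C", "A"])

# keep=False: every occurrence of a repeated name is flagged
all_dups = np.where(cols.duplicated(keep=False))[0]
# keep="first" (the default): the first occurrence is not flagged
later_copies = np.where(cols.duplicated(keep="first"))[0]
# keep="last": the last occurrence is not flagged
earlier_copies = np.where(cols.duplicated(keep="last"))[0]

print(all_dups)        # [0 2 4]
print(later_copies)    # [2 4]
print(earlier_copies)  # [0 2]
```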