How to increment header except the first column in Pandas

Time:02-04

I have extracted a dataframe of n columns, where the first column is the index column with no header, followed by pairs of "success" and "fail" columns. I managed to extract the success-only columns and place a header on the index column with this code:

df2 = df1.iloc[:,0::2] 
df3 = df2
df3.reset_index(inplace=True)
df4 = df3.rename(columns = {'index':'out_date'})

df4

Output of the code can be found here

I would like to sort the "out_date" column in ascending order using sort_values but for that to work, the "success" columns need to be unique. I have this line of code that is able to rename the headers to "success1", "success2", "success3",..., but I can't figure out how to exclude the "out_date" column.

numcolumn = df4.shape[1]
df4.columns = ["success" + str(x) for x in range(1, numcolumn + 1)]

df4

Any help given will be appreciated. Thank you.

CodePudding user response:

I suggest setting the new column names before converting the index to the out_date column, using enumerate and f-strings:

df2 = df1.iloc[:,0::2] 
df2.columns = [f"success{i}" for i, x in enumerate(df2.columns, 1)]

df4 = df2.rename_axis('out_date').reset_index()

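If you prefer to keep your own approach, prepend the first column name as a one-element list (the range now stops at numcolumn, because one of the columns is out_date):

df4.columns = df4.columns[:1].tolist() + ["success" + str(x) for x in range(1, numcolumn)]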
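
To tie this back to the sorting goal in the question, here is a minimal end-to-end sketch. The contents of df1 are made up for illustration (the real data is not shown in the question), and the dates are assumed to be ISO-formatted strings so that they sort chronologically as text:

```python
import pandas as pd

# Hypothetical stand-in for df1: a date index followed by success/fail pairs.
df1 = pd.DataFrame(
    [[5, 1, 7, 0], [3, 2, 4, 1], [9, 0, 6, 2]],
    index=["2022-02-03", "2022-02-01", "2022-02-02"],
    columns=["success", "fail", "success", "fail"],
)

# Keep every other column (the "success" ones); copy so df1 is left untouched.
df2 = df1.iloc[:, 0::2].copy()

# Number the headers while the dates are still in the index,
# so there is no out_date column to exclude.
df2.columns = [f"success{i}" for i in range(1, df2.shape[1] + 1)]

# Name the index, turn it into a regular column, then sort by it.
df4 = df2.rename_axis("out_date").reset_index()
df4 = df4.sort_values("out_date", ignore_index=True)

print(df4)
```

Renaming before reset_index is what avoids the special case for the first column; if out_date should be treated as real dates rather than strings, converting it with pd.to_datetime before sorting is one option.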

CodePudding user response:

I had posted a generic way for de-duplicating column names while leaving the first one untouched in this answer:

def suffix():
    yield ''
    i = 0
    while True:
        i += 1
        yield f'_{i}'

def dedup(df):
    from collections import defaultdict
    d = defaultdict(suffix)
    df.columns = df.columns.map(lambda x: x + next(d[x]))
    
dedup(df)

example:

df = pd.DataFrame([range(7)], columns=['out_date'] + ['success']*6)

#    out_date  success  success  success  success  success  success
# 0         0        1        2        3        4        5        6


dedup(df)

print(df)

output:

   out_date  success  success_1  success_2  success_3  success_4  success_5
0         0        1          2          3          4          5          6
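
Once the names are unique, the sort asked about in the question can follow directly. The sketch below restates the same de-duplication idea in a compact, non-mutating form and then sorts; the helper name dedup_names and the sample values are hypothetical, and it assumes no existing column is already called something like success_1:

```python
from collections import Counter

import pandas as pd


def dedup_names(names):
    """Append _1, _2, ... to repeated names; first occurrences stay unchanged."""
    seen = Counter()
    out = []
    for name in names:
        count = seen[name]  # how many times this name has appeared so far
        out.append(name if count == 0 else f"{name}_{count}")
        seen[name] += 1
    return out


# Made-up data with a duplicated "success" header.
df = pd.DataFrame(
    [[2, 10, 20], [1, 30, 40]],
    columns=["out_date", "success", "success"],
)
df.columns = dedup_names(df.columns)

# Confirm the headers are unique, then sort by the first column.
assert df.columns.is_unique
df_sorted = df.sort_values("out_date", ignore_index=True)
print(df_sorted)
```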