How to assign increment values to pandas column names?

Time:07-24

For any column without a name, I want to assign an incrementing number as its name. That is, if the column name is NaN, assign 1, 2, 3, ...; if a column name already exists, leave it alone. Here, columns 28 onwards do not have column names.

My code below did not change the column names.

import pandas as pd
import numpy as np

# Arbitrarily assign the NaN column names with numbers (i.e., column 28 onwards)
df.iloc[:, 27:].columns = range(1, df.iloc[:, 27:].shape[1] + 1)
df.columns
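
A likely reason this attempt has no effect: `df.iloc[:, 27:]` returns a new DataFrame, so setting `.columns` on it renames a temporary object and leaves `df` untouched. The sketch below illustrates this on a small made-up frame (the data, the column names, and the `n_named` variable are assumptions for illustration, not the asker's data) and shows one way to assign the full list of names back to `df.columns` instead.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's frame: 2 named columns, 3 NaN-named ones.
df = pd.DataFrame([[0, 1, 2, 3, 4]],
                  columns=["strand", "start", np.nan, np.nan, np.nan])

# df.iloc[:, 2:] builds a new DataFrame, so this only renames a temporary object.
df.iloc[:, 2:].columns = range(1, df.iloc[:, 2:].shape[1] + 1)
unchanged = df.columns.tolist()  # still ends with three NaN names

# Assigning a full-length list to df.columns does modify df.
n_named = 2  # position of the first unnamed column (27 in the question)
df.columns = list(df.columns[:n_named]) + list(range(1, df.shape[1] - n_named + 1))
print(df.columns.tolist())
```

This variant relies on knowing where the unnamed columns start; it does not check each name for NaN.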

Original column names

df.columns

Index([               'strand',                 'start',
                        'stop',          'total_probes',
             'gene_assignment',       'mrna_assignment',
                   'swissprot',               'unigene',
       'GO_biological_process', 'GO_cellular_component',
       'GO_molecular_function',               'pathway',
             'protein_domains',         'crosshyb_type',
                    'category',               'seqname',
                  'Gene Title',              'Cytoband',
                 'Entrez Gene',            'Swiss-Prot',
                     'UniGene', 'GO Biological Process',
       'GO Cellular Component', 'GO Molecular Function',
                     'Pathway',       'Protein Domains',
                    'Probe ID',                     nan,
                           nan,                     nan,
                           nan,                     nan,
                           nan,                     nan,
                           nan,                     nan,
                           nan,                     nan,
                           nan,                     nan,
                           nan,                     nan,
                           nan,                     nan,
                           nan,                     nan,
                           nan,                     nan,
                           nan,                     nan,
                           nan,                     nan,
                           nan,                     nan,
                           nan,                     nan],
      dtype='object', name=0)

Expected output:

Index([               'strand',                 'start',
                        'stop',          'total_probes',
             'gene_assignment',       'mrna_assignment',
                   'swissprot',               'unigene',
       'GO_biological_process', 'GO_cellular_component',
       'GO_molecular_function',               'pathway',
             'protein_domains',         'crosshyb_type',
                    'category',               'seqname',
                  'Gene Title',              'Cytoband',
                 'Entrez Gene',            'Swiss-Prot',
                     'UniGene', 'GO Biological Process',
       'GO Cellular Component', 'GO Molecular Function',
                     'Pathway',       'Protein Domains',
                    'Probe ID',                     1,
                             2,                     3,
                             4,                     5,
                             6,                     7,
                             8,                     9,
                             10,                     11,
                             12,                     13,
                             14,                     15,
                             16,                     17,
                             18,                     19,
                             20,                     21,
                             22,                     23,
                             24,                     25,
                             26,                     27,
                             28,                     29],
      dtype='object', name=0)

CodePudding user response:

This will do it.


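temp_columns_name = []
nan_count = 1  # next number to hand out to an unnamed column

for i in df.columns:
    if pd.isnull(i):
        # Unnamed column: use the running counter as its name
        temp_columns_name.append(nan_count)
        nan_count += 1
    else:
        # Named column: keep the existing name
        temp_columns_name.append(i)

df.columns = temp_columns_name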

print(df.columns)

Output:

['strand',
 'start',
 'stop',
 'total_probes',
 'gene_assignment',
 'mrna_assignment',
 'swissprot',
 'unigene',
 'GO_biological_process',
 'GO_cellular_component',
 'GO_molecular_function',
 'pathway',
 'protein_domains',
 'crosshyb_type',
 'category',
 'seqname',
 'Gene Title',
 'Cytoband',
 'Entrez Gene',
 'Swiss-Prot',
 'UniGene',
 'GO Biological Process',
 'GO Cellular Component',
 'GO Molecular Function',
 'Pathway',
 'Protein Domains',
 'Probe ID',
 1,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 15,
 16,
 17,
 18,
 19,
 20,
 21,
 22,
 23,
 24,
 25,
 26,
 27,
 28,
 29]
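
The same loop can be written more compactly with a list comprehension and `itertools.count`. This is only a sketch on a small made-up frame (the data and column names are assumptions for illustration); it numbers NaN names in order of appearance wherever they occur and keeps existing names as they are.

```python
from itertools import count

import numpy as np
import pandas as pd

# Hypothetical frame with NaN names mixed in between real ones.
df = pd.DataFrame([[0, 1, 2, 3]], columns=["strand", np.nan, "stop", np.nan])

counter = count(1)  # yields 1, 2, 3, ... on each next() call
df.columns = [next(counter) if pd.isna(name) else name for name in df.columns]

print(df.columns.tolist())  # ['strand', 1, 'stop', 2]
```

Because the counter only advances when a name is missing, the numbering does not depend on the unnamed columns being contiguous.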
