how to rename unnamed column names with the immediate next column name in pandas python-CodePudding

im running a code to extract pdf tables to separate csv.i have done that using tabula py..now im getting each table but with some errors like unnamed:0,unnamed:1,..so on as few column names.i need to clean the csv tables now..

the output which i get:

Unnamed: 0,id,Unnamed: 1,Unnamed: 2,Length,Unnamed: 3,Typ,Name,Val ,Com 1 ,5 ,3 ,Company ,Assign,fan , , , ,

the desired output which i want:

id ,Length ,Types ,Name ,Value ,Comments 1 ,5 ,3 ,Company ,Assigned ,fan

like this i get tables tables help me get the correct one. how shud i code my df to look this way...if the unnamed is present in column name ..the unnamed should be replaced with the immediate next desired name other than unnamed,if the next column is also unnamed it shud skip and look the next column and so on...

i got a solution to delete the column which has unnamed column name with null values:but i need solution to the above

  ser_all_na= df.isna().all(axis='rows')
  del_indexer= ser_all_na.index.str.startswith('Unnamed: ')
  del_indexer &= ser_all_na
  df.drop(columns=ser_all_na[del_indexer].index, inplace=True)
  df.to_csv("output.csv", index=False)

CodePudding user response：

Get the column names first by selecting the named columns, drop the unnamed columns and rename the df.

import pandas as pd
import numpy as np

df = pd.DataFrame({'Unnamed: 0': [1],
 'id': [5],
 'Unnamed: 1': [3],
 'Unnamed: 2': ['Company'],
 'Length': ['Assigned'],
 'Unnamed: 3': ['fan'],
 'Types': [np.nan],
 'Name': [np.nan],
 'Value': [np.nan],
 'Comments': [np.nan]})

cols = [x for x in df if "Unnamed" not in x]

df = df.dropna(axis=1)

df.columns = cols

print(df)

Output

   id  Length  Types     Name     Value Comments
0   1       5      3  Company  Assigned      fan

CodePudding user response：

# select all columns that are not nameed as 'unnamed'
cols=df.columns[~df.columns.str.contains('Unnamed')]

# create a new data filtering count of columns that are not unnamed
df2=df.iloc[:,:len(cols)]

#rename the column names
df2.columns = cols
df2

    id  Length  Types      Name        Value    Comments
0    1       5      3   Company     Assigned         fan