Home > Mobile >  How to replace such column names whose next columns are having all nan values in Pandas
How to replace such column names whose next columns are having all nan values in Pandas

Time:12-11

I have a file which has data separated with different spaces and column names are also having spaces.

Type Dec LookupTable               Field Name                Field Len Start Pos
NUM  0   _                         sample data               5         1
NUM  0   _                         sample data               10        6
CHAR 0   _                         sample data               60        16
NUM  0   _                         sample data               3         76
CHAR 0   _                         sample data               60        79
CHAR 0   _                         sample data               60        139
CHAR 0   _                         sample data               60        199
CHAR 0   _                         sample data               60        259
NUM  0   _                         sample data               3         319
CHAR 0   _                         sample data               60        322
CHAR 0   _                         sample data               60        382
NUM  0   _                         sample data               3         442
CHAR 0   _                         sample data               60        445

I am reading this file like this

df= pd.read_fwf('./temp.txt', colspecs= 'infer')

and getting the dataframe with columns which are separated by spaces nan values

I want to drop the Nan columns and replace its previous columns name with the empty one. How can we achieve this in an efficient way? enter image description here

Expected outpuL: Expected Output

CodePudding user response:

I read the data from clipboard, so my DataFrame looks a little bit different than yours, that also means, you have to adjust the code, which shouldn't be a big deal

df.drop(columns=['Start', 'Pos']).rename(columns={'Name': 'Superman', 'Type': 

'Mytype'})  # Very clear code, I think it doesn't need any explanation

df.dropna(axis=1,how='all').rename(columns={'Name': 'Superman', 'Type': 'Mytype'}) # the second way df.dropna(axis=1,how='all') means that it drops all columns with nans, but only if all of the values are nans.

    Type  Dec LookupTable   Field  Name  Field.1  Len  Start  Pos
0    NUM    0           _  sample  data        5    1    NaN  NaN
1    NUM    0           _  sample  data       10    6    NaN  NaN
2   CHAR    0           _  sample  data       60   16    NaN  NaN
3    NUM    0           _  sample  data        3   76    NaN  NaN
4   CHAR    0           _  sample  data       60   79    NaN  NaN
5   CHAR    0           _  sample  data       60  139    NaN  NaN
6   CHAR    0           _  sample  data       60  199    NaN  NaN
7   CHAR    0           _  sample  data       60  259    NaN  NaN
8    NUM    0           _  sample  data        3  319    NaN  NaN
9   CHAR    0           _  sample  data       60  322    NaN  NaN
10  CHAR    0           _  sample  data       60  382    NaN  NaN
11   NUM    0           _  sample  data        3  442    NaN  NaN
12  CHAR    0           _  sample  data       60  445    NaN  NaN

df.drop(columns=['Start', 'Pos']).rename(columns={'Name': 'Superman', 'Type': 'Mytype'})
#df.dropna(axis=1,how='all').rename(columns={'Name': 'Superman', 'Type': 'Mytype'})
Out[30]: 
   Mytype  Dec LookupTable   Field Superman  Field.1  Len
0     NUM    0           _  sample     data        5    1
1     NUM    0           _  sample     data       10    6
2    CHAR    0           _  sample     data       60   16
3     NUM    0           _  sample     data        3   76
4    CHAR    0           _  sample     data       60   79
5    CHAR    0           _  sample     data       60  139
6    CHAR    0           _  sample     data       60  199
7    CHAR    0           _  sample     data       60  259
8     NUM    0           _  sample     data        3  319
9    CHAR    0           _  sample     data       60  322
10   CHAR    0           _  sample     data       60  382
11    NUM    0           _  sample     data        3  442
12   CHAR    0           _  sample     data       60  445

CodePudding user response:

import pandas as pd
import numpy as np

aaa = [(df.columns[i - 1], df.columns[i]) for i in range(1, len(df.columns)) if df[df.columns[i]].isna().all()]
bbb = dict(aaa)
df = df.drop(columns=np.array(aaa)[:, 1])
df.rename(columns=bbb, inplace=True)

print(df)

Here, in list comprehension aaa, paired tuples are created (on the left is the name of the column to be renamed, on the right is the name of the column to be deleted). Columns that are all with empty values are checked by the condition:

if df[df.columns[i]].isna().all()

A dictionary is created from aaa. drop removes the selected columns np.array(aaa)[:, 1] (to select an array by slice I wrap it with np.array).

  • Related