Python Panda combine header row-CodePudding

I have this dataframe:

     Artikel                 description  nunber  % av     Price    Price2 
0  nummer                         NaN      NaN        NaN   100 st   100 st   
1      20       aaaaaaaaaaaaaaaaaaaaa    28.0      0,03 %    21,25    12,     
2      21       bbbbbbbbbbbbbbbbbbbbb    928.0     0,86 %    83,57    0,5    
3      22       ccccccccccccccccccccc    44569.0   41,27 %   92,12    0,5

I want to combine the first two rows (and delete the Nan's) to end up like this:

     Artikel nummer description      nunber        % av     Price 100 st    Price2 100 st 
                                               
0      20       aaaaaaaaaaaaaaaaaaaaa    28.0      0,03 %    21,25          12,     
1      21       bbbbbbbbbbbbbbbbbbbbb    928.0     0,86 %    83,57          0,5    
2      22       ccccccccccccccccccccc    44569.0   41,27 %   92,12          0,5

I tried this solution: Pandas: combining header rows of a multiIndex DataFrame but I can't figure out how that would work with my data. I am a bit of a noob with Python.

CodePudding user response：

If there is MultiIndex:

#parameter header for convert first 2 rows to MultiIndex
df = pd.read_csv(file, header=[0,1])


print (df.columns)
MultiIndex([(    'Artikel',      '0'),
            ('description', 'nummer'),
            (     'nunber',      nan),
            (       '% av',      nan),
            (      'Price',      nan),
            (     'Price2', '100 st')],
           )

df.columns = [f'{a} {b}' if pd.notna(b) else a for a, b in df.columns]
print (df)
   Artikel 0     description nummer   nunber     % av  Price Price2 100 st
1         20  aaaaaaaaaaaaaaaaaaaaa     28.0   0,03 %  21,25           12,
2         21  bbbbbbbbbbbbbbbbbbbbb    928.0   0,86 %  83,57           0,5
3         22  ccccccccccccccccccccc  44569.0  41,27 %  92,12           0,5

If NaNs are strings:

print (df.columns)
MultiIndex([(    'Artikel',      '0'),
            ('description', 'nummer'),
            (     'nunber',    'NaN'),
            (       '% av',    'NaN'),
            (      'Price',    'NaN'),
            (     'Price2', '100 st')],
           )

df.columns = [f'{a} {b}' if b != 'NaN' else a for a, b in df.columns]
print (df)
   Artikel 0     description nummer   nunber     % av  Price Price2 100 st
1         20  aaaaaaaaaaaaaaaaaaaaa     28.0   0,03 %  21,25           12,
2         21  bbbbbbbbbbbbbbbbbbbbb    928.0   0,86 %  83,57           0,5
3         22  ccccccccccccccccccccc  44569.0  41,27 %  92,12           0,5

If not MultiIndex and first row is necessary join to columns names:

df.columns = [f'{a} {b}' if pd.notna(b) else a for a, b in zip(df.columns, df.iloc[0])]
df = df.iloc[1:].reset_index(drop=True)
print (df)
  Artikel nummer            description   nunber     % av Price 100 st  \
0             20  aaaaaaaaaaaaaaaaaaaaa     28.0   0,03 %        21,25   
1             21  bbbbbbbbbbbbbbbbbbbbb    928.0   0,86 %        83,57   
2             22  ccccccccccccccccccccc  44569.0  41,27 %        92,12   

  Price2 100 st  
0           12,  
1           0,5  
2           0,5