Home > Software design >  Pandas Getting first number of rows of each xls file
Pandas Getting first number of rows of each xls file

Time:07-07

I have about (100 files ) XLS files in a folder with different columns names and data types.

File_1.xls:

Id  test  category
1    ab      4
2    cs      3
3    cs      1
..   ..

FILE_2.xls:

index remove  stocks  category
1       dr      4         a
2       as      3         b
3       ae      1         v
..

File 3: ....

File 100.....

This is my code:

import pandas as pd
import pathlib

    files = []
    for filename in pathlib.Path.cwd().iterdir():
        if filename.suffix.lower().startswith('.xls'):
            files.append(filename)
            
    df = (pd.concat({f: pd.read_excel(f).melt() for f in files})
            .rename_axis(['filename', None]).reset_index(0)
            .reset_index(drop=True)
            .iloc[:1]
         )

Output

   filename  variable value
0    File_1        Id     1
1    File_1        Id     2

Expected Results

   filename  variable value
0    File_1        Id     1
1    File_1        Id     2
2    File_1      test    ab
3    File_1      test    cs
4    File_1  category     4
5    File_1  category     3
6    File_2     index     1
7    File_2     index     2
8    File_2    remove    dr
9    File_2    remove    as
10   File_2    stocks     4
11   File_2    stocks     3
12   File_2  category     a
13   File_2  category     b
...

I want to get for all the files the first 2 rows, Melt, and export them in a dataframe.

CodePudding user response:

try this

import pathlib
import pandas as pd

files = []
for filename in pathlib.Path.cwd().iterdir():
    if filename.suffix.lower().startswith('.xls'):
        files.append(filename)

# DataFrame for final result
empty_df = pd.DataFrame()

# iterating over each file        
for f in files:
    # read first 2 rows of DataFrame and melt
    temp = pd.melt(pd.read_excel(f).head(2))
    
    # add a new column that assigns the filename
    temp['filename'] = f
    
    # concat with the final result DataFrame
    empty_df = pd.concat([empty_df, temp])

# if column order is relevant you may re-arrange them like so
empty_df = empty_df[['filename', 'variable', 'value']]

display(empty_df)
  • Related