Home > OS >  extract data from multiple csv files and prepare one dataframe using pandas
extract data from multiple csv files and prepare one dataframe using pandas

Time:03-21

I have multiple CSV files in which there are nearly 1k rows in each of the files and just have 2 columns, the 1st column of every file is similar for every CSV file but the 2nd column is specific for every other value like for motorcycle Engine Capacity, Cylinders, Power, Fuel, Cost, etc (these values are individually available in different CSV files). I tried extracting the data from all the CSV files and appending them in a list but I'm not getting a proper answer. Below is the method I tried

import numpy as np
import pandas as pd
import glob
csvs = glob.glob('D:/Project/*.csv')
print(csvs)
a = []
for proj in csvs:
    df1 = pd.read_csv(proj, index_col=None, header = 0)
    a.append(df1)
print(a)#Check Output 1
df2 = pd.concat(a, axis=0, ignore_index=True)
df2.head()#check output 2

#Output 1

[Unnamed: 0    engine capacity
0         0    1497.169492
1         1    1166.142857
2         2    1537.406593
3         3    1219.653846
4         4    1995.777778
[1035 rows x 2 columns],
Unnamed: 0    fuel
0        0   petrol
1        1   petrol
2        2  electrical
3        3  electrical
4        4  electrical
[1035 rows x 2 columns],
Unnamed: 0    power
0        0   57.342
1        1   64.232
2        2   56.405
3        3   98.109
4        4   101.455
[1035 rows x 2 columns]]

#Output 2

Unnamed: 0  enginecapacity    fuel      power
  0      0     1500.134       NaN        NaN
  1      1     1299.132       NaN        NaN
  2      2     1200.455       NaN        NaN
  3      3     1700.456       NaN        NaN
  4      4     1430.232       NaN        NaN

It is not giving the correct output for other columns it's just giving the correct output for the 1st file whichever it takes and for the remaining it's giving NaN. What is my mistake I'm doing in here

CodePudding user response:

Try:

data = []
for proj in csvs:
    df1 = pd.read_csv(proj, index_col=0, header=0)
    data.append(df1)
df2 = pd.concat(data, axis=1)

Output:

>>> df2
   engine capacity        fuel    power
0      1497.169492      petrol   57.342
1      1166.142857      petrol   64.232
2      1537.406593  electrical   56.405
3      1219.653846  electrical   98.109
4      1995.777778  electrical  101.455
  • Related