I have multiple CSV files in which there are nearly 1k rows in each of the files and just have 2 columns, the 1st column of every file is similar for every CSV file but the 2nd column is specific for every other value like for motorcycle Engine Capacity, Cylinders, Power, Fuel, Cost, etc (these values are individually available in different CSV files). I tried extracting the data from all the CSV files and appending them in a list but I'm not getting a proper answer. Below is the method I tried
import numpy as np
import pandas as pd
import glob
csvs = glob.glob('D:/Project/*.csv')
print(csvs)
a = []
for proj in csvs:
df1 = pd.read_csv(proj, index_col=None, header = 0)
a.append(df1)
print(a)#Check Output 1
df2 = pd.concat(a, axis=0, ignore_index=True)
df2.head()#check output 2
#Output 1
[Unnamed: 0 engine capacity
0 0 1497.169492
1 1 1166.142857
2 2 1537.406593
3 3 1219.653846
4 4 1995.777778
[1035 rows x 2 columns],
Unnamed: 0 fuel
0 0 petrol
1 1 petrol
2 2 electrical
3 3 electrical
4 4 electrical
[1035 rows x 2 columns],
Unnamed: 0 power
0 0 57.342
1 1 64.232
2 2 56.405
3 3 98.109
4 4 101.455
[1035 rows x 2 columns]]
#Output 2
Unnamed: 0 enginecapacity fuel power
0 0 1500.134 NaN NaN
1 1 1299.132 NaN NaN
2 2 1200.455 NaN NaN
3 3 1700.456 NaN NaN
4 4 1430.232 NaN NaN
It is not giving the correct output for other columns it's just giving the correct output for the 1st file whichever it takes and for the remaining it's giving NaN. What is my mistake I'm doing in here
CodePudding user response:
Try:
data = []
for proj in csvs:
df1 = pd.read_csv(proj, index_col=0, header=0)
data.append(df1)
df2 = pd.concat(data, axis=1)
Output:
>>> df2
engine capacity fuel power
0 1497.169492 petrol 57.342
1 1166.142857 petrol 64.232
2 1537.406593 electrical 56.405
3 1219.653846 electrical 98.109
4 1995.777778 electrical 101.455