I am trying to read all the .txt files in a folder, each in the format shown below, and concatenate them into a single pandas dataframe.
sample1.txt
ID a123
Delivery_person_ID VADRES03DEL01
Delivery_person_Age 24.00
Delivery_person_Ratings 4.30
Name: 1, dtype: object
sample2.txt
ID b123
Delivery_person_ID VADRES03DEL02
Delivery_person_Age 22.00
Delivery_person_Ratings 4.10
Name: 2, dtype: object
Below is the code:
import glob
import pandas as pd

folder_path = '/drive/My Drive/dataset/train'
file_list = glob.glob(folder_path + "/*.txt")
main_dataframe = pd.read_fwf(file_list[0], header=None)
for i in range(1, len(file_list)):
    df = pd.read_fwf(file_list[i], header=None)
    main_dataframe = pd.concat([main_dataframe, df], axis=0)
print(main_dataframe.head(30))
Output:
                         0              1
0                       ID           a123
1       Delivery_person_ID  VADRES03DEL01
2      Delivery_person_Age          24.00
3  Delivery_person_Ratings           4.30
4   Name: 1, dtype: object            NaN
0                       ID           b123
1       Delivery_person_ID  VADRES03DEL02
2      Delivery_person_Age          22.00
3  Delivery_person_Ratings           4.10
4   Name: 2, dtype: object            NaN
But I need the dataframe to have one row per person. For example, I want the format below:
     ID Delivery_person_ID Delivery_person_Age Delivery_person_Ratings
0  a123      VADRES03DEL01               24.00                    4.30
1  b123      VADRES03DEL02               22.00                    4.10
Edit:
        0                   1                    2                        3  \
0      ID  Delivery_person_ID  Delivery_person_Age  Delivery_person_Ratings
1  0x3b9d      BANGRES19DEL02            23.000000                 4.800000

                     4                     5                           6  \
0  Restaurant_latitude  Restaurant_longitude  Delivery_location_latitude
1            12.914264             77.678400                   12.934264

                             7           8            9                 10  \
0  Delivery_location_longitude  Order_Date  Time_Orderd  Time_Order_picked
1                    77.698400  01-04-2022         9:60              10:05

                   11                    12                 13             14  \
0  Weather conditions  Road_traffic_density  Vehicle_condition  Type_of_order
1               Windy                   Low                  2         Buffet

                15                   16        17     18  \
0  Type_of_vehicle  multiple_deliveries  Festival   City
1          scooter             0.000000        No  Urban

                       19
0  Name: 6, dtype: object
1                     NaN
CodePudding user response:
After reading each text file into a pandas dataframe, transpose it:
import glob
import pandas as pd

folder_path = '/drive/My Drive/dataset/train'
file_list = glob.glob(folder_path + "/*.txt")
main_dataframe = pd.read_fwf(file_list[0], header=None).T
for i in range(1, len(file_list)):
    df = pd.read_fwf(file_list[i], header=None).T
    main_dataframe = pd.concat([main_dataframe, df], axis=0)
print(main_dataframe.head(30))
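Transposing on its own keeps the field names as the first data row and the "Name: ..., dtype: object" footer as an extra column, which is what the output under "Edit:" in the question appears to show. Below is a minimal sketch of one way to tidy each file before concatenating. It assumes every file parses into the same two-column frame as in the question (labels in column 0, values in column 1, and NaN in column 1 on the footer line). The helper name to_row is hypothetical, and the frames are built inline as stand-ins for pd.read_fwf(path, header=None) so the example runs without the original files.

```python
import pandas as pd


def to_row(raw):
    """Turn a two-column label/value frame into a single-row frame."""
    # The footer line has no value, so column 1 is NaN there; drop that row.
    raw = raw.dropna(subset=[1])
    # Use the labels as the index, keep the values, and flip to one row.
    row = raw.set_index(0)[1].to_frame().T
    # Clear the leftover axis name (0) so it does not show in the header.
    return row.rename_axis(None, axis=1)


# Inline stand-ins for pd.read_fwf(path, header=None) on the two sample files.
labels = ["ID", "Delivery_person_ID", "Delivery_person_Age",
          "Delivery_person_Ratings"]
raw1 = pd.DataFrame({
    0: labels + ["Name: 1, dtype: object"],
    1: ["a123", "VADRES03DEL01", "24.00", "4.30", None],
})
raw2 = pd.DataFrame({
    0: labels + ["Name: 2, dtype: object"],
    1: ["b123", "VADRES03DEL02", "22.00", "4.10", None],
})

# One row per file, renumbered 0..n-1.
result = pd.concat([to_row(raw1), to_row(raw2)], ignore_index=True)
print(result)
```

In the loop above, to_row(pd.read_fwf(file_list[i], header=None)) would take the place of the bare .T call, provided the files really do parse into two columns.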
CodePudding user response:
The input text file has an unusual format, but this code should deal with it:
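import pandas as pd

# Read in text file (with the default header handling, the first line "ID a123" becomes the header row)
df = pd.read_fwf("./test.txt")
# Remove the "Name: 1, dtype: object" footer, which is the row at position 3
df = df.drop(df.index[3])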
# Transpose it
df = df.T
# Rename the columns correctly
df.columns = df.iloc[0]
# Remove the column names from the data
df = df.drop(df.index[0])
An input text file that looks like this:
ID a123
Delivery_person_ID VADRES03DEL01
Delivery_person_Age 24.00
Delivery_person_Ratings 4.30
Name: 1, dtype: object
Would be converted to this:
ID   Delivery_person_ID Delivery_person_Age Delivery_person_Ratings
a123      VADRES03DEL01               24.00                    4.30
From here, you can do the same for each text file and then use pd.concat() to append each file's dataframe to the main dataframe. From your code I can see that you already know how to do this.
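The per-file step and the loop over files can also be sketched with plain string parsing instead of read_fwf. This is only a sketch under stated assumptions: each file holds "label value" lines followed by a "Name: ..., dtype: ..." footer, and a label is separated from its value by a run of two or more spaces or, failing that, by the first space. The helper names parse_series_dump and load_records are hypothetical.

```python
import re
from pathlib import Path

# Matches the footer that pandas prints after a Series, e.g. "Name: 1, dtype: object".
FOOTER = re.compile(r"^Name: .*dtype: \w+$")


def parse_series_dump(text):
    """Turn 'label   value' lines into a dict, skipping the footer line."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or FOOTER.match(line):
            continue
        # Prefer a run of 2+ spaces as the separator, so a label that
        # contains a single space stays in one piece.
        parts = re.split(r"\s{2,}", line, maxsplit=1)
        if len(parts) == 1:
            # Fall back to the first whitespace run.
            parts = line.split(None, 1)
        record[parts[0]] = parts[1] if len(parts) > 1 else None
    return record


def load_records(folder):
    """Parse every .txt file in folder (sorted by name) into a list of dicts."""
    return [parse_series_dump(path.read_text())
            for path in sorted(Path(folder).glob("*.txt"))]


sample = (
    "ID                                  a123\n"
    "Delivery_person_ID         VADRES03DEL01\n"
    "Delivery_person_Age                24.00\n"
    "Delivery_person_Ratings             4.30\n"
    "Name: 1, dtype: object\n"
)
print(parse_series_dump(sample))
```

Passing the list returned by load_records to pd.DataFrame(records) gives one row per file. All values stay strings, so numeric columns would still need converting, for example with pd.to_numeric.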
CodePudding user response:
Just use transpose:
main_dataframe = main_dataframe.T