Loading multiple .txt files to pandas dataframe with columns

Time:09-02

I am trying to read all the .txt files in a folder, each in the format shown below, and concatenate them into a single pandas DataFrame.

sample1.txt

ID                                    a123
Delivery_person_ID             VADRES03DEL01
Delivery_person_Age                    24.00
Delivery_person_Ratings                 4.30
Name: 1, dtype: object

sample2.txt

ID                                    b123
Delivery_person_ID             VADRES03DEL02
Delivery_person_Age                    22.00
Delivery_person_Ratings                 4.10
Name: 2, dtype: object

Here is my code:

import glob
import pandas as pd

folder_path = '/drive/My Drive/dataset/train'
file_list = glob.glob(folder_path + "/*.txt")
main_dataframe = pd.read_fwf(file_list[0], header=None)

for i in range(1, len(file_list)):
    df = pd.read_fwf(file_list[i], header=None)
    main_dataframe = pd.concat([main_dataframe, df], axis=0)

print(main_dataframe.head(30))

Output:

                              0               1
0                            ID          a123
1            Delivery_person_ID  VADRES03DEL01
2           Delivery_person_Age       24.00
3       Delivery_person_Ratings        4.30
4       Name: 1, dtype: object             NaN
0                            ID          b123
1            Delivery_person_ID  VADRES03DEL02
2           Delivery_person_Age       22.00
3       Delivery_person_Ratings        4.10
4       Name: 2, dtype: object            NaN

But I need the DataFrame to have one row per person. For example, I want this format:

     ID Delivery_person_ID  Delivery_person_Age  Delivery_person_Ratings
0  a123      VADRES03DEL01                24.00                     4.30
1  b123      VADRES03DEL02                22.00                     4.10

Edit (output for one file after transposing):

       0                   1                    2                        3   \
0      ID  Delivery_person_ID  Delivery_person_Age  Delivery_person_Ratings   
1  0x3b9d      BANGRES19DEL02            23.000000                 4.800000   

                    4                     5                           6   \
0  Restaurant_latitude  Restaurant_longitude  Delivery_location_latitude   
1            12.914264             77.678400                   12.934264   

                            7           8            9                  10  \
0  Delivery_location_longitude  Order_Date  Time_Orderd  Time_Order_picked   
1                    77.698400  01-04-2022         9:60              10:05   

                   11                    12                 13             14  \
0  Weather conditions  Road_traffic_density  Vehicle_condition  Type_of_order   
1               Windy                   Low                  2         Buffet   

                15                   16        17     18  \
0  Type_of_vehicle  multiple_deliveries  Festival   City   
1          scooter             0.000000        No  Urban   

                       19  
0  Name: 6, dtype: object  
1                     NaN  

CodePudding user response:

Transpose each DataFrame right after reading its text file:

import glob
import pandas as pd

folder_path = '/drive/My Drive/dataset/train'
file_list = glob.glob(folder_path + "/*.txt")
main_dataframe = pd.read_fwf(file_list[0], header=None).T

for i in range(1, len(file_list)):
    df = pd.read_fwf(file_list[i], header=None).T
    main_dataframe = pd.concat([main_dataframe, df], axis=0)

print(main_dataframe.head(30))
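
Transposing alone still leaves the field names as a data row and keeps the "Name: 1, dtype: object" footer as an extra column, as the Edit in the question shows. A minimal sketch of one way around that follows. It uses a hand-built frame as a stand-in for what pd.read_fwf(path, header=None) returned for sample1.txt in the question; the names raw, cleaned and row are illustrative only.

```python
import pandas as pd

# Hand-built stand-in (an assumption) for pd.read_fwf("sample1.txt", header=None)
raw = pd.DataFrame(
    [
        ["ID", "a123"],
        ["Delivery_person_ID", "VADRES03DEL01"],
        ["Delivery_person_Age", "24.00"],
        ["Delivery_person_Ratings", "4.30"],
        ["Name: 1, dtype: object", None],
    ]
)

# Drop the footer row: it is the only row with no value in column 1.
cleaned = raw.dropna(subset=[1])

# Make the field names the index so that transposing turns them into column labels.
row = cleaned.set_index(0).T.reset_index(drop=True)
row.columns.name = None  # remove the leftover "0" label on the column axis

print(row)
```

Each file then yields a one-row frame with proper column names, so the pd.concat step in the loop stacks one person per row.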

CodePudding user response:

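The input text file has an unusual layout (it is a printed pandas Series). This code should handle it:

import pandas as pd

# Read the text file; the first line ("ID  a123") is taken as the header row
df = pd.read_fwf("./test.txt")
# Remove the trailing "Name: 1, dtype: object" line (the last row)
df = df.drop(df.index[-1])
# Transpose so the field names run across instead of down
df = df.T
# Use the first row (the field names) as the column labels
df.columns = df.iloc[0]
# Remove that row of field names from the data
df = df.drop(df.index[0])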

An input text file that looks like this:

ID                                    a123
Delivery_person_ID             VADRES03DEL01
Delivery_person_Age                    24.00
Delivery_person_Ratings                 4.30
Name: 1, dtype: object

Would be converted to this:

ID   Delivery_person_ID Delivery_person_Age Delivery_person_Ratings
a123      VADRES03DEL01               24.00                    4.30

From here, you can do the same for each text file and then use pd.concat() to append each file's DataFrame to the main DataFrame; from your code, I can see that you already know how to do this.
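
As a rough sketch of that per-file loop, the version below swaps read_fwf for plain line splitting, which is a different technique from the one above. It assumes every value is a single token with no spaces, as in the samples, and that the footer line always starts with "Name:". The helper names read_series_dump and load_folder are hypothetical.

```python
import glob
import os

import pandas as pd


def read_series_dump(path):
    """Parse one file holding a printed pandas Series into a dict."""
    record = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines and the "Name: 1, dtype: object" footer.
            if not line or line.startswith("Name:"):
                continue
            # Assumption: the value is the last whitespace-separated token,
            # and everything before it is the field name.
            parts = line.rsplit(None, 1)
            if len(parts) == 2:
                record[parts[0]] = parts[1]
    return record


def load_folder(folder_path):
    """Build one DataFrame with one row per .txt file in folder_path."""
    paths = sorted(glob.glob(os.path.join(folder_path, "*.txt")))
    frame = pd.DataFrame([read_series_dump(p) for p in paths])
    # Convert a column to numbers only when every value in it parses.
    for column in frame.columns:
        try:
            frame[column] = pd.to_numeric(frame[column])
        except (ValueError, TypeError):
            pass
    return frame


# Example call, using the folder from the question:
# main_dataframe = load_folder('/drive/My Drive/dataset/train')
```

Building the list of dicts first and creating the DataFrame once avoids calling pd.concat inside the loop, which copies the accumulated data on every iteration.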

CodePudding user response:

Just use transpose:

main_dataframe = main_dataframe.T 