Home > Back-end >  Reading .txt file and exporting to .xlsx using pandas(?)
Reading .txt file and exporting to .xlsx using pandas(?)

Time:07-07

I have a input file as .txt which contains serial numbers and production date alternating between each other. Example: SERIALNUMBER1

PRODUCTIONDATE1

SERIALNUMBER2

PRODUCTIONDATE2

Now I want to create to Headers with pandas using this block of code:

def read_txt_export_to_excel():
    dataframe_test_raw = pd.read_table("serialnumbers.txt", delimiter=" ")
    dataframe_test_raw.columns = ["ProductionDate", "Time", "Serialnumber"]
    dataframe_test_raw.to_excel("output.xlsx")

So far, so good. I just dont understand how I can make pandas to read all serialnumbers which have a prefix so its easy to find and place them for example under "Serialnumber" in the .xlsx.

Every serialnumber begins with "X87"

My guess is to start like that:

with open('serialnumbers.txt', 'r') as f:
    text = f.read()
        for line in f:

And now I have kind of a blockade. I want to scan the serialnumbers.txt with the "for" loop and pass it to pandas.

I hope you can help me!

Thanks guys!

EDIT SERIALNUMBERS.txt http://pastie.org/p/4WADkzaLVL5Tsijgh5SyGc

What I need: I want the Serialnumbers to appear in the excel file with the column name "Serialnumbers" I also want the Production date (which is always under the serialnumber) next to it in the excel with the column name "Production Date".

What I have to far thanks to @It_is_chris:

def read_txt_export_to_excel():
    # Production data read-in
    df = pd.read_table("Serialnumbers.txt", delimiter=" ")
    print(df.columns)

    # filter every other row using .iloc and concat
    data = pd.concat([df[ProductionDate].iloc[1::2].reset_index(drop=True),
                  df[SerialNumber].iloc[::2].reset_index(drop=True)], axis=1)

    # assign column names
    data.columns = ['ProductionDate', 'SerialNumber']
    df.to_excel("output.xlsx")

CodePudding user response:

If the date and serial number just alternate every row then do

# sample data
df = pd.DataFrame(['SERIALNUMBER1', 'PRODUCTIONDATE1',
                   'SERIALNUMBER2', 'PRODUCTIONDATE2'])

# filter every other row using .iloc and concat
data = pd.concat([df[0].iloc[1::2].reset_index(drop=True),
                  df[0].iloc[::2].reset_index(drop=True)], axis=1)

# assign column names
data.columns = ['ProductionDate', 'SerialNumber']


    ProductionDate   SerialNumber
0  PRODUCTIONDATE1  SERIALNUMBER1
1  PRODUCTIONDATE2  SERIALNUMBER2

Here is an updated solution

df = pd.read_csv('serialnumbers.txt', header=None, sep='\s\s ', engine='python')

# filter every other row using .iloc and concat
data = pd.concat([df[0].iloc[1::2].reset_index(drop=True),
                  df[0].iloc[::2].reset_index(drop=True)], axis=1)

# rename columns
data.columns = ['ProductionDate', 'Serialnumber']

# convert date column to datetime
data['ProductionDate'] = pd.to_datetime(data['ProductionDate'])
# create a new time column
data['Time'] = data['ProductionDate'].dt.time
# remove the time from the date . . . if you want
data['ProductionDate'] = data['ProductionDate'].dt.date
print(data)

  ProductionDate        Serialnumber      Time
0     2022-07-06  X87YAAAAAAAAAAAAAA  06:43:00
1     2022-07-06  X87YBBBBBBBBBBBBBB  06:51:00
2     2022-07-06  X87YCCCCCCCCCCCCCC  06:53:00
3     2022-07-06  X87YDDDDDDDDDDDDDD  06:54:00
  • Related