I have a input file as .txt which contains serial numbers and production date alternating between each other. Example: SERIALNUMBER1
PRODUCTIONDATE1
SERIALNUMBER2
PRODUCTIONDATE2
Now I want to create to Headers with pandas using this block of code:
def read_txt_export_to_excel():
dataframe_test_raw = pd.read_table("serialnumbers.txt", delimiter=" ")
dataframe_test_raw.columns = ["ProductionDate", "Time", "Serialnumber"]
dataframe_test_raw.to_excel("output.xlsx")
So far, so good. I just dont understand how I can make pandas to read all serialnumbers which have a prefix so its easy to find and place them for example under "Serialnumber" in the .xlsx.
Every serialnumber begins with "X87"
My guess is to start like that:
with open('serialnumbers.txt', 'r') as f:
text = f.read()
for line in f:
And now I have kind of a blockade. I want to scan the serialnumbers.txt with the "for" loop and pass it to pandas.
I hope you can help me!
Thanks guys!
EDIT SERIALNUMBERS.txt http://pastie.org/p/4WADkzaLVL5Tsijgh5SyGc
What I need: I want the Serialnumbers to appear in the excel file with the column name "Serialnumbers" I also want the Production date (which is always under the serialnumber) next to it in the excel with the column name "Production Date".
What I have to far thanks to @It_is_chris:
def read_txt_export_to_excel():
# Production data read-in
df = pd.read_table("Serialnumbers.txt", delimiter=" ")
print(df.columns)
# filter every other row using .iloc and concat
data = pd.concat([df[ProductionDate].iloc[1::2].reset_index(drop=True),
df[SerialNumber].iloc[::2].reset_index(drop=True)], axis=1)
# assign column names
data.columns = ['ProductionDate', 'SerialNumber']
df.to_excel("output.xlsx")
CodePudding user response:
If the date and serial number just alternate every row then do
# sample data
df = pd.DataFrame(['SERIALNUMBER1', 'PRODUCTIONDATE1',
'SERIALNUMBER2', 'PRODUCTIONDATE2'])
# filter every other row using .iloc and concat
data = pd.concat([df[0].iloc[1::2].reset_index(drop=True),
df[0].iloc[::2].reset_index(drop=True)], axis=1)
# assign column names
data.columns = ['ProductionDate', 'SerialNumber']
ProductionDate SerialNumber
0 PRODUCTIONDATE1 SERIALNUMBER1
1 PRODUCTIONDATE2 SERIALNUMBER2
Here is an updated solution
df = pd.read_csv('serialnumbers.txt', header=None, sep='\s\s ', engine='python')
# filter every other row using .iloc and concat
data = pd.concat([df[0].iloc[1::2].reset_index(drop=True),
df[0].iloc[::2].reset_index(drop=True)], axis=1)
# rename columns
data.columns = ['ProductionDate', 'Serialnumber']
# convert date column to datetime
data['ProductionDate'] = pd.to_datetime(data['ProductionDate'])
# create a new time column
data['Time'] = data['ProductionDate'].dt.time
# remove the time from the date . . . if you want
data['ProductionDate'] = data['ProductionDate'].dt.date
print(data)
ProductionDate Serialnumber Time
0 2022-07-06 X87YAAAAAAAAAAAAAA 06:43:00
1 2022-07-06 X87YBBBBBBBBBBBBBB 06:51:00
2 2022-07-06 X87YCCCCCCCCCCCCCC 06:53:00
3 2022-07-06 X87YDDDDDDDDDDDDDD 06:54:00