Imagine I have an Excel file named LP_Elements_Shocked_202108160517.xlsx.
I would like to pull the date part out of the file name and store it as an integer: 20210816.
The pattern is consistent. All file names begin with LP_Elements_Shocked_,
followed by the eight digits I need, and then always by four more digits I do not need.
Here is what I have so far:
import pandas as pd
pd.read_excel('LP_Elements_Shocked_202108160517.xlsx')
CodePudding user response:
Since your pattern always starts with the same string, you can just use a substring (slice the string):
filename = 'LP_Elements_Shocked_202108160517.xlsx'
date = int(filename[20:28])  # the prefix 'LP_Elements_Shocked_' is 20 characters long
print(date)  # prints: 20210816
Otherwise, you could use a regex for more complex patterns.
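As a minimal sketch of the regex route, assuming the names always look like the prefix, eight date digits, four more digits, then .xlsx (the names PATTERN and extract_date are made up for this example):

```python
import re

# Anchor on the fixed prefix, capture the 8 date digits, then require
# 4 more digits and the .xlsx extension.
PATTERN = re.compile(r"^LP_Elements_Shocked_(\d{8})\d{4}\.xlsx$")

def extract_date(filename):
    """Return the 8-digit date as an int, or None if the name does not match."""
    match = PATTERN.match(filename)
    return int(match.group(1)) if match else None

print(extract_date("LP_Elements_Shocked_202108160517.xlsx"))  # 20210816
print(extract_date("some_other_file.xlsx"))  # None
```

Compared with slicing, this also rejects file names that do not follow the pattern instead of silently returning the wrong characters.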
As for the part (from the comments) where you want to keep the filename with each dataframe: pandas does not keep track of which Excel file a dataframe was read from, so the simplest approach is to add a column filled with the filename to each dataframe you read.
See this related Q&A: read_excel into data frame and keep file name as column (Pandas)
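A rough sketch of that idea, assuming you also want the extracted date as its own column. The helper tag_frame and the column names source_file and file_date are invented for this example, not something pandas provides:

```python
from pathlib import Path

import pandas as pd

PREFIX = "LP_Elements_Shocked_"

def tag_frame(df, path):
    """Return a copy of df with the source file name and its date added as columns."""
    name = Path(path).name            # drop any directory part
    start = len(PREFIX)               # 20 characters
    file_date = int(name[start:start + 8])
    # assign() broadcasts the scalar values to every row and leaves df untouched
    return df.assign(source_file=name, file_date=file_date)

# Typical use with a real file (not executed here):
# path = "LP_Elements_Shocked_202108160517.xlsx"
# df = tag_frame(pd.read_excel(path), path)
```

If you read several files this way, you can pass the tagged dataframes to pd.concat and still tell afterwards which rows came from which file.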
CodePudding user response:
You can use the re module:
import re
file_name = 'LP_Elements_Shocked_202108160517.xlsx'
# \d+ matches the full run of 12 digits; [:-4] drops the last four, int() converts
num = int(re.findall(r"\d+", file_name)[0][:-4])
print(num)
Output:
20210816