I have a large text file I have imported in python and want to split into lines by a key word, then use those lines to take out relevent information into a dataframe.
The data follows along the same pattern for each line but wont be the exact same number of characters and some lines may have extra data
So I have a text file such as:
{data: name:Mary, friends:2, cookies:10, chairs:4},{data: name:Gerald friends:2, cookies:10, chairs:4, outside:4},{data: name:Tom, friends:2, cookies:10, chairs:4, stools:1}
There is always the key word data between lines, is there any way I can split it out by using this word as the beginning of the line (then put it into a dataframe)?
I'm not sure where to begin so any help would be amazing
CodePudding user response:
When you get the content of a .txt
file like this...
with open("file.txt", 'r') as file:
content = file.read()
...you have it as a str
ing, so you can split it with the function str.split()
:
content = content.split(my_keyword)
You can do it with a function:
def splitter(path: str, keyword: str) -> str:
with open(path, 'r') as file:
content = file.read()
return content.split(keyword)
that you can call this way:
>>> splitter("file.txt", "data")
["I really like to write the word ", ", because I think it has a lot of meaning."]