I have data in a txt file and I need to separate a sentence from a value. Every line of the txt file has the form <Sentence> <number>.
I need to read the value and the sentence into two different columns, but the sentences can contain numbers, dots, and anything else, since they are just arbitrary sentences. The numeric value in question, though, is always at the end of the line.
For example:
This coffee is bad. -1
How can I do this in Python?
CodePudding user response:
If every line always follows the format <sentence><space><number><end of line>,
then something like this should work:
sent, _, num = input_str.rpartition(' ')
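A minimal sketch of applying this to each line of the file, assuming the value is always an integer and is the last space-separated token. The helper name `split_line` and the sample lines are made up for illustration:

```python
def split_line(line):
    """Split 'sentence number' on the last space; return (sentence, number)."""
    # rstrip() drops the trailing newline so the number is the final token
    sentence, _, number = line.rstrip().rpartition(' ')
    return sentence, int(number)  # assumes integer values, as in the question


# Hypothetical sample lines standing in for the contents of the txt file
lines = ["This coffee is bad. -1\n", "Version 2.0 has 3 bugs. 42\n"]
rows = [split_line(line) for line in lines]
print(rows)
```

To read from a real file, `lines` could be replaced by an open file object, since iterating over a file yields one line at a time.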
CodePudding user response:
Here is a solution using pandas to load the file as a DataFrame with a regex separator:
import pandas as pd
df = pd.read_csv('file.csv', sep=r'\s(?=\S+$)', engine='python',
                 header=None, names=['sentence', 'value'])
Output:
              sentence  value
0  This coffee is bad.     -1
1        other example    123
You can then easily convert to lists:
df.to_dict('list')
Output:
{'sentence': ['This coffee is bad.', 'other example'],
'value': [-1, 123]}
Used text input:
This coffee is bad. -1
other example 123
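The same lookahead pattern can also be tried without pandas. The sketch below assumes every line ends in an integer and uses only the standard `re` module; the variable names are illustrative:

```python
import re

# A whitespace character followed only by non-whitespace up to the end of the line
LAST_SPACE = re.compile(r'\s(?=\S+$)')

text = "This coffee is bad. -1\nother example 123"
rows = [LAST_SPACE.split(line) for line in text.splitlines()]
sentences = [sentence for sentence, _ in rows]
values = [int(value) for _, value in rows]  # assumes integer values
print(sentences, values)
```

The lookahead `(?=\S+$)` only succeeds at the last whitespace of a line, so each line is split exactly once.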
CodePudding user response:
There are many ways to do it.
The simple/dirty solution is as follows:
- Run a regex pattern to extract the digit groups, then select the last one as the second column.
- Remove what you found in the first step from the string/line and make the remainder the first column.
This code should give you an idea.
import re
sample = "This coffee 5656 is bad. -134 -454"
result = re.findall(r'[0-9]+', sample)
first_column = sample.replace(result[-1], '')
second_column = result[-1]
print(f'First Column: {first_column}')
print(f'Second Column: {second_column}')
Output:
First Column: This coffee 5656 is bad. -134 -
Second Column: 454
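Note that the output above leaves the minus sign behind in the first column, and `str.replace` would also remove the same digits if they appeared earlier in the sentence. A possible variation, sketched here under the assumption that the value is a signed integer at the very end of the line, anchors the pattern to the end instead. The name `split_trailing_number` is hypothetical:

```python
import re

# Everything up to the last whitespace, then an optionally signed integer at the end
TRAILING_NUMBER = re.compile(r'^(.*)\s([+-]?\d+)\s*$')


def split_trailing_number(line):
    """Return (sentence, number) for a line ending in a signed integer."""
    match = TRAILING_NUMBER.match(line)
    if match is None:
        raise ValueError(f"no trailing number in: {line!r}")
    return match.group(1), int(match.group(2))


first_column, second_column = split_trailing_number("This coffee 5656 is bad. -134 -454")
print(f'First Column: {first_column}')
print(f'Second Column: {second_column}')
```

Because the match is anchored with `$`, numbers inside the sentence are left untouched and the sign stays with the value.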