I'm trying to filter the text in one column of my DataFrame (df_example) based on the value in another column: I only want to keep the sentences that contain that value. What is the best/cleanest way to do this in Python? In the example below, the desired_outcome column shows the result I would like to get.
Looking forward to the help.
CodePudding user response:
Data:
>>> import pandas as pd
>>> df = pd.read_csv("data.csv")
>>> df
   FIlter_value                                               text
0        flower           This is a flower. It has amazing colors.
1          tree  This is no flower. It is a tree. The tree is b...
2           car  Flying with a car is crazy. You should drive i...
>>> df['desired_column'] = df.apply(
...     lambda x: '.'.join(
...         [i for i in x.text.split(".") if x.FIlter_value in i]
...     ),
...     axis=1,
... )
>>> df
   FIlter_value                                               text                   desired_column
0        flower           This is a flower. It has amazing colors.                 This is a flower
1          tree  This is no flower. It is a tree. The tree is b...  It is a tree. The tree is broun
2           car  Flying with a car is crazy. You should drive i...       Flying with a car is crazy
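Note that joining on '.' after `split(".")` keeps the leading space of each piece and drops the final period. A minimal sketch of a helper that avoids both, assuming sentences end with a period followed by whitespace (`keep_matching_sentences` is a hypothetical name, not part of pandas):

```python
import re


def keep_matching_sentences(text, keyword):
    """Return the sentences of `text` that contain `keyword` (plain substring match)."""
    # Split on whitespace that follows a period, so each sentence keeps its own period.
    sentences = re.split(r"(?<=\.)\s+", text.strip())
    return " ".join(s for s in sentences if keyword in s)


print(keep_matching_sentences("This is no flower. It is a tree. The tree is brown.", "tree"))
```

With the DataFrame above it could be applied row-wise, for example `df.apply(lambda row: keep_matching_sentences(row.text, row.FIlter_value), axis=1)`.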
CodePudding user response:
This is not exactly what you asked for, because I don't use pandas, but it works fine. I hope you can adapt it to your needs.
content of resources/get_text.txt:
filter_value,text,desired_outcome
flower,This is a flower. It has amazing colors.,This is a flower.
tree,This is not a flower. It is a tree. The tree is brown.,It is a tree. The tree is brown.
car,Flying with a car is crazy. You should drive it. It smells like a flower,Flying with a car is crazy.
script:
import csv

if __name__ == '__main__':
    # A forward slash avoids the invalid "\g" escape and works on every OS.
    with open("resources/get_text.txt", newline='') as f:
        dict_reader = csv.DictReader(f)
        for item in dict_reader:
            target = item['filter_value']
            sentences = item['text'].split('. ')
            # Keep only the sentences that contain the filter value.
            available_sentences = [s for s in sentences if target in s]
            output = '. '.join(available_sentences)
            # split('. ') strips the period from every sentence except the last one,
            # so restore it when it is missing.
            if not output.endswith('.'):
                output = f"{output}."
            print(output)
            assert output == item['desired_outcome']
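One caveat: the `in` check in the script is a substring test, so a filter value such as car would also match a sentence containing "scary". If whole-word matching is what you need, here is a rough sketch using `re` with `\b` word boundaries. The function name `sentences_with_word` and the in-memory sample data are my own assumptions for illustration, not taken from the file above:

```python
import csv
import io
import re


def sentences_with_word(text, word):
    """Keep the sentences of `text` in which `word` appears as a whole word."""
    pattern = re.compile(rf"\b{re.escape(word)}\b")
    # Split on periods and drop empty pieces (e.g. after the final period).
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    kept = [s for s in sentences if pattern.search(s)]
    # Put the period back on every kept sentence.
    return " ".join(f"{s}." for s in kept)


# Illustrative in-memory CSV (an assumption), so the snippet runs without any file.
sample = io.StringIO(
    "filter_value,text\n"
    "car,That was a scary ride. The car is fast.\n"
)
for row in csv.DictReader(sample):
    print(sentences_with_word(row["text"], row["filter_value"]))
```

Only the second sentence is kept here, because "scary" contains "car" but not as a separate word.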