(python) Only get the sentences in a column, based on a value of a different column in the same row

I'm trying to filter a text column of my dataframe (df_example) based on a different column. For each row, I only want to keep the sentences that contain the value of the other column in that same row. What is the best/cleanest way to do this in Python? In the example below I attached what I would like to see in the desired_outcome column.

Looking forward to the help.

[image: example of df_example with the desired_outcome column]

CodePudding user response：

Data:

>>> import pandas as pd
>>> df = pd.read_csv("data.csv")
>>> df

  FIlter_value                                               text
0       flower           This is a flower. It has amazing colors.
1         tree  This is no flower. It is a tree. The tree is b...
2          car  Flying with a car is crazy. You should drive i...


>>> # For each row, keep only the sentences of `text` that contain FIlter_value.
>>> df['desired_column'] = df.apply(
...     lambda x: '.'.join(
...         [i for i in x.text.split(".") if x.FIlter_value in i]
...     ).strip(),
...     axis=1
... )

>>> df

  FIlter_value                                               text                    desired_column
0       flower           This is a flower. It has amazing colors.                  This is a flower
1         tree  This is no flower. It is a tree. The tree is b...   It is a tree. The tree is broun
2          car  Flying with a car is crazy. You should drive i...        Flying with a car is crazy
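The same filtering can also be written as a small helper, which makes it easier to test outside the dataframe. This is only a sketch: the function name keep_matching_sentences is made up for illustration, the sentence boundary is assumed to be '.', '!' or '?' followed by whitespace, and the sample rows are taken from the example data in this thread. Unlike the split(".") version above, each sentence keeps its closing period.

```python
import re


def keep_matching_sentences(text, value):
    """Return the sentences of `text` that contain `value`, joined by spaces."""
    # Split after '.', '!' or '?' followed by whitespace, so every sentence
    # keeps its own closing punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Plain substring test, case-sensitive, same as the `in` check above.
    return " ".join(s for s in sentences if value in s)


# (filter value, text) pairs standing in for the two dataframe columns.
rows = [
    ("flower", "This is a flower. It has amazing colors."),
    ("tree", "This is not a flower. It is a tree. The tree is brown."),
]
results = [keep_matching_sentences(text, value) for value, text in rows]
print(results)

# With a dataframe, the helper could be applied row by row, for example:
# df['desired_column'] = [
#     keep_matching_sentences(t, v) for t, v in zip(df['text'], df['FIlter_value'])
# ]
```

Note that this is still a substring match, so a filter value like "car" would also match a sentence containing "carpet". A row with no matching sentence gives an empty string.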

CodePudding user response：

This is not exactly what you want, because I don't use pandas, but it works fine. I hope you can adapt it to your needs.

content of resources/get_text.txt:

filter_value,text,desired_outcome
flower,This is a flower. It has amazing colors.,This is a flower.
tree,This is not a flower. It is a tree. The tree is brown.,It is a tree. The tree is brown.
car,Flying with a car is crazy. You should drive it. It smells like a flower,Flying with a car is crazy.

script:

import csv

if __name__ == '__main__':
    # A forward slash keeps the path portable and avoids the invalid "\g" escape.
    with open("resources/get_text.txt", 'r', newline='') as f:
        dict_reader = csv.DictReader(f)
        for item in dict_reader:
            target = item['filter_value']
            # Split the text into sentences and keep those containing the target.
            sentences = item['text'].split('. ')
            available_sentences = [s for s in sentences if target in s]
            output = '. '.join(available_sentences)
            # Splitting on '. ' drops the period of every sentence except the last.
            if not output.endswith('.'):
                output = f"{output}."
            print(output)
            assert output == item['desired_outcome']