Concatenate resulting strings from if statements in Python Pandas

Time:11-04

I'm very new to pandas, Python and Google Colab, and I just spent more than 6 hours trying to find a way to write a formula that took me 2 minutes in Google Sheets.

I want to concatenate the results of several if statements into a single column, as I did in the "Things to fix" column of my sheet: wherever a problem column contains "Yes", that problem is listed so that the worker can check what needs to be done.

In both Excel and Sheets I can just join if statements using "&", but whenever I try this with pandas joins, some kind of error pops up. I also tried using this format of code:

my_list = ['a', 'b', 'c', 'd']
my_string = ','.join(my_list)
# Output = 'a,b,c,d'

but it kind of pivoted the data and messed everything around.

I'm working in a Google Colab environment, on a .ipynb file.

Thank you very much for the attention and help.

CodePudding user response:

Why not this?

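# Assumes boolean columns named key, win and eng; axis=1 applies the lambda to each row.
df['tofix'] = df.apply(lambda r: ','.join(c for c in ['key', 'win', 'eng'] if r[c]), axis=1)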
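
The line above assumes boolean columns, while the question's data holds "Yes"/"No" strings under longer column names. A minimal sketch of the same row-wise idea for that layout might look like this. The `labels` mapping and the `things_to_fix` helper are illustrative names, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Car Id': [1000, 1001, 1002],
    'Problem with Key?': ['Yes', 'No', 'No'],
    'Problem with windows?': ['Yes', 'Yes', 'No'],
    'Problem in the engine?': ['Yes', 'No', 'No'],
})

# Hypothetical mapping from each problem column to the label shown to the worker.
labels = {
    'Problem with Key?': 'Key',
    'Problem with windows?': 'windows',
    'Problem in the engine?': 'engine',
}

def things_to_fix(row):
    # Keep only the labels whose column holds 'Yes' in this row, then join them.
    return ','.join(label for col, label in labels.items() if row[col] == 'Yes')

# axis=1 passes one row at a time to the helper.
df['Things to fix'] = df.apply(things_to_fix, axis=1)
```

With this approach a car with no problems gets an empty string rather than NaN.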

CodePudding user response:

Use melt:

out = df.melt(id_vars=['Car Id'], var_name='Things to fix', ignore_index=False) \
        .query("value == 'Yes'").groupby('Car Id')['Things to fix'] \
        .apply(lambda x: ','.join(x.str.extract(r'(\w+)\?', expand=False)))

out = df.merge(out, on='Car Id', how='left')

Output:

>>> out
   Car Id Problem with Key? Problem with windows? Problem in the engine?       Things to fix
0    1000               Yes                   Yes                    Yes  Key,windows,engine
1    1001                No                   Yes                     No             windows
2    1002                No                    No                     No                 NaN

Setup:

import pandas as pd

data = {'Car Id': [1000, 1001, 1002],
        'Problem with Key?': ['Yes', 'No', 'No'],
        'Problem with windows?': ['Yes', 'Yes', 'No'],
        'Problem in the engine?': ['Yes', 'No', 'No']}

df = pd.DataFrame(data)
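
To see what each stage of the melt chain does, it can help to split it into intermediate steps. The following is a sketch on the setup data above, not a replacement for the answer's one-liner; the names `long`, `flagged` and `fixes` are illustrative.

```python
import pandas as pd

data = {'Car Id': [1000, 1001, 1002],
        'Problem with Key?': ['Yes', 'No', 'No'],
        'Problem with windows?': ['Yes', 'Yes', 'No'],
        'Problem in the engine?': ['Yes', 'No', 'No']}
df = pd.DataFrame(data)

# 1. Wide to long: one row per (car, problem column) pair.
long = df.melt(id_vars=['Car Id'], var_name='Things to fix')

# 2. Keep only the rows where the problem was flagged.
flagged = long.loc[long['value'] == 'Yes'].copy()

# 3. Pull the word just before the question mark out of each column name.
flagged['Things to fix'] = flagged['Things to fix'].str.extract(r'(\w+)\?', expand=False)

# 4. Build one comma-separated string per car.
fixes = flagged.groupby('Car Id')['Things to fix'].agg(','.join).reset_index()

# 5. A left merge keeps cars without problems; they end up with NaN.
out = df.merge(fixes, on='Car Id', how='left')
```

Printing `long` and `flagged` after steps 1 and 2 shows the reshaped table before the strings are joined, which is usually the quickest way to check that the regular expression picks the intended word.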