Column to edit | Value to insert | Rule |
---|---|---|
protein.carbs.fats | banana | healthy |
protein.carbs.fats | chips | unhealthy |
I have the above data frame.
I need to scan every row in the data frame and insert in 'Column to edit' the value from 'Value to insert' based on the rule and also edit some of the elements.
I could probably convert the 'Column to edit' into series and edit it with basic python, but I'm expecting a data frame with over 10k rows, so I want to use pandas to do the editing.
I'm looking to have it in the below state:
Column to edit | Value to insert | Rule |
---|---|---|
protein.carbs.banana.fats | banana | healthy |
protein.needHealthy.chips.needHealthy | chips | unhealthy |
I tried using apply, join and loc methods to no prevail.
CodePudding user response:
You could create a function that takes each row at a time using apply:
def update_col_to_edit(row):
str_spl = row['column to edit'].split('.')
if row['Rule'] == 'healthy':
return f'{str_spl[0]}.{str_spl[1]}.{row['value to insert']}.{str_spl[2]}'
else:
return f'{str_spl[0]}.needHealthy.{row['value to insert']}.needHealthy'
df = # your dataframe
df['col_to_edit'] = df['Column to edit'].apply(lambda row: update_col_to_edit(row))
Here we go row by row and can use column names to check specific values of each row. We check if the row is healthy or not. if it's healthy, we use a formatted string (f string) to put the original text in order while also inserting the "value to insert" where it needs to go. If it's unhealthy, then we insert needHealthy and "value to insert" column value where it needs to go.
this is a very specific solution, so you would have to provide more data for us to know how to generalize better