My dataframe lists some of the name I interested and may make some changes, what I want is keep the original name if option is blank, delete the name if it says delete and replace the name if it has something.
name | option |
---|---|
A | |
B | delete |
C | |
D | DDDD |
E | delete |
F | FF |
The final dataframe I want is to change the names in a much more big dataframe and with other infos. may look like this
name | info1 | info2 |
---|---|---|
A | xxxx | sth |
A | xxxx | sth |
C | x | blala |
DDDD | xx | lalalaa |
DDDD | xx | laa |
DDDD | xx | laaaa |
FF | xxx | aaaa |
CodePudding user response:
For good performance first remove unnecessary rows and then set values by condition:
df1 = df[~df['option'].str.contains('delete')]
df1.loc[df['option'].ne('') | df['option'].notnull(), 'name'] = ''
Or:
df1['name'] = np.where(df['option'].ne('') | df['option'].notnull(), '', df1['name'])
CodePudding user response:
You can use case condition to replace based on option value.
val df = Seq(("A", ""), ("B", "delete"), ("C", ""), ("D", "DDDD"), ("E", "delete"), ("F", "FF")).toDF("name", "option")
df.createOrReplaceTempView("temp")
val query = """
select case when option = 'delete' then ''
when option == '' then name
else option
end as name,
option
from temp
"""
spark.sql(query).show()
---- ------
|name|option|
---- ------
| A| |
| |delete|
| C| |
|DDDD| DDDD|
| |delete|
| FF| FF|
---- ------