I have the following DataFrame in pandas:
code | town | village | city |
---|---|---|---|
01 | Brunete | NaN | NaN |
02 | NaN | Cabrera d'Anoia | NaN |
03 | NaN | NaN | Barcelona |
04 | Zarzalejo | NaN | Madrid |
07 | Melilla | NaN | City of Melilla |
08 | Cartagena | Galifa | Region of Murcia |
I want to merge the columns town
, city
and village
into one. In case more than 1 column contains a value other than NaN, the resulting column will be assigned the value in this order of preference: village
> town
> city
. Example:
code | merged_column |
---|---|
01 | Brunete |
02 | Cabrera d'Anoia |
03 | Barcelona |
04 | Zarzalejo |
07 | Melilla |
08 | Galifa |
CodePudding user response:
Let us define the order
of columns then select those columns and use backfill
along columns axis
order = ['village', 'town', 'city']
df['merged'] = df[order].bfill(axis=1).iloc[:, 0]
code town village city merged
0 1 Brunete NaN NaN Brunete
1 2 NaN Cabrera d'Anoia NaN Cabrera d'Anoia
2 3 NaN NaN Barcelona Barcelona
3 4 Zarzalejo NaN Madrid Zarzalejo
4 7 Melilla NaN Melilla Melilla
5 8 Cartagena Galifa Region of Murcia Galifa
CodePudding user response:
What you want is a coalesce
pandas version. Here you can find a related question to this.
An alternative method is the combine_first
method, which can be translated to the sql-coalesce
function:
df = pd.DataFrame(
{"code": ["01", "02", "03", "04", "07", "08"],
"town": ["Brunete", pd.NA, pd.NA, "Zarzalejo", "Melilla", "Cartagena"],
"village": [pd.NA, "Cabrera d'Anoia", pd.NA, pd.NA, pd.NA, "Galifa"],
"city": [pd.NA, pd.NA, "Barcelona", "Madrid", "Melilla", "Region of Murcia"]}
)
df["village"].combine_first(df["town"]).combine_first(df["city"])
----------------------------
0 Brunete
1 Cabrera d'Anoia
2 Barcelona
3 Zarzalejo
4 Melilla
5 Galifa
Name: village, dtype: object
-----------------------------