Combining two columns using pandas

I currently have a CSV imported into JupyterLab. pandas has been imported, and the data frame is 7845 rows x 14 columns. Two of the columns are named "source_app_packets" and "source_app_packets.1". They are almost identical; the main difference is that any data missing from "source_app_packets" is present in "source_app_packets.1" and vice versa. Is there any way to combine these two columns?

CodePudding user response：

You can use combine_first:

df["source_app_packets"].combine_first(df["source_app_packets.1"])

Example:

import pandas as pd

data = {"source_app_packets":[1, None, 3, None],"source_app_packets.1":[None,2, None, 4]}
df = pd.DataFrame(data)
df["source_app_packets"].combine_first(df["source_app_packets.1"])

Outputs the following Series:

0    1.0
1    2.0
2    3.0
3    4.0
Name: source_app_packets, dtype: float64
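
Note that combine_first returns a new Series and leaves df unchanged. A minimal sketch of keeping the result, using the same example data; overwriting "source_app_packets" and dropping the ".1" column is an assumption about what is wanted here:

```python
import pandas as pd

data = {
    "source_app_packets": [1, None, 3, None],
    "source_app_packets.1": [None, 2, None, 4],
}
df = pd.DataFrame(data)

# Fill the gaps in the first column with values from the second
df["source_app_packets"] = df["source_app_packets"].combine_first(
    df["source_app_packets.1"]
)

# The second column is now redundant, so drop it
df = df.drop(columns=["source_app_packets.1"])
```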

CodePudding user response：

If you also import numpy, you could use np.where, as in the following, which assumes your data is in df.

import numpy as np

# code to import data

# update source_app_packets column
df["source_app_packets"] = np.where(
    df["source_app_packets"].isnull(),
    df["source_app_packets.1"],
    df["source_app_packets"],
)

# drop the now-redundant duplicate column
df.drop(["source_app_packets.1"], axis=1, inplace=True)
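
Since the question describes the columns as only "almost identical", it may be worth checking, before merging and dropping, whether any rows hold different non-missing values in both columns. In that case both combine_first and the np.where approach keep the value from "source_app_packets". A rough sketch with made-up example data (the values and the names a, b, both_present and conflicts are illustrative only):

```python
import pandas as pd

# Hypothetical data: row 3 has a different value in each column
df = pd.DataFrame({
    "source_app_packets": [1, None, 3, 5],
    "source_app_packets.1": [None, 2, 3, 6],
})

a = df["source_app_packets"]
b = df["source_app_packets.1"]

# Rows where both columns have a value, but the values disagree
both_present = a.notna() & b.notna()
conflicts = df[both_present & (a != b)]

print(conflicts)
```

If conflicts is empty, the two columns only differ where one of them is missing, and either approach gives the same result.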