I have a CSV imported into JupyterLab using pandas; the DataFrame is 7845 rows x 14 columns. Two of the columns, "source_app_packets" and "source_app_packets.1", are almost identical. The main difference is that any data missing from "source_app_packets" is present in "source_app_packets.1", and vice versa. Is there any way to combine these two columns?
CodePudding user response:
You can use combine_first:
df["source_app_packets"].combine_first(df["source_app_packets.1"])
Example:
import pandas as pd
data = {"source_app_packets":[1, None, 3, None],"source_app_packets.1":[None,2, None, 4]}
df = pd.DataFrame(data)
df["source_app_packets"].combine_first(df["source_app_packets.1"])
This outputs the following Series:
0    1.0
1    2.0
2    3.0
3    4.0
Name: source_app_packets, dtype: float64
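Note that combine_first returns a new Series and does not modify df. A minimal sketch of keeping the result, assuming you want the merged values stored under "source_app_packets" and no longer need the second column:

```python
import pandas as pd

# Same toy data as in the example above
df = pd.DataFrame({
    "source_app_packets": [1, None, 3, None],
    "source_app_packets.1": [None, 2, None, 4],
})

# Assign the combined Series back, since combine_first does not work in place
df["source_app_packets"] = df["source_app_packets"].combine_first(
    df["source_app_packets.1"]
)

# Drop the now-redundant column (assumption: it is not needed afterwards)
df = df.drop(columns=["source_app_packets.1"])
print(df)
```

Where both columns hold a value, the value from "source_app_packets" is kept.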
CodePudding user response:
If you also import numpy, you could use something like this, which assumes your data is in df.
import numpy as np
# code to import data
# update source_app_packets column
df["source_app_packets"] = np.where(
    df["source_app_packets"].isnull(),
    df["source_app_packets.1"],
    df["source_app_packets"],
)
df.drop(["source_app_packets.1"], axis=1, inplace=True)
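Both approaches keep the value from "source_app_packets" whenever it is present, so since the columns are only "almost identical" it may be worth checking, before dropping anything, whether any rows disagree or are missing in both columns. A rough sketch on made-up data (the values below are illustrative, not from the real CSV):

```python
import pandas as pd

# Hypothetical sample: row 4 has different values in the two columns
df = pd.DataFrame({
    "source_app_packets": [1, None, 3, None, 5],
    "source_app_packets.1": [1, 2, None, 4, 6],
})

a = df["source_app_packets"]
b = df["source_app_packets.1"]

# Rows where both columns have a value but the values differ
both_present = a.notna() & b.notna()
conflicts = df[both_present & (a != b)]

# Rows that would still be missing after combining
still_missing = int((a.isna() & b.isna()).sum())

print(conflicts)
print("rows missing in both columns:", still_missing)
```

If conflicts is empty, combining the columns loses no information.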