Suppose we have a PySpark dataframe df with the following schema:
root
 |-- parent: string (nullable = true)
 |-- state: string (nullable = true)
Also suppose we have another dataframe df_new with the following schema:
root
 |-- city: string (nullable = true)
What is the easiest way of adding the city column from df_new to df?
CodePudding user response:
You can use df.insert() to add a new column (note that insert() is a pandas DataFrame method, not part of the PySpark DataFrame API):
df.insert(2, "city", df_new["city"])
Given the information you provided, this should suffice.
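Since the question asks about PySpark, here is a minimal sketch of one way to attach the city column positionally in PySpark. The sample data, the _row helper column, and the use of row_number over monotonically_increasing_id are assumptions, not something from the question; Spark does not strictly guarantee row order, so this only applies if a positional pairing of the two dataframes is acceptable:

from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data mirroring the schemas in the question
df = spark.createDataFrame([("a", "d"), ("b", "f"), ("c", "g")], ["parent", "state"])
df_new = spark.createDataFrame([("Paris",), ("London",), ("Rome",)], ["city"])

# Tag each row of both dataframes with a positional index, then join on it
w = Window.orderBy(F.monotonically_increasing_id())
df_indexed = df.withColumn("_row", F.row_number().over(w))
df_new_indexed = df_new.withColumn("_row", F.row_number().over(w))

result = df_indexed.join(df_new_indexed, on="_row", how="inner").drop("_row")
result.show()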
CodePudding user response:
import pandas as pd

df1 = pd.DataFrame({"parent": ["a", "b", "c"],
                    "state": ["d", "f", "g"]})
df2 = pd.DataFrame({"city": ["Paris"]})

# join aligns on the index, so the single "Paris" row lands on index 0
extracted_col = df2["city"]
result = df1.join(extracted_col)
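For reference, with the sample frames above the pandas join aligns on the row index, so result comes out roughly as:

  parent state   city
0      a     d  Paris
1      b     f    NaN
2      c     g    NaN

(In PySpark, if df_new really holds a single city that should be attached to every row, df.crossJoin(df_new) would be one way to do it; note that its semantics differ from the index-aligned pandas join shown here.)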
You can also try the answer in this post: Add a column from another DataFrame