Adding a column from another dataframe to existing dataframe

Time:04-01

Suppose we have a PySpark dataframe df with the following schema:

root
 |-- parent: string (nullable = true)
 |-- state: string (nullable = true)

Also suppose we have another dataframe df_new with the following schema:

root
 |-- city: string (nullable = true)

What is the easiest way of adding the city column from df_new to df?

User response:

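If df and df_new are pandas DataFrames, you can use df.insert() to add the new column in place:

df.insert(2, "city", df_new["city"])

Given the information you provided, this should suffice. Note that insert() aligns df_new["city"] to df by index label, so both DataFrames should have the same index (for example, the default 0..n-1). Rows without a matching label get NaN.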
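Here is a minimal runnable sketch of the insert() approach. It assumes pandas, since a PySpark DataFrame has no insert() method. The sample rows are hypothetical and only match the schemas in the question:

```python
import pandas as pd

# Hypothetical sample data matching the schemas in the question
df = pd.DataFrame({"parent": ["a", "b", "c"], "state": ["d", "f", "g"]})
df_new = pd.DataFrame({"city": ["Paris", "Lyon", "Nice"]})

# Both frames use the default index (0, 1, 2), so rows line up by label.
# loc=2 places "city" after "parent" and "state"; df is modified in place.
df.insert(2, "city", df_new["city"])
print(df)
```

Both frames share the default index, so each city lands on the row at the same position. If df_new had a different index, the values would be matched by label instead.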

User response:

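import pandas as pd

df1 = pd.DataFrame({"parent": ["a", "b", "c"],
                    "state": ["d", "f", "g"]})
df2 = pd.DataFrame({"city": ["Paris"]})

# Select the column as a Series; join() uses its name ("city") as the new column name
extracted_col = df2["city"]

# join() is a left join on the index: rows of df1 with no matching index in df2 get NaN
result = df1.join(extracted_col)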
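The following sketch uses hypothetical sample data to show how join() behaves in the answer above. It performs a left join on the index, so with a one-row df2 only the first row of df1 gets a city. When the other frame has the same number of rows but a different index, one option is to reset its index first so that rows pair up by position:

```python
import pandas as pd

df1 = pd.DataFrame({"parent": ["a", "b", "c"], "state": ["d", "f", "g"]})

# One-row frame, as in the answer above (its only index label is 0)
df2 = pd.DataFrame({"city": ["Paris"]})
joined = df1.join(df2["city"])
# Only index 0 matches, so rows 1 and 2 get NaN in "city"
print(joined)

# Hypothetical frame with matching length but a non-default index (7, 8, 9)
df3 = pd.DataFrame({"city": ["Paris", "Lyon", "Nice"]}, index=[7, 8, 9])
# reset_index(drop=True) renumbers it 0..n-1 (keeping the Series name "city"),
# so join() now pairs the rows by position
positional = df1.join(df3["city"].reset_index(drop=True))
print(positional)
```

Joining df3["city"] directly, without resetting the index, would produce a "city" column that is entirely NaN, because none of the labels 7, 8, 9 appear in df1.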

You can also try the answer in the post "Add a column from another DataFrame".
