Pandas: Compare dataframe with csv file and update the csv file

Time:12-15

I have the dataframe below:

      domain  hostname     ip address    status  managed by  monitor target name
0  jetty.com      aabb  XXX.XX.XX.XXX  Deployed    Sec Team       aabb.jetty.com
1  jetty.com      axcd  XXX.XX.XX.XXX  Deployed    Sec Team       axcd.jetty.com

Now I have another csv file, reports.csv:

"domain","hostname","ip address", "status","managed by", "monitor target name"
"jetty.com","aabb","XXX.XX.XX.XXX", "Decreased","OP Team", "aabb.jetty.com"
"jetty.com","axcd","XXX.XX.XX.XXX", "Decreased","OP2 Team", "axcd.jetty.com"
"jetty.com","appd","XXX.XX.XX.XXX", "Dec","OP8 Team", "appd.jetty.com"

 

Now I want to compare the df to this csv and then update the csv so it looks like this:

"domain","hostname","ip address", "status","managed by", "monitor target name"
"jetty.com","aabb","XXX.XX.XX.XXX", "Deployed","OP Team", "aabb.jetty.com"
"jetty.com","axcd","XXX.XX.XX.XXX", "Deployed","OP Team", "axcd.jetty.com"
"jetty.com","appd","XXX.XX.XX.XXX", "Dec","OP8 Team", "appd.jetty.com"

The matching rows are now updated and look the same as they do in the df.

How can I update the csv according to the df values without touching the other rows that are already in the csv?

I have this code:

     newdf = pd.DataFrame.from_dict(hh, orient="index")
     exdf = pd.read_csv("./reports.csv")
     df1 = newdf.set_index(['hostname'])
     df2 = exdf.set_index(['hostname'])
     df2.update(df1)

I am getting the error: KeyError: "None of ['hostname'] are in the columns"

CodePudding user response:

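Use DataFrame.update: set the columns to match on as the index in both DataFrames, then update:

import pandas as pd

# ddataframe is the DataFrame shown in the question
reporst = pd.read_csv('reports.csv')

# Use the shared key columns as the index in both DataFrames
df1 = ddataframe.set_index(['domain','hostname','ip address'])
df2 = reporst.set_index(['domain','hostname','ip address'])

# Overwrite values in df2 with the non-NA values from df1 where the index matches
df2.update(df1)

df2.to_csv('out.csv')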
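
For reference, here is a minimal self-contained sketch of the approach above. It is only an illustration under assumptions: the CSV is held in memory instead of being read from disk, the header includes a "status" column (the rows in the question have six fields), and skipinitialspace=True is passed so the spaces after some commas in the sample file do not end up inside the column names and values.

```python
import io
import pandas as pd

# Stand-in for the DataFrame from the question (IP addresses are masked there too).
ddataframe = pd.DataFrame({
    "domain": ["jetty.com", "jetty.com"],
    "hostname": ["aabb", "axcd"],
    "ip address": ["XXX.XX.XX.XXX", "XXX.XX.XX.XXX"],
    "status": ["Deployed", "Deployed"],
    "managed by": ["Sec Team", "Sec Team"],
    "monitor target name": ["aabb.jetty.com", "axcd.jetty.com"],
})

# Stand-in for the CSV file, kept in memory so the example runs anywhere.
# Assumption: the header has a "status" column to match the six fields per row.
csv_text = '''"domain","hostname","ip address", "status","managed by", "monitor target name"
"jetty.com","aabb","XXX.XX.XX.XXX", "Decreased","OP Team", "aabb.jetty.com"
"jetty.com","axcd","XXX.XX.XX.XXX", "Decreased","OP2 Team", "axcd.jetty.com"
"jetty.com","appd","XXX.XX.XX.XXX", "Dec","OP8 Team", "appd.jetty.com"
'''

# skipinitialspace=True drops the blanks that follow some of the commas.
reports = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

keys = ["domain", "hostname", "ip address"]
df1 = ddataframe.set_index(keys)
df2 = reports.set_index(keys)

# Rows of df2 whose key also exists in df1 are overwritten in place;
# the "appd" row has no match in df1 and is left untouched.
df2.update(df1)

# Turn the key columns back into ordinary columns before writing out.
result = df2.reset_index()
print(result.to_csv(index=False))
```

To change only some columns, pass a subset such as df1[["status"]] to update. To write the result back to the file, call result.to_csv with the file path and index=False.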

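If there is no column to match on and you need to match by position (the first row of one DataFrame against the first row of the other, and so on for all rows), use the code below. Note that DataFrame.update aligns on index labels, so this works only when both DataFrames have the same index, for example the default 0, 1, 2, ...: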

newdf = pd.DataFrame.from_dict(hh, orient="index")
exdf= pd.read_csv("./reports.csv")

exdf.update(newdf)
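
A small sketch of the positional case, under assumptions: the structure of hh below is hypothetical (the question does not show it), and the CSV is replaced by an in-memory DataFrame. Because from_dict with orient="index" turns the dict keys into the index, the labels are reset to 0, 1, ... first so that they line up with the default index of the frame read from the CSV. If hh is keyed by hostname like this, that may also be why set_index(['hostname']) raised the KeyError: hostname would be the index, not a column.

```python
import pandas as pd

# Hypothetical shape for the `hh` dict from the question: outer keys become the index.
hh = {
    "aabb": {"status": "Deployed", "managed by": "Sec Team"},
    "axcd": {"status": "Deployed", "managed by": "Sec Team"},
}
newdf = pd.DataFrame.from_dict(hh, orient="index")

# Stand-in for the DataFrame read from the CSV (default index 0, 1, 2).
exdf = pd.DataFrame({
    "hostname": ["aabb", "axcd", "appd"],
    "status": ["Decreased", "Decreased", "Dec"],
    "managed by": ["OP Team", "OP2 Team", "OP8 Team"],
})

# update() aligns on index labels, so replace newdf's labels with 0..n-1;
# row 0 of newdf then updates row 0 of exdf, row 1 updates row 1, and so on.
exdf.update(newdf.reset_index(drop=True))

print(exdf)
```

Only the columns present in newdf are changed, and rows of exdf beyond the length of newdf keep their original values.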