I have two csv files out of which at least one column has some matching cells in both. I know this because I have filtered out certain rows already. What I would like to do is add the column that csv #1 does not have from csv #2, but not all of the cells in that column, only the cells that match the ones in csv #1.
I feel like I'm not making any sense at all, which is why I have no idea how to start google on this. I will add an image explaining what I'm trying to achieve.
To the left in the image are the rows I've filtered out which have coordinates within a certain threshold. I would like the add the area column from the csv on the right hand side, but only the cells which are matching the eg. "FASTIGHET" column.
I just want to be clear here. I am not actually looking for help to right the code here. At least not yet. I am trying to figure out how to google my way there! :-D
Any nudges would be appreciated!
Cheers!
https://i.stack.imgur.com/BYvdb.png
CodePudding user response:
It looks like your "OBJEKT" column is like an ID number.
You can use the value from this column to see where it matches a row in CSV 2. Then add whatever values from that row that you want to your first dataframe.
https://www.geeksforgeeks.org/adding-new-column-to-existing-dataframe-in-pandas/
CodePudding user response:
Figured it out. Quite a short and simple snippet:
import pandas as pd
df1 = pd.DataFrame(pd.read_csv("54.csv"))
df2 = pd.DataFrame(pd.read_csv("area.csv"))
mergedStuff = pd.merge(df1, df2, on=['OBJEKT_ID'], how='inner')
mergedStuff.head()
mergedStuff.to_csv('current.csv')