I am trying to generate a column that combines the values of three selected rows into a new column
col_1 col2 col3 col4 col5 key_column
Nivea care 300 SSL pre-ssl NIVEACARE300SSL
Balea milk 800 SSL pre-ssl BALEAMILK800SSL
So, the key_column should contain values from col1,col2,col3 and col4
I am generating this column so that I can merge this data frame to another data frame where values are something like this
col_1 col2 col3 col4 Solution key_column
nivea nivea care era 300ml SSL 1500€ NIVEANIVEACAREERA300MLSSL
balea balea milk arc 800ml SSL 1540€ BALEABALEAMILKARC800MLSSL
- Can anyone help me in generating key_column?
- Is there a way where I can merge these two data frames which are almost similar and has similar values at a random position? Because I have very diversed dataframe and it is difficult to change the row values comparing and looping through the data frame to perform left join to obtain the Solution column.
PS: My intention for key_column generation was to perform fuzzy matching. But I am worried about accuracy. Because I have to extract the price and it should be accurate. Any other approach or suggestions would be helpful.
CodePudding user response:
I think you can try something like df['key_column'] = df['col_1] df['col_2'] df['col3'].astype('str') remember the .astype('str') so the values will be all strings and they'll concatenate in the new value ( otherwise you'll get a TypeError. About the merging I'd go for pd.merge but the arguments will depend on which columns you want for the new dataframe ( of course the values in the key_columns will have to be identical ).
CodePudding user response:
Probably; can use following command too for getting the key_column;
df1["key_column"] = df1.apply(lambda x : str(x["col1"]) str(x["col2"]) str(x["col3"]) str(x["col4"]),axis=1).str.upper()
And also, you have the option of merging two datasets without creating the key_column (mind the lower/upper cases);
df3 = df1.merge(df2,how="left",on=["col1","col2","col3","col4"])