Home > OS >  Combine multiple rows without spaces and Merging in pandas
Combine multiple rows without spaces and Merging in pandas

Time:09-22

I am trying to generate a column that combines the values of three selected rows into a new column

col_1       col2      col3      col4     col5         key_column

Nivea       care       300       SSL     pre-ssl     NIVEACARE300SSL
Balea       milk       800       SSL     pre-ssl     BALEAMILK800SSL 

So, the key_column should contain values from col1,col2,col3 and col4

I am generating this column so that I can merge this data frame to another data frame where values are something like this

col_1       col2              col3      col4     Solution          key_column

nivea       nivea care era    300ml     SSL       1500€     NIVEANIVEACAREERA300MLSSL
balea       balea milk arc    800ml     SSL       1540€    BALEABALEAMILKARC800MLSSL 
  1. Can anyone help me in generating key_column?
  2. Is there a way where I can merge these two data frames which are almost similar and has similar values at a random position? Because I have very diversed dataframe and it is difficult to change the row values comparing and looping through the data frame to perform left join to obtain the Solution column.

PS: My intention for key_column generation was to perform fuzzy matching. But I am worried about accuracy. Because I have to extract the price and it should be accurate. Any other approach or suggestions would be helpful.

CodePudding user response:

I think you can try something like df['key_column'] = df['col_1] df['col_2'] df['col3'].astype('str') remember the .astype('str') so the values will be all strings and they'll concatenate in the new value ( otherwise you'll get a TypeError. About the merging I'd go for pd.merge but the arguments will depend on which columns you want for the new dataframe ( of course the values in the key_columns will have to be identical ).

CodePudding user response:

Probably; can use following command too for getting the key_column;

df1["key_column"] = df1.apply(lambda x : str(x["col1"])   str(x["col2"])   str(x["col3"])   str(x["col4"]),axis=1).str.upper()

And also, you have the option of merging two datasets without creating the key_column (mind the lower/upper cases);

df3 = df1.merge(df2,how="left",on=["col1","col2","col3","col4"])
  • Related