Efficient way to conditionally "replace" elements in list of lists based on corresponding

Time:06-07

I have a pandas dataframe with a structure like:

df =

repl_str    normal_str
1_labelled  1_text
2_labelled  2_text
4_labelled  4_text
5_labelled  5_text
7_labelled  7_text
8_labelled  8_text

And a list of lists that contains some, but not necessarily all, of the strings in df["normal_str"], like:

A = [['1_text', '3_text', '4_text'], ['5_text'], ['6_text', '8_text']]

I want to create a new list of lists B in which every string in A that also appears in df["normal_str"] is replaced by the corresponding string in the "repl_str" column of df. Strings in A that are not present in df["normal_str"] should be left as is.

So in this case: B = [['1_labelled', '3_text', '4_labelled'], ['5_labelled'], ['6_text', '8_labelled']].

In the actual list of lists (as opposed to this mock example), the inner lists vary greatly in length. I have a working solution using a list comprehension, but it takes a long time to run:

[[[str_val for str_val in df['repl_str'].where(df['normal_str']==y).tolist() if str_val==str_val][0] 
       if [str_val for str_val in df['repl_str'].where(df['normal_str']==y).tolist() if str_val == str_val] 
       else y for y in x] for x in A]

Does anyone know a quicker way?

CodePudding user response:

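If the values in the normal_str column are all unique, you can build a dictionary that maps each normal_str value to its repl_str value once, and then look every string of A up in it. This avoids scanning the whole dataframe for each element, which is what makes the original comprehension slow: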

A = [['1_text', '3_text', '4_text'], ['5_text'], ['6_text', '8_text']]

d = df.set_index(['normal_str'])['repl_str'].to_dict()
B = [[d.get(text, text) for text in lst] for lst in A]
print(B)

[['1_labelled', '3_text', '4_labelled'], ['5_labelled'], ['6_text', '8_labelled']]
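
If normal_str might contain repeated values, the dictionary above silently keeps the last match, whereas the comprehension in the question takes the first one (the [0]). Below is a minimal, self-contained sketch that rebuilds the example dataframe and keeps the first match instead. The helper names build_lookup and replace_nested are made up for illustration, and the keep-first behaviour is an assumption about what is wanted when there are duplicates.

```python
import pandas as pd

# Mock data from the question
df = pd.DataFrame({
    "repl_str": ["1_labelled", "2_labelled", "4_labelled",
                 "5_labelled", "7_labelled", "8_labelled"],
    "normal_str": ["1_text", "2_text", "4_text",
                   "5_text", "7_text", "8_text"],
})
A = [["1_text", "3_text", "4_text"], ["5_text"], ["6_text", "8_text"]]


def build_lookup(frame):
    """Map normal_str -> repl_str, keeping the first row for repeated keys."""
    unique = frame.drop_duplicates(subset="normal_str", keep="first")
    return dict(zip(unique["normal_str"], unique["repl_str"]))


def replace_nested(nested, lookup):
    """Replace each string found in lookup; leave all other strings unchanged."""
    return [[lookup.get(item, item) for item in inner] for inner in nested]


lookup = build_lookup(df)  # built once, reused for every element of A
B = replace_nested(A, lookup)
print(B)
# [['1_labelled', '3_text', '4_labelled'], ['5_labelled'], ['6_text', '8_labelled']]
```

Each dictionary lookup takes roughly constant time, so the total work grows with the number of strings in A plus the number of rows in df, rather than with their product.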