Efficient way to conditionally "replace" elements in list of lists based on corresponding

I have a pandas dataframe with a structure like:

df =

repl_str	normal_str
1_labelled	1_text
2_labelled	2_text
4_labelled	4_text
5_labelled	5_text
7_labelled	7_text
8_labelled	8_text

And a list of lists where some of the strings in df["normal_str"] are present, but not necessarily all, like:

A = [['1_text', '3_text', '4_text'], ['5_text'], ['6_text', '8_text']]

I want to create a new list of lists B, where each string in A that also appears in df["normal_str"] is replaced by the corresponding string in the "repl_str" column of df. Strings in A that are not present in df["normal_str"] should be left as they are.

So in this case: B = [['1_labelled', '3_text', '4_labelled'], ['5_labelled'], ['6_text', '8_labelled']].

In the actual list of lists (as opposed to this mock example), the inner lists vary greatly in length. I have a working solution using a list comprehension, but it takes a long time to run:

[[[str_val for str_val in df['repl_str'].where(df['normal_str']==y).tolist() if str_val==str_val][0] 
       if [str_val for str_val in df['repl_str'].where(df['normal_str']==y).tolist() if str_val == str_val] 
       else y for y in x] for x in A]

Does anyone know a quicker way?

CodePudding user response：

If the values in the normal_str column are all unique, you can build a dictionary that maps each normal_str value to its repl_str value, then look up every element of A in it:

A = [['1_text', '3_text', '4_text'], ['5_text'], ['6_text', '8_text']]

d = df.set_index(['normal_str'])['repl_str'].to_dict()
B = [[d.get(text, text) for text in lst] for lst in A]

print(B)

[['1_labelled', '3_text', '4_labelled'], ['5_labelled'], ['6_text', '8_labelled']]
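
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the mock df from the question and uses dict(zip(...)), which should be equivalent to the set_index(...).to_dict() call above when normal_str is unique. The names mapping, first_wins and mapping_first are just illustrative. The likely reason this is faster is that the original comprehension scans the whole column for every element of A, while the dictionary is built once and each lookup is then a hash lookup. The last part is an assumption about what you might want if normal_str had duplicates: it keeps the first match explicitly instead of letting the last one silently win.

```python
import pandas as pd

# Mock data copied from the question.
df = pd.DataFrame({
    "repl_str": ["1_labelled", "2_labelled", "4_labelled",
                 "5_labelled", "7_labelled", "8_labelled"],
    "normal_str": ["1_text", "2_text", "4_text",
                   "5_text", "7_text", "8_text"],
})
A = [["1_text", "3_text", "4_text"], ["5_text"], ["6_text", "8_text"]]

# Build the lookup table once: normal_str -> repl_str.
mapping = dict(zip(df["normal_str"], df["repl_str"]))

# dict.get(key, default) returns the key itself when there is no match,
# so strings missing from df are left unchanged.
B = [[mapping.get(text, text) for text in inner] for inner in A]
print(B)

# Assumption: if normal_str could contain duplicates, choose explicitly
# which row wins (here the first) before building the dictionary.
first_wins = df.drop_duplicates(subset="normal_str", keep="first")
mapping_first = dict(zip(first_wins["normal_str"], first_wins["repl_str"]))
```

With the mock data there are no duplicates, so mapping_first ends up identical to mapping.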