Problems lowercasing each word in a pandas dataframe column with lists of strings

As the title says, I'm trying to lowercase each element of the lists of strings stored in a dataframe column.

Example of what I have:

df
   A
0  [Verapamil hydrochloride]  
1  [Simvastatin]  
2  [Sulfamethoxazole, Trimethoprim]

Example of what I want to have:

df
   A
0  [verapamil hydrochloride]  
1  [simvastatin]  
2  [sulfamethoxazole, trimethoprim]

I tried using:

df['A'].apply(lambda x: [w.lower() for w in x])

but it outputs: TypeError: 'float' object is not iterable

When I check individual elements, I don't find any floats:

type(df['A'][0])
#Out: list

type(df['A'][0][0])
#Out: str

I'm doing this because I want to compare the lists later using set(). The lists I compare against may contain the same strings in lowercase, and their elements may also appear in a different order.

I don't really know what to do, because I can't find the reasons for that error. Is there an alternative?

CodePudding user response：

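import pandas as pd

# Cells read from a CSV arrive as strings; empty cells arrive as NaN (a float)
df = pd.read_csv('DCI.csv')
# Cast every cell to str so that NaN no longer breaks lower()
df['ActiveSubstances'] = df['ActiveSubstances'].astype(str)
# Lowercase the whole string in each cell
df['ActiveSubstances'] = df['ActiveSubstances'].str.lower()
print(df)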

Output

                            ActiveSubstances
0                ['verapamil hydrochloride']
1                ['verapamil hydrochloride']
2                ['verapamil hydrochloride']
3                            ['simvastatin']
4                            ['simvastatin']
...                                      ...
192520             ['doxepin hydrochloride']
192521             ['doxepin hydrochloride']
192522                      ['ethosuximide']
192523           ['fludrocortisone acetate']
192524  ['sulfamethoxazole', 'trimethoprim']

[192525 rows x 1 columns]

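Converting the column to str and then applying lower() avoids the error: a missing value (NaN, which is a float) becomes the string 'nan' instead of raising a TypeError. Note that each cell is then a single string rather than a list.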
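
If the cells need to stay as real lists (for example, for a later set() comparison), something along the lines of the sketch below may help. It assumes the TypeError comes from missing values in the column, since NaN is a float. The sample frame and the helper name lower_items are made up for illustration and are not part of the original data.

```python
import pandas as pd

# Hypothetical sample: two list cells and one missing value (NaN is a float)
df = pd.DataFrame({
    'A': [['Verapamil hydrochloride'],
          float('nan'),
          ['Sulfamethoxazole', 'Trimethoprim']]
})

# Count which Python types actually occur in the column
type_counts = df['A'].apply(lambda x: type(x).__name__).value_counts()
print(type_counts)

def lower_items(cell):
    # Hypothetical helper: lowercase list items, leave anything else (e.g. NaN) untouched
    if isinstance(cell, list):
        return [w.lower() for w in cell]
    return cell

df['A'] = df['A'].apply(lower_items)
print(df)
```

Checking only df['A'][0] can miss the problem, because the float may sit in any other row; counting the types over the whole column shows whether one is present.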

CodePudding user response：

You can use the lower() method on each string (Python strings have no lowercase() method):

variable.lower()
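
For a list of strings, the built-in str.lower() method can be applied to each item, and the results can then be compared as sets so that order does not matter. The sketch below shows one way this might look; the two example lists are invented for illustration.

```python
# Two hypothetical lists that differ only in letter case and order
first = ['Sulfamethoxazole', 'Trimethoprim']
second = ['trimethoprim', 'sulfamethoxazole']

# Lowercase every string, then build sets so that order is ignored
first_set = {w.lower() for w in first}
second_set = {w.lower() for w in second}

same_items = first_set == second_set
print(same_items)
```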