As the title says, I'm trying to lowercase every element of the lists of strings stored in a DataFrame column.
Example of what I have:
df
A
0 [Verapamil hydrochloride]
1 [Simvastatin]
2 [Sulfamethoxazole, Trimethoprim]
Example of what I want to have:
df
A
0 [verapamil hydrochloride]
1 [simvastatin]
2 [sulfamethoxazole, trimethoprim]
I tried using:
df['A'].apply(lambda x: [w.lower() for w in x])
but it raises:
TypeError: 'float' object is not iterable
When I check the values individually, I don't see any floats:
type(df['A'][0])
#Out: list
type(df['A'][0][0])
#Out: str
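A likely cause, offered as a hedged guess: checking only df['A'][0] cannot reveal a problem in another row. pandas represents a missing value as NaN, which is a Python float, and iterating over it raises exactly this TypeError. The sketch below uses made-up data with one missing row to reproduce the error and then scans the whole column for cells that are not lists:

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's column, plus one missing row.
# pandas stores the missing value as NaN, which is a float.
df = pd.DataFrame({"A": pd.Series([
    ["Verapamil hydrochloride"],
    ["Simvastatin"],
    ["Sulfamethoxazole", "Trimethoprim"],
    np.nan,
], dtype=object)})

# Reproduce the error: the comprehension tries to iterate over the NaN cell.
try:
    df["A"].apply(lambda x: [w.lower() for w in x])
    msg = None
except TypeError as exc:
    msg = str(exc)
print(msg)  # 'float' object is not iterable

# Inspect the whole column, not just row 0, for cells that are not lists.
not_lists = df["A"][~df["A"].apply(lambda x: isinstance(x, list))]
missing = int(df["A"].isna().sum())
print(not_lists)
print(missing)  # 1
```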
I'm doing this because I want to compare the lists later using set(): the strings in the other lists may be lowercase, and their order within the lists may also differ.
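For that comparison, lowercasing inside a set comprehension handles case and order in one step. A minimal sketch, where same_substances is a hypothetical helper name; note that sets also discard duplicate entries:

```python
def same_substances(a, b):
    """Hypothetical helper: True when two lists hold the same strings,
    ignoring case and order (and duplicates, since sets drop them)."""
    return {s.lower() for s in a} == {s.lower() for s in b}


print(same_substances(["Sulfamethoxazole", "Trimethoprim"],
                      ["trimethoprim", "SULFAMETHOXAZOLE"]))  # True
print(same_substances(["Simvastatin"], ["simvastatin", "ezetimibe"]))  # False
```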
I don't know what to do, because I can't find the cause of this error. Is there an alternative?
Answer:
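import pandas as pd
df = pd.read_csv('DCI.csv')
# Turn every cell (lists, NaN, ...) into its string representation
df['ActiveSubstances'] = df['ActiveSubstances'].astype(str)
# Vectorized lowercase; same result as a row-wise apply(..., axis=1), but faster
df['ActiveSubstances'] = df['ActiveSubstances'].str.lower()
print(df)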
Output
ActiveSubstances
0 ['verapamil hydrochloride']
1 ['verapamil hydrochloride']
2 ['verapamil hydrochloride']
3 ['simvastatin']
4 ['simvastatin']
... ...
192520 ['doxepin hydrochloride']
192521 ['doxepin hydrochloride']
192522 ['ethosuximide']
192523 ['fludrocortisone acetate']
192524 ['sulfamethoxazole', 'trimethoprim']
[192525 rows x 1 columns]
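Converting the column to str and then applying lower() avoids the error. Note that each cell then holds a string (e.g. "['simvastatin']") rather than a list, and any missing value becomes the string 'nan'.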
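If you need to keep real lists (for example, to build sets from them later), an alternative sketch is to lowercase inside each list and pass non-list cells such as NaN through unchanged. It assumes the column holds Python list objects, as the question's type() checks show; the data and the helper name lower_list are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical column of lists with one missing value.
df = pd.DataFrame({"A": pd.Series([
    ["Verapamil hydrochloride"],
    ["Simvastatin"],
    ["Sulfamethoxazole", "Trimethoprim"],
    np.nan,
], dtype=object)})


def lower_list(x):
    # Lowercase each string in a list; leave anything else (e.g. NaN) as-is.
    if isinstance(x, list):
        return [w.lower() for w in x]
    return x


df["A"] = df["A"].apply(lower_list)
print(df)
```

If the lists were instead read from a CSV, read_csv returns them as strings like "['Simvastatin']", so they would need parsing first (for example with ast.literal_eval) before this applies.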
Answer:
You can use:
variable.lower()
on each string (Python strings have a lower() method; there is no lowercase()).
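lower() works on one string at a time, so a list needs a comprehension, while a column of plain strings can use pandas' vectorized Series.str.lower(). A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

# A single string: call lower() directly.
name = "Verapamil Hydrochloride"
print(name.lower())  # verapamil hydrochloride

# A list of strings: lowercase each element.
names = ["Sulfamethoxazole", "Trimethoprim"]
lowered = [n.lower() for n in names]
print(lowered)  # ['sulfamethoxazole', 'trimethoprim']

# A Series of strings: .str.lower() keeps missing values as missing
# instead of raising.
s = pd.Series(["Simvastatin", np.nan, "Ethosuximide"])
result = s.str.lower()
print(result.tolist())
```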