I'm consuming an API, and some of the column names are too long for my MySQL database.
How can I drop those fields from the DataFrame?
This is what I tried:
import pandas as pd
import numpy as np
lst = ['Java', 'Python', 'C', 'C ', 'JavaScript', 'Swift', 'Go']
df = pd.DataFrame(lst)
limit = 7
for column in df.columns:
    if (pd.to_numeric(df[column].str.len())) > limit:
        df -= df[column]
print(df)
result:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I would prefer to delete any column that is longer than my database supports.
I also tried slicing to shorten the name, but that didn't work either.
I'd appreciate any help.
CodePudding user response:
Suppose you have the following dataframe:
>>> df
      col1          col2     col3        col4
0   5uqukp  g7eLDgm0vrbV    Bnssm  tRJnSQma6E
1   NDsApz        lu02dO   ogbRz5  481riI6qne
2    UEfni    YV2pCXYFbd  pyHYqDH   fghpTgItm
3  a0PvRSv      0FwxzFqk  jUHQliB      W2dBhH
4   BQgTFp       FMseKnR     ifgt     tw1j7Ld
5  1vvF2Hv   cwTyt2GtpC4   P039m2   1qR2slCmu
6  JYnABTr        oLdZVz   KYBspk      RgsCsu
To remove columns where at least one value has a length greater than 7 characters, use:
>>> df.loc[:, df.apply(lambda x: x.str.len().max() <= 7)]
      col1     col3
0   5uqukp    Bnssm
1   NDsApz   ogbRz5
2    UEfni  pyHYqDH
3  a0PvRSv  jUHQliB
4   BQgTFp     ifgt
5  1vvF2Hv   P039m2
6  JYnABTr   KYBspk
To understand the error, read this post
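In short, the error comes from the fact that `df[column].str.len() > limit` is a boolean Series with one value per row, while `if` needs a single True or False. Here is a minimal sketch of the loop from the question with that fixed, assuming the goal is to drop a column as soon as any of its values is too long. The two-column dataframe and the `too_long` name are made up for illustration.

```python
import pandas as pd

# Hypothetical two-column frame, just to show the mechanics.
df = pd.DataFrame({"col1": ["Java", "Go"], "col2": ["JavaScript", "C"]})
limit = 7

# Iterate over a copy of the labels, since df is reassigned inside the loop.
for column in list(df.columns):
    # Element-wise comparison: a boolean Series, not a single bool.
    too_long = df[column].str.len() > limit
    # .any() collapses the Series to one bool that `if` can evaluate.
    if too_long.any():
        df = df.drop(columns=column)

print(df.columns.tolist())
```

Using `.all()` instead of `.any()` would drop a column only when every value exceeds the limit.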
CodePudding user response:
As I mentioned in my comment, df = pd.DataFrame(lst) creates a dataframe with a single column whose rows are populated from your one-dimensional list. Iterating over the columns of the dataframe therefore accomplishes little, as there is only one column.
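A quick way to confirm this is to inspect the shape and the column labels. This is only a small check using the same list as in the question:

```python
import pandas as pd

lst = ['Java', 'Python', 'C', 'C ', 'JavaScript', 'Swift', 'Go']
df = pd.DataFrame(lst)

print(df.shape)             # (number of rows, number of columns)
print(df.columns.tolist())  # the single column gets a default integer label
```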
That being said, this works to your advantage, as you can use a set-based approach to answer your question:
import pandas as pd
import numpy as np
lst =['Java', 'Python', 'C', 'C ','JavaScript', 'Swift', 'Go']
df = pd.DataFrame(lst)
limit = 7
print(df[df[0].str.len() > limit])
That prints a dataframe with a single column and a single row containing "JavaScript", the only value over your character limit. If you want to keep the values that are within the limit instead, just change that > to <=.
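Since the question mentions column names that are too long for MySQL, the same idea can be applied to the labels rather than the values. This is only a sketch: the column names and the limit of 10 are made up for illustration (MySQL itself allows identifiers of up to 64 characters).

```python
import pandas as pd

# Hypothetical frame standing in for the API response.
df = pd.DataFrame({
    "id": [1, 2],
    "name": ["a", "b"],
    "a_very_long_column_name_from_the_api": [0.1, 0.2],
})
max_name_len = 10  # illustrative limit, not MySQL's real one

# df.columns.str.len() gives the length of each column label.
keep = df.columns.str.len() <= max_name_len
trimmed = df.loc[:, keep]

print(trimmed.columns.tolist())
```

Truncating the labels instead, for example with `df.rename(columns=lambda c: c[:max_name_len])`, is also possible, but it can produce duplicate column names.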