Number of characters in the column name of a DataFrame

Time:02-24

I'm consuming an API, and some of its column names are too long for my MySQL database.

How can I ignore those fields in the dataframe?

I was trying this:

import pandas as pd
import numpy as np

lst = ['Java', 'Python', 'C', 'C++', 'JavaScript', 'Swift', 'Go']

df = pd.DataFrame(lst)
limit = 7

for column in df.columns:
   if (pd.to_numeric(df[column].str.len())) > limit:
        df -= df[column]
        print (df)

result:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

My preference is to delete any column whose name is longer than my database supports.

I also tried slicing to shorten the name, but that didn't work either.

I appreciate any help.

CodePudding user response:

Suppose you have the following dataframe:

>>> df
      col1          col2      col3        col4
0   5uqukp  g7eLDgm0vrbV     Bnssm  tRJnSQma6E
1   NDsApz        lu02dO    ogbRz5  481riI6qne
2    UEfni    YV2pCXYFbd   pyHYqDH   fghpTgItm
3  a0PvRSv      0FwxzFqk   jUHQliB      W2dBhH
4   BQgTFp       FMseKnR      ifgt     tw1j7Ld
5  1vvF2Hv   cwTyt2GtpC4    P039m2   1qR2slCmu
6  JYnABTr        oLdZVz    KYBspk      RgsCsu

To remove every column in which at least one value is longer than 7 characters, use:

>>> df.loc[:, df.apply(lambda x: x.str.len().max() <= 7)]
      col1     col3
0   5uqukp    Bnssm
1   NDsApz   ogbRz5
2    UEfni  pyHYqDH
3  a0PvRSv  jUHQliB
4   BQgTFp     ifgt
5  1vvF2Hv   P039m2
6  JYnABTr   KYBspk

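The error in your code comes from the if statement: comparing a Series with > returns a boolean Series, not a single True or False, so pandas cannot decide which branch to take. To understand the error in more detail, read this post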
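As a rough illustration of where that ValueError comes from, here is a minimal sketch using a shortened version of the list from the question. The names lengths, mask, has_long_value and all_long are made up for this example. The idea is only that the comparison yields one boolean per row, so it has to be reduced with .any() or .all() before an if can use it.

```python
import pandas as pd

lst = ['Java', 'Python', 'C', 'JavaScript', 'Swift', 'Go']
df = pd.DataFrame(lst)
limit = 7

lengths = df[0].str.len()   # one length per row
mask = lengths > limit      # one boolean per row, not a single True/False

error_message = None
try:
    if mask:                # pandas refuses to pick a single truth value here
        pass
except ValueError as err:
    error_message = str(err)

# Reduce the boolean Series to one value before using it in an `if`
has_long_value = bool(mask.any())   # is at least one value too long?
all_long = bool(mask.all())         # is every value too long?
print(error_message)
print(has_long_value, all_long)
```

Which reduction is right depends on the intent: .any() matches "drop the column if at least one value is too long", while .all() would only trigger when every value exceeds the limit.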

CodePudding user response:

As I mentioned in my comment, df = pd.DataFrame(lst) creates a dataframe with a single column whose rows are populated from your one-dimensional list. Iterating over the columns of that dataframe therefore accomplishes nothing, since there is only one column.
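A quick way to check this yourself, sketched with a shortened version of the list from the question:

```python
import pandas as pd

lst = ['Java', 'Python', 'C', 'JavaScript', 'Swift', 'Go']
df = pd.DataFrame(lst)

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # a flat list produces one column with the default label 0
```

Each list element becomes a row, so a loop over df.columns runs exactly once.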

That being said, a single column works to your advantage, because you can answer your question with a set-based (vectorized) approach instead of a loop:

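import pandas as pd

lst = ['Java', 'Python', 'C', 'C++', 'JavaScript', 'Swift', 'Go']

# A flat list becomes a single column labelled 0
df = pd.DataFrame(lst)

limit = 7
# Boolean mask: keep only the rows whose string is longer than the limit
print(df[df[0].str.len() > limit])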

That prints a dataframe with a single column and a single row containing "JavaScript", the only value longer than your character limit. If you want to keep the values that are within the limit instead, just change > to <=.
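Both answers above filter on the values inside the dataframe. If the goal is to filter on the column names themselves, as the title suggests, the same vectorized idea can be applied to df.columns. The following is only a sketch under assumptions: the dataframe and its column names are hypothetical stand-ins for the API data, and limit = 7 is borrowed from the question's example rather than from any real database limit.

```python
import pandas as pd

# Hypothetical frame: one short column name and one overly long one
df = pd.DataFrame({
    'id': [1, 2],
    'a_very_long_column_name_returned_by_the_api': ['x', 'y'],
})
limit = 7  # assumed limit, taken from the question's example

# Option 1: keep only the columns whose name fits within the limit
name_fits = df.columns.str.len() <= limit
kept = df.loc[:, name_fits]

# Option 2: keep every column but truncate each name to the limit
renamed = df.copy()
renamed.columns = renamed.columns.str[:limit]

print(list(kept.columns))
print(list(renamed.columns))
```

Option 1 matches the stated preference of deleting the offending columns. Option 2 keeps the data, but truncation can produce duplicate names when two long names share the same prefix, so it is worth checking renamed.columns.is_unique before writing to the database.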
