Filtering columns in pandas by length

Time:05-08

I have a column in a dataframe that contains IATA_Codes (abbreviations) for airports, such as LAX, SFO, ... However, when I inspect the column values more closely (column.unique()), I see that it also contains 4-digit numbers. How can I filter the column so that my dataframe only consists of rows containing a real airport code?

My idea was to filter by length (an airport code is always 3 characters long, while the numbers are always 4 digits long), but I don't know how to implement this idea.

array(['LFT', 'HYS', 'ELP', 'DVL', 'ISP', 'BUR', 'DAB', 'DAY', 'GRK',
       'GJT', 'BMI', 'LBE', 'ASE', 'RKS', 'GUM', 'TVC', 'ALO', 'IMT',
...
       10170, 11577, 14709, 14711, 12255, 10165, 10918, 15401, 13970,
       15497, 12265, 14254, 10581, 12016, 11503, 13459, 14222, 14025,
       '10333', '14222', '14025', '13502', '15497', '12265'], dtype=object)

CodePudding user response:

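You can use df.columns.str.len() to get the length of each column label, and pass the resulting boolean mask as the second indexer of df.loc. Note that this keeps the columns whose names are 3 characters long; it does not filter rows by their values: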

df = df.loc[:, df.columns.astype(str).str.len() == 3]
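Since the question is about filtering rows by the values in one column, the same length check can also be applied to the column itself and used as a boolean row mask. A minimal sketch, assuming the column is named IATA_Codes (as in the question) and using a small made-up frame; the Flights column and all values are hypothetical:

```python
import pandas as pd

# Hypothetical sample data: codes, integers, and numeric strings mixed together
df = pd.DataFrame({
    "IATA_Codes": ["LFT", "HYS", 10170, "10333", "ELP", 14709],
    "Flights": [1, 2, 3, 4, 5, 6],
})

# Cast to str first so integer entries have a length too,
# then keep only the rows whose value is 3 characters long
mask = df["IATA_Codes"].astype(str).str.len() == 3
airports = df[mask]

print(airports["IATA_Codes"].tolist())  # ['LFT', 'HYS', 'ELP']
```

The astype(str) step matters here because the column has dtype object and holds both strings and integers.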

CodePudding user response:

Another possibility is to use a lambda expression:

df[df['IATA_Codes'].apply(lambda x: len(str(x)) == 3)]['IATA_Codes'].unique()
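A pure length check also keeps any other three-character value. If a stricter test is wanted, one option is to require exactly three uppercase letters with a regular expression. A minimal sketch, assuming the column name IATA_Codes from above, made-up sample values, and pandas 1.1 or later for Series.str.fullmatch:

```python
import pandas as pd

# Hypothetical sample data, including 3-character values that are not airport codes
df = pd.DataFrame({"IATA_Codes": ["LAX", "SFO", 10170, "10333", "123", "abc"]})

# Cast to str so integers can be matched, then require exactly three uppercase letters
is_code = df["IATA_Codes"].astype(str).str.fullmatch(r"[A-Z]{3}")
airports = df[is_code]

print(airports["IATA_Codes"].tolist())  # ['LAX', 'SFO']
```

This only checks the shape of the value; it does not verify that the code belongs to an existing airport.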