How to get the unique string values in a column with numbers and characters in python

Time:10-21

I want to print the unique string values in this column, not the numerical ones. I only want the part of each value that comes before the special character (when there is one), not the rest of the string. For example, for the row "lala :59 lzenvke" I don't want to take "lzenvke" into account, only "lala".

import pandas as pd

data1 = {
    'column_with_names': ['lala :56 javcejhv', 'lala56 : javcejhv', 'li :lo 7TUF', 'lo', 'lala :59 lzenvke', 'la', 'lala', 'lalalo'],
}

df1 = pd.DataFrame(data1)

print(df1)

The expected output would be:

[Image: expected output]

Answer:

Here is one way to do it.

Assumption: rows that don't contain a colon (:) are also included in the result set.

import numpy as np

# split each value at the first whitespace or colon (n=1 allows at most one split)
# and expand the pieces into columns (expand=True)
# take the first column, i.e. the part before the separator
# find the unique values with np.unique (returned sorted)
# finally create a DF
pd.DataFrame(np.unique(df1['column_with_names'].str.split(r'[\s:]', n=1, expand=True)[0]))
    0
0   la
1   lala
2   lala56
3   lalalo
4   li
5   lo
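The same result can also be obtained without NumPy. This is a minimal sketch, assuming a reasonably recent pandas: without expand=True, str.split returns a Series of lists, .str[0] picks the part before the first whitespace or colon, and Series.unique() returns the distinct values in order of first appearance, so they are sorted here to match the output above.

```python
import pandas as pd

data1 = {
    'column_with_names': ['lala :56 javcejhv', 'lala56 : javcejhv', 'li :lo 7TUF', 'lo',
                          'lala :59 lzenvke', 'la', 'lala', 'lalalo'],
}
df1 = pd.DataFrame(data1)

# Split each value at the first whitespace or colon; .str[0] keeps the part before it.
# Values without a separator (e.g. 'lo') are returned unchanged as a one-element list.
first_part = df1['column_with_names'].str.split(r'[\s:]', n=1).str[0]

# unique() keeps the order of first appearance; sort to get alphabetical order.
unique_names = sorted(first_part.unique())
print(unique_names)
# ['la', 'lala', 'lala56', 'lalalo', 'li', 'lo']
```

Staying in pandas avoids converting to a NumPy array and back; wrap the list in pd.DataFrame(unique_names) if a DataFrame is needed.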

If you only need to consider the rows that contain a colon:

# same as above, except keep only the rows that contain a colon beforehand
(pd.DataFrame(
    np.unique(df1.loc[df1['column_with_names'].str.contains(':'), 'column_with_names']
              .str.split(r'[\s:]', n=1, expand=True)[0])))
    0
0   lala
1   lala56
2   li
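If "not the numerical ones" is meant to also drop digits attached to a name (so that lala56 counts as lala), one option is to keep only the leading run of letters with Series.str.extract. This is a sketch under that assumption about the intent, not part of the answer above; it assumes names consist of ASCII letters only. The second half applies the same idea to the rows that contain a colon.

```python
import pandas as pd

data1 = {
    'column_with_names': ['lala :56 javcejhv', 'lala56 : javcejhv', 'li :lo 7TUF', 'lo',
                          'lala :59 lzenvke', 'la', 'lala', 'lalalo'],
}
df1 = pd.DataFrame(data1)
col = df1['column_with_names']

# Capture the leading run of ASCII letters (assumption: names are letters only).
# expand=False returns a Series; values not starting with a letter become NaN and are dropped.
all_names = sorted(col.str.extract(r'^([A-Za-z]+)', expand=False).dropna().unique())
print(all_names)
# ['la', 'lala', 'lalalo', 'li', 'lo']

# Same idea restricted to rows containing a colon (regex=False: plain substring test).
with_colon = col[col.str.contains(':', regex=False)]
colon_names = sorted(with_colon.str.extract(r'^([A-Za-z]+)', expand=False).dropna().unique())
print(colon_names)
# ['lala', 'li']
```

Because the regex stops at the first non-letter character, it covers spaces, colons and digits alike, so no separate split is needed.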