Home > Enterprise >  How to extract specific no. of character from dataframe column Python
How to extract specific no. of character from dataframe column Python

Time:08-02

Like Left formula we Used in Excel same as I want to extract no. of character from Policy no. column based on Insurer column like.....

if insurer is HDFC then Extract only 10 character form the sting and if insurer is tata then Extract only 7 character form the sting.

How I will achieve this in python

Insurer Policy no. Expected OutPut
Hdfc 4509242332 4509242332
Tata tatadigitNational tatadig
Hdfc 09082323ab12sd 09082323ab
Hdfc nolanheroman nolanherom
Tata 97543007356 9754300
Tata pqrsequence2o202 pqrsequ
Tata 987654321 9876543

CodePudding user response:

What you can do is define a new function that does the comparison and apply that to a new column (Expected OutPut in your example).

def f(row): 
    val = str(row['Policy no.'])
    return val[:10] if row['Insurer'] == "Hdfc" else val[:7]

df['Expected OutPut'] = df.apply(f, axis=1)

CodePudding user response:

You can try np.select

df['out'] = np.select(
    [df['Insurer'].str.lower().eq('hdfc'),
     df['Insurer'].str.lower().eq('tata')],
    [df['Policy no.'].str[:10],
     df['Policy no.'].str[:7],],
    df['Policy no.']
    )
print(df)

  Insurer         Policy no. Expected OutPut         out
0    Hdfc         4509242332      4509242332  4509242332
1    Tata  tatadigitNational         tatadig     tatadig
2    Hdfc     09082323ab12sd      09082323ab  09082323ab
3    Hdfc       nolanheroman      nolanherom  nolanherom
4    Tata        97543007356         9754300     9754300
5    Tata   pqrsequence2o202         pqrsequ     pqrsequ
6    Tata          987654321         9876543     9876543

CodePudding user response:

One possible solution is,

df['temp'] = df['Insurer'].map({'Hdfc':10, 'Tata':7})
df['Expected Output'] = df.apply(lambda x: x['Policy no.'][:x['temp']], axis=1)

O/P:

  Insurer         Policy no. Expected Output  temp
0    Hdfc         4509242332      4509242332    10
1    Tata  tatadigitNational         tatadig     7
2    Hdfc     09082323ab12sd      09082323ab    10
3    Hdfc       nolanheroman      nolanherom    10
4    Tata        97543007356         9754300     7
5    Tata   pqrsequence2o202         pqrsequ     7
6    Tata          987654321         9876543     7

CodePudding user response:

Another solution:

df['Expected OutPut'] = df.apply(lambda x: x['Policy no.'][0:10] if x['Insurer']=='Hdfc' else x['Policy no.'][0:7], axis = 1)
print(df)

Insurer         Policy no. Expected OutPut
0    Hdfc         4509242332      4509242332
1    Tata  tatadigitNational         tatadig
2    Hdfc     09082323ab12sd      09082323ab
3    Hdfc       nolanheroman      nolanherom
4    Tata        97543007356         9754300
5    Tata   pqrsequence2o202         pqrsequ
6    Tata          987654321         9876543

CodePudding user response:

You could try this


df['Expected Output'] = np.where(df['Insurer']== 'Hdfc', df["Policy no"].str[:10],df["Policy no"].str[:7])

CodePudding user response:

A vector solution using pandas slicing

df['Expected Output'] = df['Policy no.'].str[:10]
df.loc[df.index[df.Insurer.eq('Tata')], 'Expected Output'] = df['Expected Output'].loc[df.index[df.Insurer.eq('Tata')]].str[:7]

which gives us the expected output :

df
  
  Insurer         Policy no. Expected Output
0    Hdfc         4509242332      4509242332
1    Tata  tatadigitNational         tatadig
2    Hdfc     09082323ab12sd      09082323ab
3    Hdfc       nolanheroman      nolanherom
4    Tata        97543007356         9754300
5    Tata   pqrsequence2o202         pqrsequ
6    Tata          987654321         9876543

CodePudding user response:

You can easily do this by converting the dataframe to an array, then iterating through every row

array = df.to_numpy()

for row in array:
    #assuming you only have 2 columns, check to see if the insurer is Tata
    if row[0] == 'Tata':
        #slice string in the Policy column
        row[1] = row[1][:7]
#now, convert array back to df
pd.Dataframe(array, columns=['Insurer','Policy no.'])
    
  • Related