Use Values in column as String Slicer of another column pandas

Time:07-06

I'm trying to use a cell value as the slice bound for a string in a new column. For example, suppose I create this table:

data = pd.DataFrame(data = {'Name':['This is a title'], 'Number':[-5]})

               Name Number
0   This is a title     -5

And create a new column like so:

data['Test'] = data.Name.str[:data.Number.item()]

It'll create the new column, as expected:

               Name Number       Test
0   This is a title     -5  This is a 

The issue occurs when I have more than one row. If I create the following table:

data = pd.DataFrame(data = {'Name':['This is a title', 'This is another title'], 'Number':[-5, -13]})

                    Name  Number
0        This is a title      -5
1  This is another title     -13

The creation of the 'Test' column yields:

can only convert an array of size 1 to a Python scalar

I understand why this happens: the column now has more than one value, so .item() can no longer return a single scalar. What I want to know is how to do this with a DataFrame that has more than one row. I've tried .items(), .values(), etc., and the new column just becomes NaN.

Any thoughts?

Thanks!

CodePudding user response:

You can use apply with axis=1 to go through the DataFrame row by row and slice each Name with the Number from the same row.

import pandas as pd
data = pd.DataFrame(data = {'Name':['This is a title', 'This is another title'], 'Number':[-5, -13]})

data['Test'] = data.apply(lambda row: row['Name'][:row['Number']], axis=1)
print(data)

Output:

                    Name  Number        Test
0        This is a title      -5  This is a 
1  This is another title     -13    This is 
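If the lambda is hard to read, the same idea can be written with a named function. This is only a sketch of the approach above; slice_name is a made-up helper name, not something pandas provides.

```python
import pandas as pd

data = pd.DataFrame({
    'Name': ['This is a title', 'This is another title'],
    'Number': [-5, -13],
})

def slice_name(row):
    # Hypothetical helper: `row` is one DataFrame row passed in as a Series.
    # A negative stop drops that many characters from the end of the string.
    return row['Name'][:row['Number']]

# axis=1 calls slice_name once per row instead of once per column
data['Test'] = data.apply(slice_name, axis=1)
print(data['Test'].tolist())  # ['This is a ', 'This is ']
```

Note that the results keep the trailing space left before the removed word; chain .str.rstrip() onto the new column if that is not wanted.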

CodePudding user response:

Unfortunately, you need to loop here, since .str slicing takes one fixed bound for the whole column. A list comprehension is typically the most efficient way to do it:

data['Test'] = [s[:i] for s,i in zip(data['Name'], data['Number'])]

Output:

                    Name  Number        Test
0        This is a title      -5  This is a 
1  This is another title     -13    This is 
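To check the efficiency claim on your own machine, here is a rough timing sketch. The 10,000-row frame is made-up test data (the two example rows repeated), and with_apply / with_zip are just illustrative wrapper names. Timings vary by machine and pandas version, so no numbers are quoted here.

```python
import timeit
import pandas as pd

# Hypothetical test data: the two example rows repeated 5,000 times
data = pd.DataFrame({
    'Name': ['This is a title', 'This is another title'] * 5_000,
    'Number': [-5, -13] * 5_000,
})

def with_apply(df):
    # Row-wise apply, as in the first answer
    return df.apply(lambda row: row['Name'][:row['Number']], axis=1)

def with_zip(df):
    # List comprehension over the two columns, as in this answer
    return [s[:i] for s, i in zip(df['Name'], df['Number'])]

# Both approaches should produce identical values
assert with_apply(data).tolist() == with_zip(data)

# Total time for 3 runs of each approach
t_apply = timeit.timeit(lambda: with_apply(data), number=3)
t_zip = timeit.timeit(lambda: with_zip(data), number=3)
print(f"apply: {t_apply:.3f}s  zip: {t_zip:.3f}s")
```

Either version gives the same column, so the choice mostly matters for large frames.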