Home > Back-end >  AttributeError: 'numpy.ndarray' object has no attribute 'str'
AttributeError: 'numpy.ndarray' object has no attribute 'str'

Time:09-06

I'm working with pandas for the first time and not sure if what I'm attempting is the best way to accomplish this, but I'm parsing an Excel sheet, trying to make a list of every number in a certain column (column 'C' in my case).

I returned everything from column 'C', removed the empty cells and now I'm trying to remove the '#' character that's in front of the numbers.

Here's my code from before I tried to remove the '#' character, so you can see the output:

def get_prod_cmvp_data():
    prod_cmvp_data = pd.read_excel('_Request ID__Client Cryptographic Module List Template.xlsx', usecols = 'C', header = 6, )
    prod_cmvp = prod_cmvp_data.dropna()
    vals = prod_cmvp.values
    print(vals)

output:

[['#3914']
 [' #3907']
 [' #3197']
 ['#4272']
 ['#4271']
 ['#4254']
 ['#3784']
 ['#3946']
 ['#3888']
 ['#4174']
 ['#4222']
 ['#3613']
 ['#3125']
 ['#3140']
 ['#3197']
 [' #3196']
 [' #3644']
 [' #3615']
 ['#3651']
 ['#3918']
 ['#3946']
 ['#4271']
 ['#3888']
 ['#4174']
 ['#4222']
 ['#3613']
 ['#3125']
 ['#3140']]

Here's my code after I tried to remove the '#' character

def get_prod_cmvp_data():
    prod_cmvp_data = pd.read_excel('_Request ID__Client Cryptographic Module List Template.xlsx', usecols = 'C', header = 6, )
    prod_cmvp = prod_cmvp_data.dropna()
    vals = prod_cmvp.values
    values = vals.str.replace("#", "")
    print(values)

Output:

AttributeError: 'numpy.ndarray' object has no attribute 'str'

Excel File:

enter image description here

CodePudding user response:

.values returns a ndarray so won't work with .str.replace.

You should be able to drop

vals = prod_cmvp.values

And use pandas string methods on the pandas series object instead:

prod_cmvp = prod_cmvp_data.dropna()["C"].str.replace("#", "")

Caveats: From what you posted it's not easy to reproduce your input data. The solution above assumes column C holds an array of strings when read with pd. read_excel. Also, it's assumed it's called "C" otherwise replace the string '["C"]' with your correct column name. If you need to see the column names:

prod_cmvp_data.columns

CodePudding user response:

please provide excel file, it can done elegantly while reading from file. solution to current problem as below.

col=[['#3914'],
[' #3907'],
[' #3197'],
['#4272'],
['#4271'],
['#4254'],
['#3784'],
['#3946'],
['#3888'],
['#4174'],
['#4222'],
['#3613'],
['#3125'],
['#3140'],
['#3197'],
[' #3196'],
[' #3644'],
[' #3615'],
['#3651'],
['#3918'],
['#3946'],
['#4271'],
['#3888'],
['#4174'],
['#4222'],
['#3613'],
['#3125'],
['#3140']]

for i in col:
    i[0]=i[0].replace('#', '')
  • Related