AttributeError: 'numpy.ndarray' object has no attribute 'str'-CodePudding

I'm working with pandas for the first time and not sure if what I'm attempting is the best way to accomplish this, but I'm parsing an Excel sheet, trying to make a list of every number in a certain column (column 'C' in my case).

I returned everything from column 'C', removed the empty cells and now I'm trying to remove the '#' character that's in front of the numbers.

Here's my code from before I tried to remove the '#' character, so you can see the output:

def get_prod_cmvp_data():
    prod_cmvp_data = pd.read_excel('_Request ID__Client Cryptographic Module List Template.xlsx', usecols = 'C', header = 6, )
    prod_cmvp = prod_cmvp_data.dropna()
    vals = prod_cmvp.values
    print(vals)

output:

[['#3914']
 [' #3907']
 [' #3197']
 ['#4272']
 ['#4271']
 ['#4254']
 ['#3784']
 ['#3946']
 ['#3888']
 ['#4174']
 ['#4222']
 ['#3613']
 ['#3125']
 ['#3140']
 ['#3197']
 [' #3196']
 [' #3644']
 [' #3615']
 ['#3651']
 ['#3918']
 ['#3946']
 ['#4271']
 ['#3888']
 ['#4174']
 ['#4222']
 ['#3613']
 ['#3125']
 ['#3140']]

Here's my code after I tried to remove the '#' character

def get_prod_cmvp_data():
    prod_cmvp_data = pd.read_excel('_Request ID__Client Cryptographic Module List Template.xlsx', usecols = 'C', header = 6, )
    prod_cmvp = prod_cmvp_data.dropna()
    vals = prod_cmvp.values
    values = vals.str.replace("#", "")
    print(values)

Output:

AttributeError: 'numpy.ndarray' object has no attribute 'str'

Excel File:

CodePudding user response：

.values returns a ndarray so won't work with .str.replace.

You should be able to drop

vals = prod_cmvp.values

And use pandas string methods on the pandas series object instead:

prod_cmvp = prod_cmvp_data.dropna()["C"].str.replace("#", "")

Caveats: From what you posted it's not easy to reproduce your input data. The solution above assumes column C holds an array of strings when read with pd. read_excel. Also, it's assumed it's called "C" otherwise replace the string '["C"]' with your correct column name. If you need to see the column names:

prod_cmvp_data.columns

CodePudding user response：

please provide excel file, it can done elegantly while reading from file. solution to current problem as below.

col=[['#3914'],
[' #3907'],
[' #3197'],
['#4272'],
['#4271'],
['#4254'],
['#3784'],
['#3946'],
['#3888'],
['#4174'],
['#4222'],
['#3613'],
['#3125'],
['#3140'],
['#3197'],
[' #3196'],
[' #3644'],
[' #3615'],
['#3651'],
['#3918'],
['#3946'],
['#4271'],
['#3888'],
['#4174'],
['#4222'],
['#3613'],
['#3125'],
['#3140']]

for i in col:
    i[0]=i[0].replace('#', '')