I'm trying to teach myself Python for data analysis, and to practice I'm analyzing a CSV file containing 27k survey responses.
The responses take the form "# - Rating" (for example, "10 - Extremely Interested").
Would anyone be able to tell me how I would go about removing everything but the numerical value so that I can plot this data with matplotlib?
Thank you :)
Edit: My apologies, here is the code for the df I'm working with:
likely_recc = pd.read_csv('test_data.csv', usecols=['How likely are you to recommend this product to a friend?'])
CodePudding user response:
This should do the trick:
df['col'].str.split(' ').str[0].astype(int)
Take a look at the pandas docs on string methods for more information.
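Applied to the column from the question, a minimal sketch might look like the following. The three sample rows and the `rating` column name are made up for illustration, since the real test_data.csv isn't shown.

```python
import pandas as pd

col = 'How likely are you to recommend this product to a friend?'

# Hypothetical sample standing in for the contents of test_data.csv
likely_recc = pd.DataFrame({col: ["10 - Extremely Interested",
                                  "9 - Very Interested",
                                  "10 - Extremely Interested"]})

# Take the text before the first space and convert it to an integer
likely_recc['rating'] = likely_recc[col].str.split(' ').str[0].astype(int)

# Count how often each rating occurs, ordered by rating value
counts = likely_recc['rating'].value_counts().sort_index()
print(counts)
```

From there, counts.plot(kind='bar') followed by matplotlib's plt.show() should draw the distribution. Note that astype(int) raises an error if the column contains blank responses, so dropping those first with .dropna() is one option.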
CodePudding user response:
input_array = ["10 - Extremely Interested", "9 - Very Interested"]
only_my_numbers = []
for element in input_array:
    # element.split("-") splits the string on "-"
    # The first piece still has a trailing space, so remove it with strip()
    print(element.split("-")[0].strip())
    only_my_numbers.append(element.split("-")[0].strip())
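The loop above stores the numbers as strings. If you want integers for plotting, one possible variation (a sketch using the same sample list) is a list comprehension with int():

```python
input_array = ["10 - Extremely Interested", "9 - Very Interested"]

# Keep the part before the first "-", trim whitespace, and convert to int
only_my_numbers = [int(element.split("-")[0].strip()) for element in input_array]
print(only_my_numbers)  # [10, 9]
```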