I got a excel file that have a customer column with customers name but in the same column there is other data. I want to only get customers name but since they are all different I can't figure out how to capture it. My only solution I though was to get the rows that contains only one space between two string (Fist and Last name). This is my code so far :
import pandas as pd
import numpy as np
import re
#Dataframe for each data needed
customer = pd.read_excel('rpt_rs_ArrivalsPerMeeting.xls', usecols='B', skiprows=17)
activity = pd.read_excel('rpt_rs_ArrivalsPerMeeting.xls', usecols='M', skiprows=17)
resNumber = pd.read_excel('rpt_rs_ArrivalsPerMeeting.xls', usecols='Q', skiprows=17)
numberOfPeople = pd.read_excel('rpt_rs_ArrivalsPerMeeting.xls', usecols='R', skiprows=17)
timeOfMeeting = pd.read_excel('rpt_rs_ArrivalsPerMeeting.xls', usecols='B', skiprows=15)
#Remove empty data in the lists
customer.dropna(inplace=True)
activity.dropna(inplace=True)
resNumber.dropna(inplace=True)
numberOfPeople.dropna(inplace=True)
timeOfMeeting.dropna(inplace=True)
#Lists of data in determine Colums
customer_list = customer["Customer Name"].tolist()
activity_list = activity["Activity"].tolist()
resNumber_list = resNumber["Res.No."].tolist()
people_list = numberOfPeople["Total Places"].tolist()
timeOfMeeting_list = timeOfMeeting["Virtual Reality Tremblant"].tolist()
print(customer)
And this is the output :
Customer Name
0 Raphaelle Pruja
3 Arnauld Soni
6 Meeting Time: 11:00 AM
8 Ticket Customer Virtual Reality Tremblant/2021...
11 Meeting Time: 11:10 AM
.. ...
167 Ticket Customer Virtual Reality Tremblant/2021...
169 Ticket Customer Virtual Reality Tremblant/2021...
171 Ticket Customer Virtual Reality Tremblant/2021...
173 Ticket Customer Virtual Reality Tremblant/2021...
177 Total Places:
[73 rows x 1 columns]
So as you can see there is multiple data on the column...
Thanks for your answer!
CodePudding user response:
Just filter your data frame by the condition of count()
the whitespaces:
df[df['Customer Name'].str.count(' ') == 1]
Example
import pandas as pd
df = pd.DataFrame({'Customer Name':
['Raphaelle Pruja',
'Arnauld Soni',
'Meeting Time: 11:00 AM',
'Ticket Customer Virtual Reality Tremblant/2021...']
})
df[df['Customer Name'].str.count(' ') == 1]
Output
Customer Name |
---|
Raphaelle Pruja |
Arnauld Soni |
CodePudding user response:
I don't quite understand the way your date is formed but if you have a column of full names you can spilt it into two new columns of first and last name like this:
# new data frame with split value columns
new = data["Name"].str.split(" ", n = 1, expand = True)
# making separate first name column from new data frame
data["First Name"]= new[0]
# making separate last name column from new data frame
data["Last Name"]= new[1]
# Dropping old Name columns
data.drop(columns =["Name"], inplace = True)
https://www.geeksforgeeks.org/python-pandas-split-strings-into-two-list-columns-using-str-split/
CodePudding user response:
Using a toy DataFrame
as an example:
df = pd.DataFrame({'x': ['John Smith', 'bbla', 'A B']})
Then all you have to do is this:
df[df['x'].str.count(' ') == 1]
You can actually use an even simpler method:
df.query('x.str.count(" ") == [1]')