Home > database >  Python Pandas check if string contains only one space
Python Pandas check if string contains only one space

Time:12-06

I got a excel file that have a customer column with customers name but in the same column there is other data. I want to only get customers name but since they are all different I can't figure out how to capture it. My only solution I though was to get the rows that contains only one space between two string (Fist and Last name). This is my code so far :

import pandas as pd
import numpy as np
import re

#Dataframe for each data needed
customer = pd.read_excel('rpt_rs_ArrivalsPerMeeting.xls', usecols='B', skiprows=17)
activity = pd.read_excel('rpt_rs_ArrivalsPerMeeting.xls', usecols='M', skiprows=17)
resNumber = pd.read_excel('rpt_rs_ArrivalsPerMeeting.xls', usecols='Q', skiprows=17)
numberOfPeople = pd.read_excel('rpt_rs_ArrivalsPerMeeting.xls', usecols='R', skiprows=17)
timeOfMeeting = pd.read_excel('rpt_rs_ArrivalsPerMeeting.xls', usecols='B', skiprows=15)

#Remove empty data in the lists
customer.dropna(inplace=True)
activity.dropna(inplace=True)
resNumber.dropna(inplace=True)
numberOfPeople.dropna(inplace=True)
timeOfMeeting.dropna(inplace=True)

#Lists of data in determine Colums
customer_list = customer["Customer Name"].tolist()
activity_list = activity["Activity"].tolist()
resNumber_list = resNumber["Res.No."].tolist()
people_list = numberOfPeople["Total Places"].tolist()
timeOfMeeting_list = timeOfMeeting["Virtual Reality Tremblant"].tolist()


print(customer)

And this is the output :

                                         Customer Name
0                                      Raphaelle Pruja
3                                         Arnauld Soni
6                               Meeting Time: 11:00 AM
8    Ticket Customer Virtual Reality Tremblant/2021...
11                              Meeting Time: 11:10 AM
..                                                 ...
167  Ticket Customer Virtual Reality Tremblant/2021...
169  Ticket Customer Virtual Reality Tremblant/2021...
171  Ticket Customer Virtual Reality Tremblant/2021...
173  Ticket Customer Virtual Reality Tremblant/2021...
177                                      Total Places:

[73 rows x 1 columns]

So as you can see there is multiple data on the column...

Thanks for your answer!

CodePudding user response:

Just filter your data frame by the condition of count() the whitespaces:

df[df['Customer Name'].str.count(' ') == 1]

Example

import pandas as pd
df = pd.DataFrame({'Customer Name':
                    ['Raphaelle Pruja',
                     'Arnauld Soni',
                     'Meeting Time: 11:00 AM',
                     'Ticket Customer Virtual Reality Tremblant/2021...']
                 })

df[df['Customer Name'].str.count(' ') == 1]

Output

Customer Name
Raphaelle Pruja
Arnauld Soni

CodePudding user response:

I don't quite understand the way your date is formed but if you have a column of full names you can spilt it into two new columns of first and last name like this:

# new data frame with split value columns
new = data["Name"].str.split(" ", n = 1, expand = True)
  
# making separate first name column from new data frame
data["First Name"]= new[0]
  
# making separate last name column from new data frame
data["Last Name"]= new[1]
  
# Dropping old Name columns
data.drop(columns =["Name"], inplace = True)
  

https://www.geeksforgeeks.org/python-pandas-split-strings-into-two-list-columns-using-str-split/

CodePudding user response:

Using a toy DataFrame as an example:

df = pd.DataFrame({'x': ['John Smith', 'bbla', 'A B']})

Then all you have to do is this:

df[df['x'].str.count(' ') == 1]

You can actually use an even simpler method:

df.query('x.str.count(" ") == [1]')
  • Related