I have a dataframe like this (called df):
OU
CORP:Jenny Smith:
"CORP:John Smith:,John Smith:"
CORP:LINK:
CORP:Harry Linkster:
STORE:Mary Poppins:
STORE:Tony Stark:
STORE:Carmen Sandiego:
NEWS:Peter Parker:
NEWS:PARK:
NEWS:Clark Kent:
I want to parse it and check for any one-word strings in the column, such as LINK and PARK.
This is the logic I have:
for i in df.iteritems():
    # if length of strings in between ':' == 1
    # drop that row, and move to another dataframe df2
df should look like this after:
OU
CORP:Jenny Smith:
"CORP:John Smith:,John Smith:"
CORP:Harry Linkster:
STORE:Mary Poppins:
STORE:Tony Stark:
STORE:Carmen Sandiego:
NEWS:Peter Parker:
NEWS:Clark Kent:
df2 should look like this:
OU
CORP:LINK:
NEWS:PARK:
CodePudding user response:
IIUC, you can split each value on ':', take the second field, and count the words in it:
m = df['OU'].str.split(':').str[1].str.split().str.len() == 1
df2 = df[m]
df = df[~m]
Output:
>>> df
OU
0 CORP:Jenny Smith:
1 "CORP:John Smith:,John Smith:"
3 CORP:Harry Linkster:
4 STORE:Mary Poppins:
5 STORE:Tony Stark:
6 STORE:Carmen Sandiego:
7 NEWS:Peter Parker:
9 NEWS:Clark Kent:
>>> df2
OU
2 CORP:LINK:
8 NEWS:PARK:
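For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample frame is rebuilt from the question, and I am assuming the quotes around the second row are only CSV quoting and not part of the value. The name `name` is just for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: the quotes around
# row 1 are CSV quoting, so they are not part of the stored string).
df = pd.DataFrame({"OU": [
    "CORP:Jenny Smith:",
    "CORP:John Smith:,John Smith:",
    "CORP:LINK:",
    "CORP:Harry Linkster:",
    "STORE:Mary Poppins:",
    "STORE:Tony Stark:",
    "STORE:Carmen Sandiego:",
    "NEWS:Peter Parker:",
    "NEWS:PARK:",
    "NEWS:Clark Kent:",
]})

# Text between the first and second ':' of each value.
name = df["OU"].str.split(":").str[1]

# True where that text consists of exactly one whitespace-separated word.
m = name.str.split().str.len() == 1

df2 = df[m]   # single-word rows
df = df[~m]   # everything else

print(df2["OU"].tolist())  # ['CORP:LINK:', 'NEWS:PARK:']
```

Only the second field is checked, so a value like "CORP:John Smith:,John Smith:" is classified by "John Smith" alone. The original index labels (2 and 8) are kept; call reset_index(drop=True) on either frame if you want them renumbered.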
CodePudding user response:
Data:
OU
0 CORP:Jenny Smith:
1 CORP:John Smith:,John Smith:
2 CORP:LINK:
3 CORP:Harry Linkster:
4 STORE:Mary Poppins:
5 STORE:Tony Stark:
6 STORE:Carmen Sandiego:
7 NEWS:Peter Parker:
8 NEWS:PARK:
9 NEWS:Clark Kent:
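Solution: split each string on the leading prefix (everything up to and including the first colon) and on whitespace, then take the length of the resulting list. A length of 2 means the name is a single word. Use that as a boolean mask to build the two dataframes. Note that here df1 holds the single-word rows (the question calls that frame df2).
m=df['OU'].str.split(r'^[\w]+\:|\s').str.len()==2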
df1=df[m]
df2=df[~m]
print(df1)
OU
2 CORP:LINK:
8 NEWS:PARK:
print(df2)
OU
0 CORP:Jenny Smith:
1 CORP:John Smith:,John Smith:
3 CORP:Harry Linkster:
4 STORE:Mary Poppins:
5 STORE:Tony Stark:
6 STORE:Carmen Sandiego:
7 NEWS:Peter Parker:
9 NEWS:Clark Kent:
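To see why a length of 2 identifies the single-word rows, here is a small runnable sketch of the regex split. It assumes the intended pattern is `^\w+:|\s` (a leading prefix ending in a colon, or any whitespace character). The names `parts`, `single_word` and `multi_word` are mine, not from the question.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({"OU": [
    "CORP:Jenny Smith:",
    "CORP:John Smith:,John Smith:",
    "CORP:LINK:",
    "CORP:Harry Linkster:",
    "STORE:Mary Poppins:",
    "STORE:Tony Stark:",
    "STORE:Carmen Sandiego:",
    "NEWS:Peter Parker:",
    "NEWS:PARK:",
    "NEWS:Clark Kent:",
]})

# Split on the leading "PREFIX:" or on whitespace.
# "CORP:LINK:"        -> ['', 'LINK:']            (2 parts)
# "CORP:Jenny Smith:" -> ['', 'Jenny', 'Smith:']  (3 parts)
parts = df["OU"].str.split(r"^\w+:|\s")

# Exactly two parts means no whitespace after the prefix: one word.
m = parts.str.len() == 2

single_word = df[m]
multi_word = df[~m]

print(parts.str.len().tolist())  # [3, 4, 2, 3, 3, 3, 3, 3, 2, 3]
```

The empty string at the start of each list comes from the prefix matching at position 0, which is why the threshold is 2 and not 1.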