Checking a Pandas dataframe, check for length of strings and move that data to another dataframe

Time:03-19

I have a dataframe like this (called df):

 OU                      
 CORP:Jenny Smith:   
 "CORP:John Smith:,John Smith:" 
 CORP:LINK:
 CORP:Harry Linkster:
 STORE:Mary Poppins:  
 STORE:Tony Stark:
 STORE:Carmen Sandiego:    
 NEWS:Peter Parker:
 NEWS:PARK:
 NEWS:Clark Kent:

I want to parse the column and find the rows where the name between the colons is a single word, such as LINK and PARK.

This is the logic I have:

for idx, ou in df['OU'].items():
    # if the text between the first pair of ':' is a single word,
    # drop that row from df and move it to another dataframe, df2
    pass
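A minimal runnable version of that loop, as a sketch. It assumes the column is named OU as above, that the name is the text between the first and second ':', and that "one word" means the name contains no whitespace. It iterates with `Series.items()` because `DataFrame.iteritems()` was removed in pandas 2.0. The helper name `split_one_word_rows` is hypothetical.

```python
import pandas as pd


def split_one_word_rows(df):
    """Hypothetical helper: return (kept, moved), where moved holds the rows
    whose name (text between the first and second ':') is a single word."""
    moved_labels = []
    for label, ou in df["OU"].items():
        parts = ou.split(":")
        # parts[1] is the text between the first and second ':'
        if len(parts) > 1 and len(parts[1].split()) == 1:
            moved_labels.append(label)
    # Move rows by index label in one step, keeping the original labels
    return df.drop(index=moved_labels), df.loc[moved_labels]


df = pd.DataFrame({"OU": [
    "CORP:Jenny Smith:",
    '"CORP:John Smith:,John Smith:"',
    "CORP:LINK:",
    "CORP:Harry Linkster:",
    "STORE:Mary Poppins:",
    "STORE:Tony Stark:",
    "STORE:Carmen Sandiego:",
    "NEWS:Peter Parker:",
    "NEWS:PARK:",
    "NEWS:Clark Kent:",
]})

df, df2 = split_one_word_rows(df)
print(df2)
```

Collecting the labels first and moving them afterwards avoids modifying df while iterating over it.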

Afterwards, df should look like this:

 OU                      
 CORP:Jenny Smith:   
 "CORP:John Smith:,John Smith:" 
 CORP:Harry Linkster:
 STORE:Mary Poppins:  
 STORE:Tony Stark:
 STORE:Carmen Sandiego:    
 NEWS:Peter Parker:
 NEWS:Clark Kent:

df2 should look like this:

 OU                       
 CORP:LINK:
 NEWS:PARK:

Answer:

If I understand correctly (IIUC):

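# True where the text between the first two ':' is a single word
m = df['OU'].str.split(':').str[1].str.split().str.len() == 1
df2 = df[m]    # one-word rows move to df2
df = df[~m]    # everything else stays in df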

Output:

>>> df
                               OU
0               CORP:Jenny Smith:
1  "CORP:John Smith:,John Smith:"
3            CORP:Harry Linkster:
4             STORE:Mary Poppins:
5               STORE:Tony Stark:
6          STORE:Carmen Sandiego:
7              NEWS:Peter Parker:
9                NEWS:Clark Kent:

>>> df2
           OU
2  CORP:LINK:
8  NEWS:PARK:
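To see what each step of that mask computes, here is a small sketch that keeps the intermediate results for three of the sample rows (the three-row frame is just a subset of the question's data).

```python
import pandas as pd

df = pd.DataFrame({"OU": ["CORP:Jenny Smith:", "CORP:LINK:", "NEWS:PARK:"]})

names = df["OU"].str.split(":").str[1]   # text between the first two ':'
words = names.str.split()                # list of words in each name
counts = words.str.len()                 # number of words in each name
m = counts == 1                          # True for one-word names

print(pd.DataFrame({"name": names, "words": counts, "one_word": m}))
```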

Answer:

Data:

                   OU
0             CORP:Jenny Smith:
1  CORP:John Smith:,John Smith:
2                    CORP:LINK:
3          CORP:Harry Linkster:
4           STORE:Mary Poppins:
5             STORE:Tony Stark:
6        STORE:Carmen Sandiego:
7            NEWS:Peter Parker:
8                    NEWS:PARK:
9              NEWS:Clark Kent:

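Solution: split each string with a regular expression that matches either the leading "PREFIX:" or any whitespace, then count the pieces. A one-word name such as CORP:LINK: leaves exactly two pieces (an empty string and "LINK:"), so comparing the length to 2 gives a boolean mask; rows where it is True go into df1 and the rest into df2.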

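# split on the leading 'PREFIX:' or on whitespace; one-word names give 2 pieces
m = df['OU'].str.split(r'^\w+:|\s').str.len() == 2
df1 = df[m]    # one-word rows
df2 = df[~m]   # all other rows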

print(df1)
           OU
2  CORP:LINK:
8  NEWS:PARK:

print(df2)
              OU
0             CORP:Jenny Smith:
1  CORP:John Smith:,John Smith:
3          CORP:Harry Linkster:
4           STORE:Mary Poppins:
5             STORE:Tony Stark:
6        STORE:Carmen Sandiego:
7            NEWS:Peter Parker:
9              NEWS:Clark Kent:
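A sketch of what the regular-expression split actually produces, using three of the sample rows. With the default `regex=None`, pandas treats a pattern longer than one character as a regular expression, so r'^\w+:|\s' removes the leading "PREFIX:" and splits on whitespace.

```python
import pandas as pd

df = pd.DataFrame({"OU": ["CORP:Jenny Smith:", "CORP:LINK:", "NEWS:PARK:"]})

# '^\w+:' matches the prefix at the start; '\s' matches any whitespace
pieces = df["OU"].str.split(r"^\w+:|\s")
print(pieces.tolist())

m = pieces.str.len() == 2   # two pieces means a one-word name
print(m.tolist())
```

Because the count only looks at whitespace, any text after the name that contains no spaces is counted as part of the same piece.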