I have a CSV file that looks like this:
0 | 1 |
---|---|
33 abcdefg abcdr 5 kjksk 6 jnjs 555 | row |
34 abcdefg abcdr 8 kjksk hf 8 jnsj 665 | row |
35 abcdefg abcdr 7 kjksk hffd 9 jsdi 667 | row |
I want to extract only the third number.
0 | 1 |
---|---|
5 | row |
8 | row |
7 | row |
I use something like:
df[0].str.extract(r'(\d+)')
But I need some help to only extract the third number.
Hope someone can help!
CodePudding user response:
Use str.findall:
df[0] = df[0].str.findall(r'\d+').str[1]
print(df)
# Output:
0 1
0 5 row
1 8 row
2 7 row
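To see why index 1 is the right one here, it may help to look at everything `\d+` matches in each row. The snippet below is only a sketch using the standard `re` module on the sample strings from the question; the variable names (`rows`, `all_matches`, `picked`) are made up for illustration.

```python
import re

# Sample strings copied from the question's first column
rows = [
    "33 abcdefg abcdr 5 kjksk 6 jnjs 555",
    "34 abcdefg abcdr 8 kjksk hf 8 jnsj 665",
    "35 abcdefg abcdr 7 kjksk hffd 9 jsdi 667",
]

# Every run of digits in each row, in order of appearance
all_matches = [re.findall(r"\d+", row) for row in rows]

# Index 1 is the second run of digits, which is the value the question asks for
picked = [matches[1] for matches in all_matches]

print(all_matches[0])  # ['33', '5', '6', '555']
print(picked)          # ['5', '8', '7']
```

The leading 33/34/35 counts as the first match, so the wanted value sits at position 1 of the list, not 2.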
CodePudding user response:
With your shown samples, please try the following regex.
^(?:\S+\s+){3}(\S+)\s+
Explanation:
^(?:\S+\s+){3} ##Matching 1 or more non-spaces followed by 1 or more spaces in a non-capturing group, 3 times.
(\S+) ##Creating capturing group to get all non-space values which are required by OP.
\s+ ##Matching 1 or more spaces here.
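One possible way to apply this regex in pandas is through `str.extract`, which returns the contents of the capturing group. This is a sketch, assuming the DataFrame has integer column labels 0 and 1 as in the question; the sample frame is rebuilt by hand from the rows shown there, and `pattern` and `result` are just illustrative names.

```python
import pandas as pd

# DataFrame rebuilt by hand from the question's sample rows (an assumption,
# the real data would come from the CSV file)
df = pd.DataFrame({
    0: [
        "33 abcdefg abcdr 5 kjksk 6 jnjs 555",
        "34 abcdefg abcdr 8 kjksk hf 8 jnsj 665",
        "35 abcdefg abcdr 7 kjksk hffd 9 jsdi 667",
    ],
    1: ["row", "row", "row"],
})

# Skip the first three whitespace-separated tokens, capture the fourth
pattern = r"^(?:\S+\s+){3}(\S+)\s+"

# expand=False gives back a Series because the pattern has one capturing group
df[0] = df[0].str.extract(pattern, expand=False)

result = df[0].tolist()
print(result)  # ['5', '8', '7']
```

Note that this picks the fourth token by position, so it only gives a number when the layout matches the shown samples.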