Regex to split this string after '.' if there is a capital letter [A-Z] after it

Time:11-02

The string is:

"Cryptocurrency prices today were trading mixed with the world's largest cryptocurrency by market capitalization trading marginally lower. Cryptocurrency prices today were trading mixed with the world's largest cryptocurrency by market capitalization trading marginally lower. Bitcoin prices today were down 0.9% at $61,693. It is up 112% this year so far after hitting a record high of near $67,000 in October.Ether prices climbed to record high during the weekend. The AUM included all-time highs for individual asset products such as $55.2 billion for bitcoin products (52.2% increase) and $15.9 billion for ethereum products (30.0% increase)."

The output should look like this:

Cryptocurrency prices today were trading mixed with the world's largest cryptocurrency by market capitalization trading marginally lower.
Bitcoin prices today were down 0.9% at $61,693.
It is up 112% this year so far after hitting a record high of near $67,000 in October.
Ether prices climbed to record high during the weekend.
The AUM included all-time highs for individual asset products such as $55.2 billion for bitcoin products (52.2% increase) and $15.9 billion for ethereum products (30.0% increase).

CodePudding user response:

We can try a regex split here:

import re

inp = "Cryptocurrency prices today were trading mixed with the world's largest cryptocurrency by market capitalization trading marginally lower. Cryptocurrency prices today were trading mixed with the world's largest cryptocurrency by market capitalization trading marginally lower. Bitcoin prices today were down 0.9% at $61,693. It is up 112% this year so far after hitting a record high of near $67,000 in October. Ether prices climbed to record high during the weekend. The AUM included all-time highs for individual asset products such as $55.2 billion for bitcoin products (52.2% increase) and $15.9 billion for ethereum products (30.0% increase)."
# split on whitespace that sits between a period and a capital letter
lines = re.split(r'(?<=\.)\s+(?=[A-Z])', inp)
print(lines)

This prints:

["Cryptocurrency prices today were trading mixed with the world's largest cryptocurrency by market capitalization trading marginally lower.",
 "Cryptocurrency prices today were trading mixed with the world's largest cryptocurrency by market capitalization trading marginally lower.",
 'Bitcoin prices today were down 0.9% at $61,693.',
 'It is up 112% this year so far after hitting a record high of near $67,000 in October.',
 'Ether prices climbed to record high during the weekend.',
 'The AUM included all-time highs for individual asset products such as $55.2 billion for bitcoin products (52.2% increase) and $15.9 billion for ethereum products (30.0% increase).']

Here is the regex logic:

(?<=\.)    assert that dot precedes (but do not consume)
\s+        match one or more whitespace characters
(?=[A-Z])  assert that a capital letter follows (but do not consume)
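
The string in the question contains "October.Ether" with no space after the period, and a pattern that requires whitespace will not split there. As a rough sketch, assuming Python 3.7 or later (where `re.split` can split on an empty match), the whitespace part can be made optional with `\s*`. The helper name `split_sentences` and the shortened sample text are only for illustration.

```python
import re

def split_sentences(text):
    # (?<=\.)    a period must sit immediately before the split point
    # \s*        consume zero or more whitespace characters
    # (?=[A-Z])  a capital letter must come immediately after
    return re.split(r'(?<=\.)\s*(?=[A-Z])', text)

sample = "It hit $67,000 in October.Ether prices climbed. The AUM rose 52.2%."
for sentence in split_sentences(sample):
    print(sentence)
```

Decimals such as "52.2" stay intact because a digit, not a capital letter, follows the period. The trade-off is that abbreviations written like "U.S.A" would also be split at each period.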

CodePudding user response:

This is a simple method:

string = "Cryptocurrency prices today were trading mixed with the world's largest cryptocurrency by market capitalization trading marginally lower. Cryptocurrency prices today were trading mixed with the world's largest cryptocurrency by market capitalization trading marginally lower. Bitcoin prices today were down 0.9% at $61,693. It is up 112% this year so far after hitting a record high of near $67,000 in October.Ether prices climbed to record high during the weekend. The AUM included all-time highs for individual asset products such as $55.2 billion for bitcoin products (52.2% increase) and $15.9 billion for ethereum products (30.0% increase)."
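a = string.split(". ")  # split at every period that is followed by a space
for i in a:
    # split() removed the ". " separator, so put the "." back on all but the last piece
    print(i + ("." if i != a[-1] else ""))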

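a = string.split(". ") splits the text at every "." that is followed by a space (" "). That combination marks a full stop, because in floats (real numbers) there is no space after the decimal point, e.g. "0.9". Note that it does not split where a period is followed directly by a letter, as in "October.Ether".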

The for loop prints every item with a "." appended, except the last one, because split removes the "." from every item except the last.
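
The same idea can be sketched as a small function that restores the periods by position instead of comparing each piece to the last one. A value comparison would leave the period off any earlier piece that happens to be identical to the last piece. The name `split_on_period_space` is hypothetical, and this is only one possible way to write it.

```python
def split_on_period_space(text):
    # ". " (period plus space) marks a sentence end; "0.9" has no space, so it survives
    parts = text.split(". ")
    # every piece except the last lost its trailing "." to the split, so add it back
    return [part + "." for part in parts[:-1]] + parts[-1:]

sample = "Bitcoin was down 0.9%. It hit a high in October.Ether climbed. The AUM rose."
for sentence in split_on_period_space(sample):
    print(sentence)
```

As with the loop above, "October.Ether" stays on one line because no space follows that period.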

Output:

Cryptocurrency prices today were trading mixed with the world's largest cryptocurrency by market capitalization trading marginally lower.
Cryptocurrency prices today were trading mixed with the world's largest cryptocurrency by market capitalization trading marginally lower.
Bitcoin prices today were down 0.9% at $61,693.
It is up 112% this year so far after hitting a record high of near $67,000 in October.Ether prices climbed to record high during the weekend.
The AUM included all-time highs for individual asset products such as $55.2 billion for bitcoin products (52.2% increase) and $15.9 billion for ethereum products (30.0% increase).