How to scrape a certain number of characters from a line in Python

Time:02-24

I am quite new to Python and to coding in general, and I am learning Python so I can automate tasks. I want to scrape product data into a CSV file and later use that file to upload the products to another website. However, the website I am scraping has very long titles and descriptions, while the website I am uploading to has a character limit.

So what is the best way to approach this with Python?

  • Is there a way to limit the characters that are scraped, so that only the first 45 characters are copied?

  • Or can I scrape as normal and then process the CSV file to keep only the first x characters of each field? If so, how?

  • Or, lastly, can I limit the number of characters that are pasted when filling the data into the website I am uploading to?

Please let me know whether any of these are possible, and which would be the fastest and easiest to implement.

Looking forward to any help!

CodePudding user response:

I would suggest truncating the strings that are too long: after scraping the data, cut off the tail of any string that exceeds the limit.
For example, if your scraped string is named data and you need to limit its length to 50 characters, you can do the following:

data = data[:50]
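
As a rough sketch of how this could fit into the workflow from the question, the slice can be applied to each field just before the row is written to the CSV. The field names, the limits, the sample row, and the helper truncate_row are all made up for illustration; only slicing and the standard csv module are taken as given.

```python
import csv
import io

# Hypothetical per-field character limits; adjust to the target website.
LIMITS = {"title": 45, "description": 200}


def truncate_row(row, limits):
    """Return a copy of row with each limited field cut to its maximum length."""
    return {key: value[:limits[key]] if key in limits else value
            for key, value in row.items()}


# Made-up data standing in for real scraper output.
scraped_rows = [
    {"title": "A" * 60, "description": "B" * 250, "price": "9.99"},
]

# An in-memory buffer is used here so the sketch runs anywhere;
# open("products.csv", "w", newline="") could be used instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "description", "price"])
writer.writeheader()
for row in scraped_rows:
    writer.writerow(truncate_row(row, LIMITS))

csv_text = buffer.getvalue()
```

The same truncate_row helper could also be applied while reading an already written CSV with csv.DictReader, if you would rather scrape everything first and shorten the fields afterwards.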

CodePudding user response:

Python has a standard module, textwrap, which can shorten text while keeping whole words and adding [...] at the end.

import textwrap

textwrap.shorten('Hello World of Python', 17)

gives

'Hello World [...]'

If you don't want [...], you can set your own placeholder, for example with a TextWrapper instance:

import textwrap

w = textwrap.TextWrapper(placeholder='', width=17, max_lines=1)

w.fill('Hello World of Python')

which gives

'Hello World of'

And using placeholder=', etc.'

import textwrap

w = textwrap.TextWrapper(placeholder=', etc.', width=17, max_lines=1)

w.fill('Hello World of Python')

you can get

'Hello World, etc.'
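
For what it's worth, textwrap.shorten also accepts a placeholder keyword argument, so the same results should be reachable without creating a TextWrapper yourself. One difference to keep in mind is that shorten collapses runs of whitespace before truncating. The sketch below reuses the example string from above and compares it with plain slicing; the variable names are only illustrative.

```python
import textwrap

text = 'Hello World of Python'

# Word-aware truncation with different placeholders.
no_marker = textwrap.shorten(text, width=17, placeholder='')
with_etc = textwrap.shorten(text, width=17, placeholder=', etc.')

# Plain slicing, for comparison: exact length, but it may cut a word in half.
sliced = text[:17]

print(repr(no_marker))  # 'Hello World of'
print(repr(with_etc))   # 'Hello World, etc.'
print(repr(sliced))     # 'Hello World of Py'
```

Slicing uses every available character, while textwrap keeps the result readable at the cost of a few characters, so the choice depends on how strict the limit is and how the truncated text will be displayed.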

You can print the full path to the module's source file and read it to see how it works:

import textwrap

print(textwrap.__file__)
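
If you would rather read the code without opening the file, the standard inspect module can print the source of a single function. This is a small sketch that assumes a normal Python installation where the standard library's .py files are available.

```python
import inspect
import textwrap

# Fetch the source code of textwrap.shorten as a string and display it.
source = inspect.getsource(textwrap.shorten)
print(source)
```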