Convert an HTML string to a .txt file in Python

Time: 04-03

I have an HTML string that is guaranteed to contain only text (i.e., no images, videos, or other assets). Some of the text may still be formatted, though; for example, parts of it might be bold.

Is there a way to convert the HTML string to a .txt file? I don't care about keeping the formatting, but I do want to preserve the spacing of the text.

Is that possible with Python?
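Yes. As a minimal sketch using only the standard library's `html.parser.HTMLParser`, you can collect the text nodes, drop inline tags such as `<b>`, and turn block-level tags into line breaks. The set of "block" tags, the function names `html_to_text`/`html_to_txt_file`, and the output path `output.txt` are assumptions for illustration. "Spacing" is taken to mean the whitespace inside the text itself, which is passed through unchanged; only runs of three or more newlines are collapsed.

```python
import re
from html.parser import HTMLParser

# Tags assumed to start a new line when rendered; extend this for your markup.
BLOCK_TAGS = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collects text content; inline tags like <b> are simply ignored."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; in handle_data
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # <br> is a void element, so it only ever needs one newline
        if tag == "br" or tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Text is kept exactly as written, including its internal spaces
        self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Nested block tags can emit several newlines in a row; cap them at one blank line
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip("\n")


def html_to_txt_file(html: str, path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        f.write(html_to_text(html))


if __name__ == "__main__":
    sample = "<p>Hello <b>world</b></p><p>Second line</p>"
    html_to_txt_file(sample, "output.txt")  # hypothetical output path
```

For the sample above, `output.txt` contains the two paragraphs separated by a blank line, with the bold markup removed.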

Answer:

I had a similar problem earlier, where I needed to write the code for ECharts (a framework for generating front-end charts) to a file. Maybe you can use it as a reference:

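# Example inputs (hypothetical values; substitute your own data)
file_path = "memory_monitor.html"
package_name = "com.example.app"
x_axis = ["10:00", "10:01", "10:02"]
y_axis = [9500, 10200, 9800]

# Generate HTML file
html_file = open(file_path, "w", encoding="utf-8")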
html_content = """
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <title>ECharts</title>
    <script src="echarts.min.js"></script>
  </head>
  <body>
    <div id="main" style="width: 1200px;height:800px;"></div>
    <script type="text/javascript">
      var myChart = echarts.init(document.getElementById('main'));

      var option = {
            title: {
              text: 'Memory Monitor'
            },
            tooltip: {
              trigger: 'axis'
            },
            legend: {
              data: ['%(package_name)s']
            },
            toolbox: {
              feature: {
                saveAsImage: {}
              }
            },
            xAxis: {
              type: 'category',
              boundaryGap: false,
              data: %(x_axis)s
            },
            yAxis: {
              type: 'value',
              scale : true,
              max : 20000,
              min : 8000,
              splitNumber : 5,
              boundaryGap : [ 0.2, 0.2 ]
            },
            dataZoom:[{
              type: 'slider',
              show: true,
              realtime: true,
              start: 0,
              end: 100
            }],
            series: [
              {
                name: '%(package_name)s',
                type: 'line',
                stack: 'Total',
                data: %(y_axis)s
              }
            ]
          };

      myChart.setOption(option);
    </script>
  </body>
</html>
""" % dict(package_name=package_name, x_axis=x_axis, y_axis=y_axis)
# Written to the file
html_file.write(html_content)
# Close file
html_file.close()

Answer:

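#!/usr/bin/env python3

from urllib.request import urlopen

import html2text
from bs4 import BeautifulSoup

# Download the page and parse it
with urlopen('http://example.com/page.html') as response:
    soup = BeautifulSoup(response.read(), 'html.parser')

# Find the element that holds the main text
txt = soup.find('div', {'class': 'body'})

# html2text expects an HTML string, so serialize the tag first
print(html2text.html2text(str(txt)))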