Convert HTML to text

I forget where I copied this script from:

# Usage: convert-html-to-md […]
# Convert the specified HTML files into Markdown text-format equivalents
# in the current working directory. The file extension will be .md.txt.
# Requires the Python script by Aaron Swartz to convert
# from HTML to Markdown text [].
# The first argument is the path to the html2text Python script.
html2text="${1}"
shift

while [ -n "${1}" ] ; do
    # Use the contents of the title element for the filename. In case
    # the title element spans multiple lines, the entire file is first
    # converted to a single line before the sed pattern is applied. Any
    # "unsafe" characters are then replaced with hyphens to produce a
    # valid filename.
    title=$(cat "${1}" | \
        tr -d '\n\r' | \
        sed -nre 's/^.*<title>([^<]*)<\/title>.*$/\1/ip' | \
        tr "\`~!@#\$%^&*()+={}|[]\\\\:;\"'<>?,/ \t" '[-*]')

    # If there's no title, then just use the original filename.
    if [ -z "${title}" ] ; then
        title=$(basename "${1}" .html)
    fi

    # Convert the HTML to Markdown.
    cat "${1}" | python "${html2text}" > "${title}.md.txt"
    shift
done

Posted on 2008-05-08 by John Laudun. This entry was posted in work and tagged code, html, python.
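To try the title-to-filename step on its own, here is a minimal sketch that wraps the same pipeline in a function. The name `title_to_filename` is hypothetical and not part of the script I copied, and the sketch assumes GNU sed (for `-r` and the case-insensitive `i` flag) and a `tr` that understands the `[-*]` repeat form.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original script): read HTML on
# stdin and print a filename-safe version of its <title> contents.
# Assumes GNU sed for -r and the case-insensitive "i" flag.
title_to_filename() {
    # Join the input into one line, pull out the title text, then
    # replace every "unsafe" character with a hyphen.
    tr -d '\n\r' |
        sed -nre 's/^.*<title>([^<]*)<\/title>.*$/\1/ip' |
        tr "\`~!@#\$%^&*()+={}|[]\\\\:;\"'<>?,/ \t" '[-*]'
}

# Example: an upper-case title element with punctuation and a line break.
printf '<html><head><TITLE>Notes: HTML\n to text?</TITLE></head></html>\n' |
    title_to_filename
# prints: Notes--HTML-to-text-
```

If the input has no title element, the function prints nothing, which is the case where the script falls back to the original filename.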