Convert HTML to text

I no longer remember where I copied this script from:

#!/bin/bash
# Usage: convert-html-to-md […]
# Convert the specified HTML files into Markdown text-format equivalents
# in the current working directory. The file extension will be .md.txt.
# Requires the html2text.py Python script by Aaron Swartz to convert
# from HTML to Markdown text [www.aaronsw.com/2002/html2text/].
#
# The first argument is the path to html2text.py; the rest are HTML files.
html2text="${1}"
shift

while [ -n "${1}" ] ; do
    # Use the contents of the title element for the filename. In case
    # the title element spans multiple lines, the entire file is first
    # converted to a single line before the sed pattern is applied. Any
    # "unsafe" characters are then replaced with hyphens to produce a
    # valid filename.
    title=$(cat "${1}" | \
        tr -d '\n\r' | \
        sed -nre 's/^.*<title>([^<]*)<\/title>.*$/\1\n/ip' | \
        tr "\`~\!@#$%^&*()+={}|[]\\:;\"\'<>?,/ \t" '[-*]')

    # If there's no title, then just use the original filename.
    if [ -z "${title}" ] ; then
        title=$(basename "${1}" .html)
    fi

    # Convert the HTML to Markdown.
    cat "${1}" | python "${html2text}" > "${title}.md.txt"
    shift
done

Posted on 2008-05-08 by John Laudun. This entry was posted in work and tagged code, html, python.
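
The title-to-filename step in the script above can be tried on its own, without html2text.py. What follows is only a sketch under a few assumptions: `title_to_filename` is a made-up name that does not appear in the original script, it reads HTML on standard input, and it keeps a whitelist of safe characters rather than the original's list of unsafe ones.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): read HTML on
# stdin and print a filename-safe version of its <title> contents.
title_to_filename() {
    # Join the input into one line so a multi-line title still matches.
    tr -d '\n\r' |
        # Print only the text between <title ...> and </title>; the
        # bracket expressions make the tag match case-insensitive.
        sed -n 's/.*<[Tt][Ii][Tt][Ll][Ee][^>]*>\([^<]*\)<\/[Tt][Ii][Tt][Ll][Ee]>.*/\1/p' |
        # Replace everything except letters, digits, dot, underscore,
        # hyphen, and newline with a hyphen.
        tr -c 'A-Za-z0-9._\n-' '-'
}

# The title spans two lines here on purpose.
name=$(printf '<html><head><title>Notes:\nHTML to text</title></head></html>\n' | title_to_filename)
echo "${name}"   # prints: Notes-HTML-to-text
```

If the input has no title element, the function prints nothing, so the same fallback to `basename` used in the script still applies.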