Convert HTML to text

I no longer remember where I copied this script from:

# Usage: convert-html-to-md […]
# Convert the specified HTML files into Markdown text-format equivalents
# in the current working directory. The file extension will be .md.txt.
# Requires the Python script by Aaron Swartz to convert
# from HTML to Markdown text [].
html2text="${1}"
shift

while [ -n "${1}" ] ; do
    # Use the contents of the title element for the filename. In case
    # the title element spans multiple lines, the entire file is first
    # converted to a single line before the sed pattern is applied. Any
    # "unsafe" characters are then replaced with hyphens to produce a
    # valid filename.
    title=$(tr -d '\n\r' < "${1}" | \
        sed -n 's/.*<[Tt][Ii][Tt][Ll][Ee][^>]*>\([^<]*\)<\/[Tt][Ii][Tt][Ll][Ee]>.*/\1/p' | \
        sed 's/[^[:alnum:]._-]/-/g')

    # Fall back to the file's base name if no title was found.
    if [ -z "${title}" ] ; then
        title=$(basename "${1}" .html)
    fi

    # Convert the HTML to Markdown.
    python "${html2text}" < "${1}" > "${title}.md.txt"
    shift
done
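The title-to-filename step is the least obvious part of the script, so here is a minimal sketch that isolates it as a shell function you can try on a single file. The function name `title_to_filename` is hypothetical, and the sed pattern is an assumption: it expects one `<title>` element containing plain text with no nested tags.

```shell
# Hypothetical helper: print a filename derived from an HTML file's <title>.
title_to_filename() {
    # Delete line breaks so a <title> spanning several lines can be matched,
    # extract the title text, then turn every character outside
    # letters, digits, '.', '_' and '-' into a hyphen.
    _title=$(tr -d '\n\r' < "$1" |
        sed -n 's/.*<[Tt][Ii][Tt][Ll][Ee][^>]*>\([^<]*\)<\/[Tt][Ii][Tt][Ll][Ee]>.*/\1/p' |
        sed 's/[^[:alnum:]._-]/-/g')
    # No title found: fall back to the file's base name without ".html".
    if [ -z "$_title" ]; then
        _title=$(basename "$1" .html)
    fi
    printf '%s\n' "$_title"
}
```

For example, a file whose title element holds `My` on one line and `Page: Notes` on the next yields `MyPage--Notes`. Because `tr -d` deletes the line break instead of replacing it with a space, the words on either side of it run together, while the colon and the space each become a hyphen. Using `tr '\n\r' '  '` instead would keep a separator between them.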

Posted on 2008-05-08 by John Laudun. This entry was posted in work and tagged code, html, python. 