boilerpipeR - Interface to the Boilerpipe Java Library
Generic extraction of the main text content from HTML files; removes ads, sidebars, and headers using the boilerpipe <https://github.com/kohlschutter/boilerpipe> Java library. The boilerpipe extraction heuristics perform robustly across a wide range of website templates.
Last updated 4 years ago
5.52 score, 22 stars, 30 scripts, 319 downloads