
How to Resolve Python BeautifulSoup Error "bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml" (or other parsers)

When using the BeautifulSoup library (bs4) for parsing HTML or XML documents in Python, you might encounter the error bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: parser_name. This typically happens when you explicitly request a specific parser (like lxml, html5lib, or xml) in the BeautifulSoup constructor, but the underlying library for that parser isn't installed in your Python environment.

This guide explains why this error occurs and how to resolve it by either installing the requested parser or choosing a different one.

Understanding the Error: BeautifulSoup Parsers

BeautifulSoup itself doesn't parse HTML/XML directly. Instead, it relies on an underlying parser library to interpret the markup structure and build a navigable tree. You tell BeautifulSoup which parser to use via the second argument in its constructor. Common choices include:

  • 'html.parser': Python's built-in HTML parser. Decent speed, moderately lenient.
  • 'lxml': A very fast and lenient HTML parser based on C libraries. Requires separate installation.
  • 'html5lib': An extremely lenient parser that aims to mimic how web browsers handle malformed HTML. Creates valid HTML5. Requires separate installation, generally slower than lxml.
  • 'xml': Uses lxml's XML parser. Requires lxml installation.
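The parser name is simply the second positional argument, which is the constructor's `features` parameter, so switching parsers is a one-line change. The sketch below is illustrative and assumes only that `beautifulsoup4` is installed. It feeds the same invalid fragment, `<a></p>`, to each parser and reports any parser that is missing instead of crashing.

```python
from bs4 import BeautifulSoup, FeatureNotFound

broken = "<a></p>"  # invalid HTML: a stray closing </p> tag

for parser in ("html.parser", "lxml", "html5lib"):
    try:
        # The parser name is the second argument (the `features` parameter).
        soup = BeautifulSoup(broken, features=parser)
    except FeatureNotFound:
        print(f"{parser:12} -> not installed")
        continue
    print(f"{parser:12} -> {soup}")
```

The BeautifulSoup documentation uses this same fragment to show that parsers disagree. html.parser drops the stray `</p>` and gives `<a></a>`, while lxml and html5lib also wrap the result in `<html>`/`<body>` tags. If you need identical trees on every machine, pick one parser explicitly and install it everywhere.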

The FeatureNotFound error occurs when BeautifulSoup tries to find and use the parser library you requested (e.g., lxml) but cannot find it installed in the current Python environment.
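Internally, bs4 keeps a registry that maps feature names such as "lxml" to tree builder classes. It registers the lxml and html5lib builders only if importing those libraries succeeds. The following sketch queries that registry directly. Note that `bs4.builder.builder_registry` is an internal detail, not a documented public API, so treat this as a diagnostic aid that may change between bs4 versions.

```python
from bs4.builder import builder_registry

# Each feature name maps to a registered tree builder class. The lxml and
# html5lib builders are only registered if their libraries import cleanly,
# so lookup() returns None when the underlying library is missing.
for feature in ("html.parser", "lxml", "html5lib", "xml"):
    builder_class = builder_registry.lookup(feature)
    if builder_class is None:
        print(f"{feature:12} -> no tree builder (library not installed)")
    else:
        print(f"{feature:12} -> {builder_class.__name__}")
```

A `None` result for a name is exactly the case in which the `BeautifulSoup` constructor raises FeatureNotFound.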

Cause: Requested Parser Library Not Installed

The direct cause of the error Couldn't find a tree builder with the features you requested: lxml is that you specified "lxml" as the parser, but the lxml Python package is not installed.

error_example.py
from bs4 import BeautifulSoup, FeatureNotFound

html_doc = "<html><head><title>Test</title></head><body><p>Hello</p></body></html>"

try:
    # ⛔️ bs4.FeatureNotFound: Couldn't find a tree builder with the
    # features you requested: lxml. Do you need to install a parser library?
    # Attempting to use 'lxml' without having it installed.
    soup = BeautifulSoup(html_doc, "lxml")
    print(soup.title)
except FeatureNotFound as e:  # FeatureNotFound is a subclass of ValueError
    print(f"Caught Error: {e}")
    print("--> Confirmed: lxml parser not found.")
except ImportError:  # Kept for setups that report the missing library as ImportError
    print("ImportError caught: lxml library likely missing.")

Depending on the exact environment and bs4 version, an ImportError might sometimes occur before FeatureNotFound, but the root cause (a missing library) is the same.

The same error structure applies if you specify "html5lib" or "xml" without having html5lib or lxml installed, respectively.
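Because the exception type is the same for every parser name, one small helper can probe any of them. `parser_available` is a hypothetical name used only for illustration. It attempts a tiny parse and catches `FeatureNotFound`, which is a subclass of `ValueError`.

```python
from bs4 import BeautifulSoup, FeatureNotFound

def parser_available(features):
    """Return True if BeautifulSoup can build a tree with the given parser name."""
    try:
        BeautifulSoup("<p>probe</p>", features)
    except FeatureNotFound:
        return False
    return True

# The same exception type covers every parser name, not just "lxml".
for name in ("lxml", "html5lib", "xml", "lxml-xml"):
    status = "available" if parser_available(name) else "FeatureNotFound"
    print(f"{name:10} {status}")
```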

Solution 1: Use the Built-in html.parser (No Installation Needed)

The simplest fix, especially if you don't have specific requirements for speed or handling extremely broken HTML, is to use Python's built-in parser. It doesn't require any extra installation beyond Python itself.

solution_html_parser.py
from bs4 import BeautifulSoup

html_doc = "<html><head><title>Test</title></head><body><p>Hello</p></body></html>"

# ✅ Use the built-in parser
soup = BeautifulSoup(html_doc, "html.parser")

print("Using html.parser:")
print(f"Title: {soup.title}") # Output: <title>Test</title>
print(f"Title text: {soup.title.string}") # Output: Test
print(f"Paragraph: {soup.p}") # Output: <p>Hello</p>
  • Pros: No external dependencies. Reasonably fast. Lenient enough for most well-formed HTML.
  • Cons: Not as fast as lxml. Not as lenient as html5lib for severely broken HTML.

Solution 2: Install and Use the lxml Parser (Recommended for Speed)

lxml is generally the fastest available parser and is quite robust. If performance is important, it's usually the best choice for HTML.

  1. Install lxml: Open your terminal (ensure your virtual environment is activated if using one) and run:
    pip install lxml
    # Or:
    pip3 install lxml
    # Or:
    python -m pip install lxml
  2. Use lxml in BeautifulSoup:
    # solution_lxml.py
    from bs4 import BeautifulSoup

    html_doc = "<html><head><title>Test</title></head><body><p>Hello</p></body></html>"

    # ✅ Use the installed 'lxml' parser
    soup = BeautifulSoup(html_doc, "lxml")

    print("Using lxml parser:")
    print(f"Title: {soup.title}")
    print(f"Paragraph: {soup.p}")
  • Pros: Very fast. Lenient. Can also parse XML (see Parsing XML below).
  • Cons: Requires separate installation. It's a C extension, which might occasionally cause installation issues on some systems (though usually straightforward with pip).
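The error often persists after running pip because the package was installed into a different Python interpreter than the one running your script. A common example is installing into the system Python instead of your virtual environment. This stdlib-only sketch prints the interpreter in use and the installed versions of the relevant distributions. `installed_version` is a hypothetical helper name. If a package shows as missing, running `-m pip install lxml` with that same interpreter path is the usual fix.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None

# pip installs into one specific interpreter; make sure it is this one.
print("Running under:", sys.executable)
for dist in ("beautifulsoup4", "lxml", "html5lib"):
    print(f"{dist:15} {installed_version(dist) or 'NOT INSTALLED'}")
```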

Solution 3: Install and Use the html5lib Parser (Best for Browser-like Parsing)

html5lib aims to parse HTML exactly as a modern web browser would, meaning it handles malformed markup very gracefully and produces valid HTML5 output. Use this if you need to process "real-world" messy HTML reliably.

  1. Install html5lib:
    pip install html5lib
    # Or: pip3 install html5lib / python -m pip install html5lib etc.
  2. Use html5lib in BeautifulSoup:
    # solution_html5lib.py
    from bs4 import BeautifulSoup

    # Example with slightly broken HTML that html5lib handles well
    html_doc = "<html><head><title>Test<body><p>Hello<p>World"

    # ✅ Use the installed 'html5lib' parser
    soup = BeautifulSoup(html_doc, "html5lib")

    print("Using html5lib parser:")
    # html5lib might reconstruct the tree slightly differently
    print(soup.prettify()) # Show how it fixed the HTML
    print(f"Title: {soup.title}")
    print(f"Paragraphs: {soup.find_all('p')}")
  • Pros: Extremely lenient (like a browser). Creates valid HTML5 structure.
  • Cons: Significantly slower than lxml and html.parser. Requires separate installation.

Parsing XML (xml or lxml-xml)

If your input document is XML rather than HTML, you need an XML parser. BeautifulSoup's XML support comes from lxml, and there is no built-in alternative.

  1. Ensure lxml is Installed: (See Solution 2).
  2. Specify the XML parser:
    # example_xml.py
    from bs4 import BeautifulSoup

    xml_doc = '<root><item id="1">Value 1</item><item id="2">Value 2</item></root>'

    # ✅ Use 'xml' (which selects lxml's XML parser)
    soup_xml = BeautifulSoup(xml_doc, "xml")
    # Or, equivalently, name lxml's XML parser explicitly:
    # soup_xml = BeautifulSoup(xml_doc, "lxml-xml")

    print("Parsing XML:")
    print(soup_xml.find('item', {'id': '2'})) # Output: <item id="2">Value 2</item>
  • The built-in html.parser and html5lib cannot parse XML correctly. You must have lxml installed and specify "xml" or "lxml-xml".
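Case sensitivity offers a quick way to see why an HTML parser is the wrong tool for XML. XML tag names are case-sensitive, but html.parser lowercases them. This sketch uses a made-up `<Catalog>`/`<Item>` document and only runs the `"xml"` branch if lxml is installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

xml_doc = '<Catalog><Item id="1">Value 1</Item></Catalog>'

# html.parser treats the input as HTML, so tag names are lowercased:
# searching for "Item" finds nothing, only "item" matches.
html_soup = BeautifulSoup(xml_doc, "html.parser")
print("html.parser:", html_soup.find("Item"), "|", html_soup.find("item"))

# The XML parser (lxml) keeps the original, case-sensitive names.
try:
    xml_soup = BeautifulSoup(xml_doc, "xml")
    print("xml:", xml_soup.find("Item"))
except FeatureNotFound:
    print("xml: lxml is not installed")
```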

Choosing the Right Parser

| Feature | html.parser | lxml | html5lib | xml (lxml) |
|---|---|---|---|---|
| Type | HTML | HTML | HTML | XML |
| Installation | Built-in | pip install lxml | pip install html5lib | pip install lxml |
| Speed | Moderate | Very fast | Slow | Very fast |
| Lenient? | Yes | Yes (very) | Yes (extremely) | No (strict XML) |
| Dependencies | None | C libraries | Pure Python | C libraries |
| Use case | Simple HTML, no extra installs | Speed, most HTML, XML | Broken HTML, browser-like parsing | XML documents |
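The speed ratings in the table are general tendencies, and actual numbers depend on your documents and hardware. One rough way to check against your own data is to time each installed parser with the standard `timeit` module. The sample document and the `time_parsers` helper below are hypothetical placeholders.

```python
import functools
import timeit

from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical sample document with 200 list items; substitute your own markup.
sample = "<html><body><ul>" + "".join(
    f"<li class='item'>Item {i}</li>" for i in range(200)
) + "</ul></body></html>"

def time_parsers(markup, parsers=("html.parser", "lxml", "html5lib"), number=20):
    """Return {parser: average seconds per parse} for every installed parser."""
    results = {}
    for parser in parsers:
        try:
            BeautifulSoup(markup, parser)  # also confirms the parser exists
        except FeatureNotFound:
            continue  # skip parsers that are not installed
        parse = functools.partial(BeautifulSoup, markup, parser)
        results[parser] = timeit.timeit(parse, number=number) / number
    return results

for parser, seconds in time_parsers(sample).items():
    print(f"{parser:12} {seconds * 1000:.2f} ms per parse")
```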

General Recommendation: Start with html.parser. If you need more speed or leniency for HTML, install and use lxml. If you need to perfectly mimic browser handling of broken HTML, use html5lib. For XML, use xml (requires lxml).
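If your code runs on machines where lxml may or may not be installed, you can prefer lxml and fall back to html.parser instead of failing. This is a minimal sketch. `make_soup` is a hypothetical helper, not part of bs4. Keep in mind that different parsers can produce slightly different trees for malformed markup.

```python
from bs4 import BeautifulSoup, FeatureNotFound

def make_soup(markup, preferred=("lxml", "html.parser")):
    """Parse with the first parser in `preferred` that is installed.

    Hypothetical helper: "html.parser" ships with Python, so keeping it
    last in the list guarantees a fallback.
    """
    for parser in preferred:
        try:
            return BeautifulSoup(markup, parser)
        except FeatureNotFound:
            continue  # this parser is missing; try the next one
    raise FeatureNotFound(f"None of the requested parsers are installed: {preferred}")

soup = make_soup("<html><head><title>Test</title></head><body><p>Hello</p></body></html>")
print(soup.title.string)
```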

Conclusion

The bs4.FeatureNotFound: Couldn't find a tree builder... error means BeautifulSoup cannot locate the parser library you specified (like lxml or html5lib).

The solutions are:

  1. Switch to the built-in parser: Change the second argument to BeautifulSoup(markup, "html.parser"). No installation needed.
  2. Install the missing parser: Use pip install lxml or pip install html5lib and keep your original parser choice in the BeautifulSoup call.
  3. Ensure you use "xml" or "lxml-xml" (and have lxml installed) if parsing XML documents.

By either installing the required external parser or selecting the appropriate built-in one, you can resolve this error and successfully parse your documents with BeautifulSoup.