How to Resolve Python BeautifulSoup Error "bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml" (or other parsers)
When using the BeautifulSoup library (bs4) for parsing HTML or XML documents in Python, you might encounter the error bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: parser_name. This typically happens when you explicitly request a specific parser (like lxml, html5lib, or xml) in the BeautifulSoup constructor, but the underlying library for that parser isn't installed in your Python environment.
This guide explains why this error occurs and how to resolve it by either installing the requested parser or choosing a different one.
Understanding the Error: BeautifulSoup Parsers
BeautifulSoup itself doesn't parse HTML/XML directly. Instead, it relies on an underlying parser library to interpret the markup structure and build a navigable tree. You tell BeautifulSoup which parser to use via the second argument in its constructor. Common choices include:
- 'html.parser': Python's built-in HTML parser. Decent speed, moderately lenient.
- 'lxml': A very fast and lenient HTML parser based on C libraries. Requires separate installation.
- 'html5lib': An extremely lenient parser that aims to mimic how web browsers handle malformed HTML. Creates valid HTML5. Requires separate installation, generally slower than lxml.
- 'xml': Uses lxml's XML parser. Requires lxml installation.
The FeatureNotFound error occurs when BeautifulSoup tries to find and use the parser library you requested (e.g., lxml) but cannot find it installed in the current Python environment.
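One way to see in advance which of these parsers BeautifulSoup should be able to find is to check whether the backing packages can be located. The following is a minimal sketch using only the standard library. The helper name available_parsers is hypothetical (it is not part of bs4), and it only checks that a package is present in the current environment, not that it imports cleanly.

```python
import importlib.util


def available_parsers():
    """Map each BeautifulSoup parser name to whether its backing package is present.

    Hypothetical helper, not part of bs4.
    """
    has_lxml = importlib.util.find_spec("lxml") is not None
    return {
        # html.parser ships with Python's standard library
        "html.parser": importlib.util.find_spec("html.parser") is not None,
        "lxml": has_lxml,
        "html5lib": importlib.util.find_spec("html5lib") is not None,
        # the "xml" feature is served by lxml's XML parser
        "xml": has_lxml,
    }


for name, installed in available_parsers().items():
    print(f"{name}: {'installed' if installed else 'missing'}")
```

Any parser reported as missing here is one that would be expected to trigger FeatureNotFound if requested.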
Cause: Requested Parser Library Not Installed
The direct cause of the error Couldn't find a tree builder with the features you requested: lxml is that you specified "lxml" as the parser, but the lxml Python package is not installed.
from bs4 import BeautifulSoup
html_doc = "<html><head><title>Test</title></head><body><p>Hello</p></body></html>"
try:
    # ⛔️ bs4.FeatureNotFound: Couldn't find a tree builder with the
    # features you requested: lxml. Do you need to install a parser library?
    # Attempting to use 'lxml' without having it installed.
    soup = BeautifulSoup(html_doc, "lxml")
    print(soup.title)
except ImportError:  # Sometimes it might raise ImportError first depending on setup
    print("ImportError caught: lxml library likely missing.")
except Exception as e:  # Catch FeatureNotFound specifically if ImportError doesn't trigger
    print(f"Caught Error: {e}")
    # Check if it's the specific error we expect
    if "lxml" in str(e) and "FeatureNotFound" in str(type(e).__name__):
        print("--> Confirmed: lxml parser not found.")
Depending on the exact environment and bs4 version, an ImportError might occur before FeatureNotFound, but the root cause (a missing library) is the same. FeatureNotFound is a subclass of ValueError, so a handler for ValueError also catches it.
The same error structure applies if you specify "html5lib" or "xml" without having html5lib or lxml installed, respectively.
Solution 1: Use the Built-in html.parser (No Installation Needed)
The simplest fix, especially if you don't have specific requirements for speed or handling extremely broken HTML, is to use Python's built-in parser. It doesn't require any extra installation beyond Python itself.
from bs4 import BeautifulSoup
html_doc = "<html><head><title>Test</title></head><body><p>Hello</p></body></html>"
# ✅ Use the built-in parser
soup = BeautifulSoup(html_doc, "html.parser")
print("Using html.parser:")
print(f"Title: {soup.title}") # Output: <title>Test</title>
print(f"Title text: {soup.title.string}") # Output: Test
print(f"Paragraph: {soup.p}") # Output: <p>Hello</p>
- Pros: No external dependencies. Reasonably fast. Lenient enough for most well-formed HTML.
- Cons: Not as fast as lxml. Not as lenient as html5lib for severely broken HTML.
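If you would rather keep a faster parser where it is available and only fall back to the built-in one when it is missing, the switch can be automated. The sketch below is one possible approach, not an official bs4 pattern: fallback_chain and parse_with_fallback are hypothetical helper names, and the code assumes that FeatureNotFound can be imported from the bs4 package, as the error's qualified name suggests.

```python
def fallback_chain(preferred="lxml"):
    """Return parser names to try in order, always ending with the built-in one."""
    chain = [] if preferred == "html.parser" else [preferred]
    chain.append("html.parser")
    return chain


def parse_with_fallback(markup, preferred="lxml"):
    """Parse with the preferred parser, falling back to html.parser if it is missing."""
    # Imported lazily so fallback_chain stays usable even without bs4 installed.
    from bs4 import BeautifulSoup, FeatureNotFound

    for parser in fallback_chain(preferred):
        try:
            return BeautifulSoup(markup, parser)
        except FeatureNotFound:
            continue  # this parser's library is not installed; try the next one
    # Defensive only: html.parser ships with Python, so the loop should return.
    raise RuntimeError("no usable parser found")


print(fallback_chain("lxml"))
```

Keep in mind that different parsers can build different trees from malformed markup, so a silent fallback may change results on broken HTML.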
Solution 2: Install and Use the lxml Parser (Recommended for Speed)
lxml is generally the fastest available parser and is quite robust. If performance is important, it's usually the best choice for HTML.
- Install lxml: Open your terminal (ensure your virtual environment is activated if using one) and run:
pip install lxml
# Or:
pip3 install lxml
# Or:
python -m pip install lxml
- Use lxml in BeautifulSoup:
# solution_lxml.py
from bs4 import BeautifulSoup
html_doc = "<html><head><title>Test</title></head><body><p>Hello</p></body></html>"
# ✅ Use the installed 'lxml' parser
soup = BeautifulSoup(html_doc, "lxml")
print("Using lxml parser:")
print(f"Title: {soup.title}")
print(f"Paragraph: {soup.p}")
- Pros: Very fast. Lenient. Can also parse XML (see the Parsing XML section below).
- Cons: Requires separate installation. It's a C extension, which might occasionally cause installation issues on some systems (though usually straightforward with pip).
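If the error persists after installing, one possible explanation is that pip installed lxml into a different interpreter than the one running your script. The python -m pip form shown above avoids this by tying pip to a specific interpreter. The sketch below builds that same command for whichever interpreter is currently running. The helper names pip_install_command and install are hypothetical.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build a 'python -m pip install' command bound to the running interpreter."""
    return [sys.executable, "-m", "pip", "install", package]


def install(package):
    """Run the install; raises subprocess.CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(package))


# Show the command without running it.
print(" ".join(pip_install_command("lxml")))
```

Calling install("lxml") from the same environment as your script would then place the package where that script can import it.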
Solution 3: Install and Use the html5lib Parser (Best for Browser-like Parsing)
html5lib aims to parse HTML exactly as a modern web browser would, meaning it handles malformed markup very gracefully and produces valid HTML5 output. Use this if you need to process "real-world" messy HTML reliably.
- Install html5lib:
pip install html5lib
# Or: pip3 install html5lib / python -m pip install html5lib
- Use html5lib in BeautifulSoup:
# solution_html5lib.py
from bs4 import BeautifulSoup
# Example with slightly broken HTML that html5lib handles well
html_doc = "<html><head><title>Test<body><p>Hello<p>World"
# ✅ Use the installed 'html5lib' parser
soup = BeautifulSoup(html_doc, "html5lib")
print("Using html5lib parser:")
# html5lib might reconstruct the tree slightly differently
print(soup.prettify()) # Show how it fixed the HTML
print(f"Title: {soup.title}")
print(f"Paragraphs: {soup.find_all('p')}")
- Pros: Extremely lenient (like a browser). Creates valid HTML5 structure.
- Cons: Significantly slower than lxml and html.parser. Requires separate installation.
Parsing XML (xml or lxml-xml)
If your input document is XML, not HTML, you need to use an XML parser. BeautifulSoup relies on lxml for this; it is the only XML parser it currently supports.
- Ensure lxml is installed (see Solution 2).
- Specify the XML parser:
# example_xml.py
from bs4 import BeautifulSoup
xml_doc = '<root><item id="1">Value 1</item><item id="2">Value 2</item></root>'
# ✅ Use 'xml' (which defaults to lxml's XML parser)
soup_xml = BeautifulSoup(xml_doc, "xml")
# Or explicitly: soup_xml = BeautifulSoup(xml_doc, "lxml-xml")
print("Parsing XML:")
print(soup_xml.find('item', {'id': '2'})) # Output: <item id="2">Value 2</item>
- The built-in html.parser and html5lib cannot parse XML correctly. You must have lxml installed and specify "xml" or "lxml-xml".
Choosing the Right Parser
Feature | html.parser | lxml | html5lib | xml (lxml) |
---|---|---|---|---|
Type | HTML | HTML | HTML | XML |
Installation | Built-in | pip install lxml | pip install html5lib | pip install lxml |
Speed | Moderate | Very Fast | Slow | Very Fast |
Lenient? | Yes | Very | Extremely | No (strict XML) |
Dependencies | None | C libraries | Pure Python | C libraries |
Use Case | Simple HTML, no extra installs | Speed, most HTML, XML | Broken HTML, browser-like | XML documents |
General Recommendation: Start with html.parser. If you need more speed or leniency for HTML, install and use lxml. If you need to perfectly mimic browser handling of broken HTML, use html5lib. For XML, use xml (requires lxml).
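The recommendation above can be written down as a small decision function. This is only an illustrative sketch: recommend_parser and its parameters are hypothetical names, and the order of the checks (XML first, then browser-like handling, then speed) is an assumption about how to break ties, not something bs4 prescribes.

```python
def recommend_parser(is_xml=False, browser_like=False, need_speed=False):
    """Return the BeautifulSoup parser name suggested by the guidance above."""
    if is_xml:
        return "xml"  # requires lxml
    if browser_like:
        return "html5lib"  # requires html5lib
    if need_speed:
        return "lxml"  # requires lxml
    return "html.parser"  # built in, nothing to install


print(recommend_parser())
print(recommend_parser(need_speed=True))
print(recommend_parser(is_xml=True))
```

The returned string can be passed as the second argument to the BeautifulSoup constructor, provided the corresponding library is installed.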
Conclusion
The bs4.FeatureNotFound: Couldn't find a tree builder... error means BeautifulSoup cannot locate the parser library you specified (like lxml or html5lib).
The solutions are:
- Switch to the built-in parser: Change the second argument to "html.parser", as in BeautifulSoup(markup, "html.parser"). No installation needed.
- Install the missing parser: Use pip install lxml or pip install html5lib and keep your original parser choice in the BeautifulSoup call.
- Ensure you use "xml" or "lxml-xml" (and have lxml installed) if parsing XML documents.
By either installing the required external parser or selecting the appropriate built-in one, you can resolve this error and successfully parse your documents with BeautifulSoup.