
Loading HTML files for RAG

LangChain's document loaders can load documents from many different formats, including complex formats like HTML.

If you're not familiar with HTML, it's a markup language for creating web pages. Here's a small example:

<!DOCTYPE html>
<html>
<body>
  <h2>Heading</h2>
  <p>Here's some text and an image below:</p>
  <img src="image.jpg" alt="..." width="104" height="142">
</body>
</html>

In this exercise, you'll load an HTML file containing a DataCamp blog post webpage. The necessary classes have already been imported for you.
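To make the idea of "loading" HTML more concrete, here is a minimal sketch using only Python's standard library. It is not how UnstructuredHTMLLoader works internally; it only illustrates the general shape of the result, which is text content with the markup stripped plus a small metadata record. The names `TextExtractor`, `load_html`, and `example.html` are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text found between tags, ignoring the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Skip whitespace-only runs between tags
        text = data.strip()
        if text:
            self.chunks.append(text)


def load_html(html, source):
    """Hypothetical helper: return extracted text and metadata as a dict."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {
        "page_content": "\n\n".join(parser.chunks),
        "metadata": {"source": source},
    }


# The small HTML example from above
sample = """<!DOCTYPE html>
<html>
<body>
  <h2>Heading</h2>
  <p>Here's some text and an image below:</p>
  <img src="image.jpg" alt="..." width="104" height="142">
</body>
</html>"""

doc = load_html(sample, "example.html")
print(doc["page_content"])
print(doc["metadata"])
```

Note that the image contributes no text here, since its information lives in attributes rather than between tags. A real loader typically handles many more cases, such as scripts, styles, and nested layout elements.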

This exercise is part of the course

Retrieval Augmented Generation (RAG) with LangChain


Exercise instructions

  • Use the UnstructuredHTMLLoader class to load the datacamp-blog.html file in the current directory.
  • Load the documents into memory.
  • Print the first document's page content.
  • Print the first document's metadata.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Create a document loader for unstructured HTML
loader = ____

# Load the document
data = ____

# Print the first document's content
print(____)

# Print the first document's metadata
print(____)