Loading HTML files for RAG

Document loaders can read many different file formats, including complex ones like HTML.

If you're not familiar with HTML, it's a markup language for creating web pages. Here's a small example:

<!DOCTYPE html>
<html>
<body>
  <h2>Heading</h2>
  <p>Here's some text and an image below:</p>
  <img src="image.jpg" alt="..." width="104" height="142">
</body>
</html>
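To see why markup like this needs a dedicated loader, here is a minimal sketch that pulls the visible text out of the example above using only Python's standard library. It is an illustration, not how UnstructuredHTMLLoader works internally; the names `TextExtractor` and `html_to_text` are hypothetical helpers introduced for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the non-empty text found between HTML tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Skip whitespace-only runs such as the newlines between tags
        text = data.strip()
        if text:
            self.chunks.append(text)


def html_to_text(html):
    """Return the visible text of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


sample = """<!DOCTYPE html>
<html>
<body>
  <h2>Heading</h2>
  <p>Here's some text and an image below:</p>
  <img src="image.jpg" alt="..." width="104" height="142">
</body>
</html>"""

print(html_to_text(sample))
```

Only the heading and paragraph text survive; the tags and the image element are dropped. A RAG pipeline typically wants that kind of plain text from a page.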

In this exercise, you'll load an HTML file containing a DataCamp blog post webpage. The necessary classes have already been imported for you.

This exercise is part of the course

Retrieval Augmented Generation (RAG) with LangChain

Exercise instructions

  • Use the UnstructuredHTMLLoader class to load the datacamp-blog.html file in the current directory.
  • Load the documents into memory.
  • Print the first document's page content.
  • Print the first document's metadata.

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Create a document loader for unstructured HTML
loader = ____

# Load the document
data = ____

# Print the first document's content
print(____)

# Print the first document's metadata
print(____)
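
One possible way to fill in the blanks is sketched below, wrapped in two small helper functions. It assumes the `langchain-community` and `unstructured` packages are installed and that `UnstructuredHTMLLoader` is imported from `langchain_community.document_loaders`. The exercise environment imports the class for you, so that import path is an assumption here. The helper names `load_html` and `show_first` are hypothetical.

```python
def load_html(path):
    """Load an HTML file into a list of LangChain Document objects."""
    # Assumed import path; imported lazily so the dependency is only
    # needed when the loader is actually used.
    from langchain_community.document_loaders import UnstructuredHTMLLoader

    # Create a document loader for unstructured HTML
    loader = UnstructuredHTMLLoader(path)

    # Load the document into memory
    return loader.load()


def show_first(docs):
    """Print the first document's content and metadata, then return it."""
    first = docs[0]
    print(first.page_content)
    print(first.metadata)
    return first
```

With the file from the instructions in the current directory, calling `show_first(load_html("datacamp-blog.html"))` would cover all four steps.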