Exercise

Loading HTML files for RAG

It's possible to load documents from many different formats, including complex formats like HTML.

If you're not familiar with HTML, it's a markup language for creating web pages. Here's a small example:

<!DOCTYPE html>
<html>
<body>
  <h2>Heading</h2>
  <p>Here's some text and an image below:</p>
  <img src="image.jpg" alt="..." width="104" height="142">
</body>
</html>
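
To give a feel for what a loader has to do with markup like this, here is a minimal sketch that pulls the visible text out of the example above. It uses only Python's standard-library `html.parser`, not the loader used in this exercise; the names `SAMPLE_HTML`, `TextExtractor`, and `extract_text` are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser

# The small HTML example from above, stored as a string.
SAMPLE_HTML = """<!DOCTYPE html>
<html>
<body>
  <h2>Heading</h2>
  <p>Here's some text and an image below:</p>
  <img src="image.jpg" alt="..." width="104" height="142">
</body>
</html>"""


class TextExtractor(HTMLParser):
    """Collects the non-empty text found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Skip whitespace-only runs, such as the newlines between tags.
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_text(html):
    """Return the visible text fragments of an HTML string, in order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


print(extract_text(SAMPLE_HTML))
```

The tags and the image element are discarded and only the heading and paragraph text remain. A dedicated HTML loader goes further than this sketch, but the basic idea of separating content from markup is the same.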

In this exercise, you'll load an HTML file containing a DataCamp blog post. The necessary classes have already been imported for you.

Instructions

  • Use the UnstructuredHTMLLoader class to load the datacamp-blog.html file in the current directory.
  • Load the documents into memory.
  • Print the first document's page content.
  • Print the first document's metadata.
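
The steps above could be sketched roughly as follows. This assumes the `langchain-community` and `unstructured` packages are installed and that `UnstructuredHTMLLoader` is imported from `langchain_community.document_loaders`; the exercise environment imports the class for you, so the import path shown here is an assumption. The helper names `load_html_documents` and `print_first_document` are hypothetical.

```python
def load_html_documents(file_path):
    """Load an HTML file into a list of LangChain Document objects."""
    # Imported here (an assumed import path) so the helper can be defined
    # even where the library is not installed.
    from langchain_community.document_loaders import UnstructuredHTMLLoader

    loader = UnstructuredHTMLLoader(file_path=file_path)
    # load() reads the file and returns the documents in memory.
    return loader.load()


def print_first_document(documents):
    """Print the page content and metadata of the first document."""
    first = documents[0]
    print(first.page_content)
    print(first.metadata)
    return first
```

With the file in the current directory, calling `print_first_document(load_html_documents("datacamp-blog.html"))` would carry out all four instructions in one go.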