1
Building RAG Applications with LangChain
Free
Discover how to integrate external data sources into chat models with LangChain. Learn how to load, split, embed, store, and retrieve data for use in LLM applications.
2
Improving the RAG Architecture
Discover state-of-the-art techniques for loading, splitting, and retrieving documents, including loading Python files, splitting semantically, and using MRR and self-query retrieval methods. Learn to evaluate your RAG architecture using robust metrics and frameworks.
3
Introduction to Graph RAG
Discover how graph databases and retrieval can overcome some of the limitations of traditional vector-based storage and retrieval.

Loading HTML files for RAG

It's possible to load documents from many different formats, including complex formats like HTML.

If you're not familiar with HTML, it's a markup language for creating web pages. Here's a small example:

<!DOCTYPE html>
<html>
<body>
  <h2>Heading</h2>
  <p>Here's some text and an image below:</p>
  <img src="image.jpg" alt="..." width="104" height="142">
</body>
</html>
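
To make the idea of "loading" HTML concrete, here is a minimal sketch, using only Python's standard library, of pulling the readable text out of the markup above. It is not how any particular document loader is implemented; the names `TextExtractor`, `html_to_text`, and `SAMPLE_HTML` are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the non-empty text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; skip whitespace-only nodes.
        text = data.strip()
        if text:
            self.chunks.append(text)


def html_to_text(html):
    """Return the text nodes of `html`, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# The small example page shown above (hypothetical constant name).
SAMPLE_HTML = """<!DOCTYPE html>
<html>
<body>
  <h2>Heading</h2>
  <p>Here's some text and an image below:</p>
  <img src="image.jpg" alt="..." width="104" height="142">
</body>
</html>"""

print(html_to_text(SAMPLE_HTML))
```

Tags and attributes such as `src` and `alt` are dropped, and only the heading and paragraph text remain. A real loader typically does more than this, for example skipping script and style content and attaching metadata to the extracted text.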

In this exercise, you'll load an HTML file containing a DataCamp blog post webpage. The necessary classes have already been imported for you.

Use the UnstructuredHTMLLoader class to load the datacamp-blog.html file in the current directory.
Load the documents into memory.
Print the first document's content.
Print the first document's metadata.
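
The four steps above could be sketched roughly as follows. This assumes the `langchain-community` and `unstructured` packages are installed and that `UnstructuredHTMLLoader` is imported from `langchain_community.document_loaders`; in the exercise environment the class is already imported, so the import path here is an assumption. The helper name `load_html_documents` is hypothetical.

```python
def load_html_documents(path):
    """Load an HTML file into a list of LangChain Document objects."""
    # Assumption: the loader lives in langchain_community.document_loaders.
    # It is imported inside the function so this file can be imported
    # even when LangChain is not installed.
    from langchain_community.document_loaders import UnstructuredHTMLLoader

    # Step 1: create a loader pointing at the HTML file.
    loader = UnstructuredHTMLLoader(file_path=str(path))
    # Step 2: load the documents into memory.
    return loader.load()


def print_first_document(path):
    """Print the content and metadata of the first loaded document."""
    data = load_html_documents(path)
    # Step 3: the first document's text content.
    print(data[0].page_content)
    # Step 4: the first document's metadata.
    print(data[0].metadata)
```

With the exercise file present in the current directory, calling `print_first_document("datacamp-blog.html")` would run all four steps. The exact text and metadata printed depend on the file and the installed library versions.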