Learn the structure of HTML. We begin by explaining why web scraping can be a valuable addition to your data science toolbox and then delve into some basics of HTML. We end the chapter with a brief introduction to XPath notation, which is used to navigate the elements within HTML code.
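
To make the tree structure and the XPath idea concrete, here is a minimal sketch using only Python's standard library. The markup is a made-up example, and `xml.etree.ElementTree` is a stand-in chosen for illustration: it supports only a subset of XPath and needs well-formed markup, so it is not how the course itself parses HTML.

```python
import xml.etree.ElementTree as ET

# A small, well-formed HTML snippet (hypothetical example markup).
html = """
<html>
  <body>
    <div class="intro">
      <p>Hello</p>
      <p>World</p>
    </div>
    <div>
      <p>Goodbye</p>
    </div>
  </body>
</html>
""".strip()

root = ET.fromstring(html)  # root is the <html> element at the top of the tree

# ElementTree paths are relative to the root element, so "./body/div/p"
# plays the role of the full XPath "/html/body/div/p".
all_p = [p.text for p in root.findall("./body/div/p")]

# A predicate in brackets narrows the match by attribute value.
intro_p = [p.text for p in root.findall("./body/div[@class='intro']/p")]

# ".//" searches all descendants, much like "//" in full XPath.
first_div_class = root.find(".//div").get("class")

print(all_p)            # ['Hello', 'World', 'Goodbye']
print(intro_p)          # ['Hello', 'World']
print(first_div_class)  # intro
```

Each path reads as a walk down the tree: one slash moves a single generation down, and the bracketed predicate filters the elements at that step.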

Web Scraping Overview

Web-scraping is not nonsense!

HyperText Markup Language

HTML tree wordy navigation

From Tree to HTML

Attributes

Keep it Classy

Finding href

Crash Course in XPath

Where am I?

It's Time to P

A classy span

Introduction to HTML

Leverage XPath syntax to explore scrapy selectors. Both of these concepts will move you towards being able to scrape an HTML document.

XPathology

Counting Elements in Challenging Conditions

Body Parts

Choose DataCamp!

XPath Roadmap

Where Is the @

Check Your Class

Hyper(link) Active

Secret Links

Selector Objects

XPath Chaining

Divide Up This Exercise

The Source of the Source

Course Class by Inspection

Requesting a Selector

XPaths and Selectors

Learn CSS Locator syntax and begin playing with the idea of chaining together CSS Locators with XPath. We also introduce Response objects, which behave like Selectors but give us extra tools to mobilize our scraping efforts across multiple websites. 

From XPath to CSS

The (X)Path to CSS Locators

Get an "a" in this Course

The CSS Wildcard

CSS Attributes and Text Selection

You've been `href`ed

Top Level Text

All Level Text

Respond Please!

Reveal By Response

Responding with Selectors

Selecting from a Selection

Survey

Titular

Scraping with Children

CSS Locators, Chaining, and Responses

Learn to create web crawlers with scrapy. These scrapy spiders will crawl the web through multiple pages, following links to scrape each of those pages automatically according to the procedures we've learned in the previous chapters.

Your First Spider

Inheriting the Spider

Hurl the URLs

Start Requests

Self Referencing is Classy

Starting with Start Requests

Parse and Crawl

Pen Names

Crawler Time

Capstone

Time to Run

DataCamp Descriptions

Capstone Crawler

The Finale

Spiders

DataCamp webpage HTML

The ability to build tools that retrieve and parse information stored on the internet has been, and continues to be, valuable in many areas of data science. In this course, you will learn to navigate and parse HTML code and to build tools that crawl websites automatically. Although our scraping will be done with the versatile Python library scrapy, many of the techniques you learn in this course can also be applied to other well-known Python libraries, such as BeautifulSoup and Selenium. By the end of this course, you will have an effective mental model of HTML structure, be able to build tools that parse HTML code and access the information you want, and be able to create a simple scrapy spider to crawl the web at scale.

Intermediate Python

Learn to retrieve and parse information from the internet using the Python library scrapy.

Web Scraping in Python

Python Developer

Data Scientist, NYU

XPath Navigation

Slashes and Brackets

With or Without Brackets?

A Body of P

The Birds and the P's

Double Slash with Brackets

The Wildcard

Xposé
