crawlab-team/crawlab-ai-sdk

Crawlab AI SDK

This is the Python SDK for Crawlab AI, an AI-powered web scraping platform maintained by Crawlab.

Installation

pip install crawlab-ai

Prerequisites

An API token is required to use this SDK. You can get one from the official Crawlab website.

Usage

Get data from a list page

from crawlab_ai import read_list

# Define the URL of the list page
url = "https://example.com"

# Get the data without specifying fields
df = read_list(url=url)
print(df)

# You can also specify fields
fields = ["title", "content"]
df = read_list(url=url, fields=fields)

# You can also return a list of dictionaries instead of a DataFrame
data = read_list(url=url, as_dataframe=False)
print(data)
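
Once you have the list of dictionaries, you may want to save it somewhere. The following is a minimal sketch of one way to do that with the standard library. The helper rows_to_csv and the sample rows are hypothetical and not part of the SDK. The rows only stand in for what read_list(url=url, as_dataframe=False) might return, assuming each item is a dictionary keyed by field name.

```python
import csv
import io


def rows_to_csv(rows, fields):
    """Serialize a list of dictionaries to CSV text (hypothetical helper, not part of the SDK)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Keep only the requested fields; missing values become empty strings
        writer.writerow({field: row.get(field, "") for field in fields})
    return buffer.getvalue()


# Placeholder rows standing in for read_list(url=url, as_dataframe=False)
sample_rows = [
    {"title": "First post", "content": "Hello"},
    {"title": "Second post"},
]

csv_text = rows_to_csv(sample_rows, ["title", "content"])
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch free of side effects. To write to disk instead, pass a file opened with newline="" to csv.DictWriter.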

Usage with Scrapy

Create a Scrapy spider by extending ScrapyListSpider:

from crawlab_ai import ScrapyListSpider


class MySpider(ScrapyListSpider):
    name = "my_spider"
    start_urls = ["https://example.com"]
    fields = ["title", "content"]

Then run the spider:

scrapy crawl my_spider
