🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
Updated Jun 10, 2024 - Python
Core Python Web Archiving Toolkit for replay and recording of web archives
Collect and revisit web pages.
The repository and website hosting the peer review process for new Programming Historian lessons
Run a high-fidelity browser-based crawler in a single Docker container
A High-Fidelity Web Archiving Extension for Chrome and Chromium based browsers!
Streaming WARC/ARC library for fast web archive IO
CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)
Serverless replay of web archives directly in the browser
Automatically archive links to videos, images, and social media content from Google Sheets (and more).
Archiveror will help you preserve the webpages you love. 💾
A Tool To Push Web Resources Into Web Archives
Webrecorder Player for Desktop (OSX/Windows/Linux). (Built with Electron + Webrecorder)
InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS
ODU Web Science and Digital Libraries Research Group (WS-DL) home page.
🐋 Web Archiving Integration Layer: One-Click User Instigated Preservation
Wayback Machine API interface & a command-line tool
A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine
Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!