Makes saving pages in bulk to the Wayback Machine much easier
Updated Jun 13, 2024 - HTML
Serverless replay of web archives directly in the browser
The repository and website hosting the peer review process for new Programming Historian lessons
Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS
Run a high-fidelity browser-based crawler in a single Docker container
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
A High-Fidelity Web Archiving Extension for Chrome and Chromium-based browsers!
A suite of tools for mirroring and hoarding web pages you visit for later offline viewing, i.e. your own personal Wayback Machine. It can also archive HTTP POST requests and responses, as well as most other HTTP-level data, and follows an "archive everything now, figure out what to do with it later" philosophy.
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)
Ed course archiver and viewer
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
Streaming WARC/ARC library for fast web archive IO
A Memento Aggregator CLI and Server in Go
Official Python package for ArchiveBox, the self-hosted internet archiving solution.