web-archiving

Here are 109 public repositories matching this topic...

ArchiveBox / ArchiveBox

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

Updated Jun 10, 2024
Python

webrecorder / pywb

Core Python Web Archiving Toolkit for replay and recording of web archives

python web-archiving wayback web-archives pywb

Updated May 6, 2024
JavaScript

Rhizome-Conifer / conifer

Collect and revisit web pages.

python docker archives warc web-archiving wayback webrecorder pywb

Updated Nov 8, 2023
Python

programminghistorian / ph-submissions

The repository and website hosting the peer review process for new Programming Historian lessons

python api open-source mapping multi-lingual web-scraping digital-humanities data-management pedagogy web-archiving network-analysis linked-open-data programming-historian dh open-educational-resources r-studio digital-history distant-reading

Updated Jun 12, 2024
Jupyter Notebook

webrecorder / browsertrix-crawler

Run a high-fidelity browser-based crawler in a single Docker container

crawler web-crawler crawling warc web-archiving webrecorder wacz

Updated Jun 12, 2024
TypeScript

harvard-lil / perma

Indelible links

libraries web-archiving

Updated Jun 12, 2024
JavaScript

webrecorder / archiveweb.page

A High-Fidelity Web Archiving Extension for Chrome and Chromium-based browsers!

extension archiving chromium web-archiving webrecorder wacz

Updated Jun 10, 2024
JavaScript

webrecorder / warcio

Streaming WARC/ARC library for fast web archive IO

python warc web-archiving web-archives pywb

Updated May 27, 2024
Python
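
This sketch does not reproduce warcio's own API. It uses only the standard library to show what streaming WARC IO involves, walking an uncompressed WARC stream one record at a time. Each record consists of a version line, named header fields, a blank line, exactly Content-Length bytes of content, and a closing CRLF CRLF. The function name `iter_warc_records` and the trimmed, hand-written sample record are hypothetical. Folded header lines and legacy ARC files are not handled.

```python
import io
from typing import BinaryIO, Dict, Iterator, Tuple


def iter_warc_records(stream: BinaryIO) -> Iterator[Tuple[str, Dict[str, str], bytes]]:
    """Hypothetical helper: yield (version, headers, content block) per WARC record."""
    while True:
        first = stream.readline()
        if not first:
            return  # end of stream
        if first in (b"\r\n", b"\n"):
            continue  # tolerate stray blank lines between records
        version = first.decode("ascii").strip()
        if not version.startswith("WARC/"):
            raise ValueError(f"expected a WARC version line, got {version!r}")
        headers: Dict[str, str] = {}
        # Named fields run until the first empty line.
        for raw in iter(stream.readline, b""):
            line = raw.decode("utf-8").rstrip("\r\n")
            if not line:
                break
            name, _, value = line.partition(":")
            headers[name.strip()] = value.strip()
        # Content-Length gives the exact size of the content block, so the parser
        # can read it in one go instead of scanning for a delimiter.
        length = next((int(v) for k, v in headers.items() if k.lower() == "content-length"), None)
        if length is None:
            raise ValueError("WARC record is missing Content-Length")
        yield version, headers, stream.read(length)
        stream.read(4)  # each record ends with CRLF CRLF


# Trimmed, hand-written record (real records also carry WARC-Record-ID, WARC-Date, etc.).
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
)

for version, headers, block in iter_warc_records(io.BytesIO(sample)):
    print(version, headers["WARC-Target-URI"], block)
# → WARC/1.0 http://example.com/ b'hello'
```

`gzip.open` reads concatenated gzip members in sequence, so this function should also work on a `.warc.gz` file compressed per record. A dedicated library such as warcio additionally copes with ARC input and other real-world irregularities that this sketch ignores.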

gildas-lormeau / single-file-cli

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)

nodejs cli web-scraper web-scraping web-archiving single-file deno

Updated Jun 5, 2024
JavaScript

webrecorder / replayweb.page

Serverless replay of web archives directly in the browser

service-worker warc web-archiving wayback-machine web-archive replay-web-page web-replay wacz

Updated Jun 12, 2024
TypeScript

bellingcat / auto-archiver

Automatically archive links to videos, images, and social media content from Google Sheets (and more).

python docker service scraping archive web-archiving open-source-research

Updated Jun 11, 2024
Python

rahiel / archiveror

Archiveror will help you preserve the webpages you love. 💾

javascript chrome-extension bookmark archiving webextension firefox-extension browser-extension mhtml linkrot web-archiving

Updated Oct 18, 2019
JavaScript

oduwsdl / archivenow

A Tool To Push Web Resources Into Web Archives

internet-archive web-archiving

Updated Jan 23, 2024
Python

webrecorder / webrecorder-player

Webrecorder Player for Desktop (OSX/Windows/Linux). (Built with Electron + Webrecorder)

electron warc web-archiving webrecorder pywb

Updated Sep 17, 2020
JavaScript

oduwsdl / ipwb

InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS

python docker service-worker ipfs memento warc web-archiving wayback memento-rfc

Updated Jun 12, 2024
Python
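
This sketch does not reproduce ipwb's own code. The `memento` and `memento-rfc` tags refer to the Memento protocol (RFC 7089). A client asks a TimeGate for a past version of a resource by sending an `Accept-Datetime` header. Each archived copy (memento) the client is redirected to reports its capture time in a `Memento-Datetime` header. The standard-library sketch below covers only the date handling on both sides. The helper names and the nearest-capture rule are illustrative assumptions, since the RFC leaves the choice of memento to each TimeGate.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, List


def accept_datetime_header(when: datetime) -> Dict[str, str]:
    """Header a Memento client sends to a TimeGate to ask for a version near `when`."""
    # Memento uses HTTP-date format in GMT; usegmt=True requires a UTC datetime.
    return {"Accept-Datetime": format_datetime(when.astimezone(timezone.utc), usegmt=True)}


def parse_memento_datetime(value: str) -> datetime:
    """Parse the Memento-Datetime header that an archived copy carries."""
    return parsedate_to_datetime(value)


def closest_memento(timestamps: List[str], target: datetime) -> str:
    """Hypothetical TimeGate policy: pick the 14-digit capture time nearest `target`."""
    def distance(ts: str) -> float:
        captured = datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
        return abs((captured - target).total_seconds())
    return min(timestamps, key=distance)


target = datetime(2007, 5, 31, 20, 35, tzinfo=timezone.utc)
print(accept_datetime_header(target))
# → {'Accept-Datetime': 'Thu, 31 May 2007 20:35:00 GMT'}
print(closest_memento(["20070101000000", "20070601000000", "20080101000000"], target))
# → 20070601000000
```

The nearest-in-time rule is just one plausible choice. A TimeGate could equally prefer the latest capture before the requested date.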

oduwsdl / oduwsdl.github.io

ODU Web Science and Digital Libraries Research Group (WS-DL) home page.

machine-learning natural-language-processing information-retrieval web-science web-archiving digital-preservation digital-libraries

Updated May 18, 2024
HTML

machawk1 / wail

🐋 Web Archiving Integration Layer: One-Click User Instigated Preservation

python gui warc web-archiving pyinstaller wayback heritrix openwayback

Updated May 16, 2024
Roff

akamhy / waybackpy

Wayback Machine API interface & a command-line tool

osint internet-archive web-archiving wayback-machine webarchiving cdx-api internet-archiving savepagenow archive-webpage archive-webpages wayback-machine-api wayback-machine-python

Updated Feb 26, 2024
Python

cocrawler / cdx_toolkit

A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine

python warc web-archiving cdx web-archives commoncrawl cdx-api

Updated May 20, 2024
Python
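
This sketch does not use cdx_toolkit's own API. It queries the Internet Archive's public CDX server directly with the standard library. It assumes the `https://web.archive.org/cdx/search/cdx` endpoint, which accepts `url`, `output=json`, and `limit` parameters and returns JSON rows whose first row names the fields. The replay URL pattern `https://web.archive.org/web/<timestamp>/<original>` is also an assumption. The helper names are hypothetical, the sample payload is invented for illustration, and no network request is made.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List
from urllib.parse import urlencode

# Assumed public endpoint of the Internet Archive's CDX server.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(url: str, limit: int = 5) -> str:
    """Build a CDX query for captures of `url`, asking for JSON rows."""
    return CDX_ENDPOINT + "?" + urlencode({"url": url, "output": "json", "limit": limit})


def parse_cdx_json(payload: str) -> List[Dict[str, str]]:
    """Turn JSON rows into dicts; the first row holds the field names."""
    if not payload.strip():
        return []  # treat an empty body as "no captures"
    rows = json.loads(payload)
    if not rows:
        return []
    header, *captures = rows
    return [dict(zip(header, row)) for row in captures]


def capture_time(timestamp: str) -> datetime:
    """CDX timestamps are 14-digit UTC strings: YYYYMMDDhhmmss."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def replay_url(capture: Dict[str, str]) -> str:
    """Assumed Wayback replay pattern: /web/<timestamp>/<original URL>."""
    return f"https://web.archive.org/web/{capture['timestamp']}/{capture['original']}"


# Invented payload in the CDX JSON shape (not real capture data).
sample = (
    '[["urlkey","timestamp","original","mimetype","statuscode","digest","length"],'
    '["com,example)/","20240101000000","http://example.com/","text/html","200","AAAA","1234"]]'
)

print(cdx_query_url("example.com"))
# → https://web.archive.org/cdx/search/cdx?url=example.com&output=json&limit=5
for capture in parse_cdx_json(sample):
    print(capture_time(capture["timestamp"]).isoformat(), replay_url(capture))
# → 2024-01-01T00:00:00+00:00 https://web.archive.org/web/20240101000000/http://example.com/
```

To fetch live data, pass the query URL to `urllib.request.urlopen` and decode the response body before handing it to `parse_cdx_json`. Rate limits and pagination are not handled here. Other CDX servers, such as Common Crawl's, may use different endpoints and output shapes.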

webrecorder / browsertrix

Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!

kubernetes cloud archiving warc web-archiving webrecorder web-archive wacz

Updated Jun 13, 2024
TypeScript
