# web-archiving

Here are 109 public repositories matching this topic...

ArchivingToolsForWBM / AdvancedInternetArchiving

Makes saving pages in bulk to the Wayback Machine much easier

web-archiving webarchiving

Updated Jun 13, 2024
HTML

harvard-lil / perma

Indelible links

libraries web-archiving

Updated Jun 12, 2024
JavaScript

webrecorder / replayweb.page

Serverless replay of web archives directly in the browser

service-worker warc web-archiving wayback-machine web-archive replay-web-page web-replay wacz

Updated Jun 12, 2024
TypeScript

programminghistorian / ph-submissions

The repository and website hosting the peer review process for new Programming Historian lessons

python api open-source mapping multi-lingual web-scraping digital-humanities data-management pedagogy web-archiving network-analysis linked-open-data programming-historian dh open-educational-resources r-studio digital-history distant-reading

Updated Jun 12, 2024
Jupyter Notebook

webrecorder / browsertrix

Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!

kubernetes cloud archiving warc web-archiving webrecorder web-archive wacz

Updated Jun 13, 2024
TypeScript

oduwsdl / ipwb

InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS

python docker service-worker ipfs memento warc web-archiving wayback memento-rfc

Updated Jun 12, 2024
Python

webrecorder / browsertrix-crawler

Run a high-fidelity browser-based crawler in a single Docker container

crawler web-crawler crawling warc web-archiving webrecorder wacz

Updated Jun 12, 2024
TypeScript

ArchiveBox / ArchiveBox

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

Updated Jun 10, 2024
Python

webrecorder / archiveweb.page

A High-Fidelity Web Archiving Extension for Chrome and Chromium-based browsers!

extension archiving chromium web-archiving webrecorder wacz

Updated Jun 10, 2024
JavaScript

aidatorajiro / misc

mysterious box of various codes

game linux raspberry-pi backup math firewall tor proton timestamp wine web-archiving

Updated Jun 8, 2024
Ruby

Own-Data-Privateer / pwebarc

A suite of tools for mirroring and hoarding web pages you visit for later offline viewing, i.e. your own personal Wayback Machine. It can also archive HTTP POST requests and responses, as well as most other HTTP-level data, and it follows an "archive everything now, figure out what to do with it later" philosophy.

backups internet self-hosted archive web-archiving wayback-machine internet-archiving

Updated Jun 7, 2024
Python

helgeho / ArchiveSpark

An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.

spark internet-archive warc web-archiving webarchive archivespark spark-framework

Updated Jun 5, 2024
Scala

gildas-lormeau / single-file-cli

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)

nodejs cli web-scraper web-scraping web-archiving single-file deno

Updated Jun 5, 2024
JavaScript

yuzhoumo / piazzabox

Piazza course archiver and viewer

python piazza web-archiving alpinejs

Updated Jun 2, 2024
Python

yuzhoumo / edbox

Ed course archiver and viewer

python jinja2 web-archiving edstem alpinejs

Updated Jun 2, 2024
Python

nla / pandas4

Web archive workflow system

Updated May 28, 2024
Java

nla / heritrix3

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

Updated May 29, 2024
Java

webrecorder / warcio

Streaming WARC/ARC library for fast web archive IO

python warc web-archiving web-archives pywb

Updated May 27, 2024
Python

oduwsdl / MemGator

A Memento Aggregator CLI and Server in Go

memento web-archiving timemap memento-rfc

Updated May 21, 2024
Go

ArchiveBox / pip-archivebox

Official Python package for ArchiveBox, the self-hosted internet archiving solution.

python pypi wheel pip setuptools web-archiving digipres sdist internet-archiving archivebox

Updated May 21, 2024
