@bottomless-archive-project

Bottomless Archive Project

A project about archiving anything that's available digitally.

Pinned

  1. library-of-alexandria (public)

    Library of Alexandria (LoA for short) is a project that aims to collect and archive documents from the internet.

    Java · 108 stars · 2 forks

  2. url-collector (public)

    An application that crawls the Common Crawl corpus for URLs with the specified file extensions.

    Java

  3. file-collector (public)

    Java

  4. document-location-database (public)

  5. java-warc (public)

    Forked from laxika/java-warc

    Read Web ARChive (WARC) files in Java.

    Java · 5 stars

  6. common-crawl-client (public)

    A lightweight client library for Common Crawl's WARC files.

    Java
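The url-collector description comes down to a filtering step: keep only the URLs whose path ends in one of the requested file extensions. The following is a minimal Java sketch of that step. It assumes the extensions are given in lowercase without the leading dot, and that query strings and fragments should be ignored when matching. The class `ExtensionFilter` and the method `hasWantedExtension` are hypothetical names chosen for illustration. They are not taken from url-collector's actual code.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not url-collector's real implementation.
public class ExtensionFilter {

    // Returns true when the URL's path ends in one of the wanted extensions.
    // Assumption: extensions are lowercase and have no leading dot, e.g. "pdf".
    public static boolean hasWantedExtension(String url, Set<String> extensions) {
        String path;
        try {
            // URI parsing strips the query (?...) and fragment (#...) from the path.
            path = URI.create(url).getPath();
        } catch (IllegalArgumentException e) {
            return false; // Malformed URL: skip it.
        }
        if (path == null) {
            return false; // Opaque URIs such as "mailto:..." have no path.
        }
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        int dot = fileName.lastIndexOf('.');
        // Reject names with no extension, a trailing dot, or only a leading dot.
        if (dot <= 0 || dot == fileName.length() - 1) {
            return false;
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return extensions.contains(extension);
    }

    public static void main(String[] args) {
        Set<String> wanted = Set.of("pdf", "doc");
        System.out.println(hasWantedExtension("https://example.com/papers/report.PDF?download=1", wanted));
        System.out.println(hasWantedExtension("https://example.com/index.html", wanted));
    }
}
```

Running `main` prints `true` for the PDF link and `false` for the HTML page. Parsing the URL with `java.net.URI` instead of checking the raw string with `endsWith` keeps a query string such as `?download=1` from hiding the extension. In a crawler, a check like this would presumably run on every URL extracted from the corpus.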

Repositories

7 repositories in total.
