GatherHub
Your Personal Internet Archiver

Save important content before it disappears forever. GatherHub automatically ingests and archives web pages, videos, documents, repositories, and just about anything else into your own searchable library. Whether for research, backup, or the end of the world, you'll have access to what matters most.

Full-Text Search
OCR Extraction
Auto-Archive

Your Personal Google

Search across everything you've ever saved - from web pages, spreadsheets & docs, PDFs & eBooks, to videos and images

Full-Text Search

Find any word or phrase across all your archived content instantly

OCR Magic

Extract and search text from images, screenshots, and scanned documents

🏷️

Smart Filtering

Filter by file type, source, tags, date, and more with faceted search

Content Highlighting

See exactly where your search terms appear with intelligent highlighting

Search Your Archive
Query: machine learning algorithms
Deep Learning Research Paper
...advanced machine learning algorithms for neural networks...
AI Tutorial Video
...introduction to machine learning algorithms and their applications...
Conference Slide (OCR)
...comparison of machine learning algorithms performance metrics...

Why GatherHub?

Transform your scattered digital content into an organized, searchable personal archive that works offline.

Multi-Source Archiving

Automatically import and archive content from browser bookmarks, RSS feeds, databases, and more. Supports Firefox, Chrome, Readeck, Linkding, and many other sources.

Full-Text Search

Search across all your archived content with intelligent indexing. Find information in HTML pages, PDFs, documents, video metadata, and more with faceted filtering.

Unlimited Media Types

Handle any content type with specialized tools. Create custom media types for your unique needs. From streaming videos and git repositories to web pages and documents, GatherHub adapts to your content.

Web Interface

Modern dashboard to manage downloads, search content, and configure sources. Includes dark mode, bulk operations, and comprehensive job tracking.

REST API

Programmatic access for integration with other systems. Full API with authentication, job management, bulk operations, and search capabilities.

Intelligent Processing

Automatic content extraction, metadata preservation, and smart file organization. Event hooks for custom processing and automated workflows.

What Makes GatherHub Unique

Features you won't find anywhere else - designed for the ultimate archiving experience

Ingest Anything

Not just a downloader: you can also upload local and network files. What you archive and how you archive it is completely configurable. Content can come from manual uploads, one-off links, third-party tools, and other sources.

Extensible with Event Hooks

Write custom event hooks in Python, Bash, Node.js, or any other language. Hooks trigger on events and can be chained together, which opens up a wide range of automation.

🌐

Offline-First Design

Everything works offline: search, documentation, uploading, tagging, the entire UI. Only internet downloading requires connectivity.

Integration Hub, Not Replacement

Designed NOT to be a bookmark manager. Use the tools you love and point GatherHub at them for seamless integration.

🎯

Highly Extensible Architecture

Custom tooling configuration, APIs, and event hooks let you integrate or chain GatherHub with other tools for unlimited possibilities.

Text Extraction

Extract searchable text from images, comic books, scanned documents, and screenshots, as well as Word docs, PDFs, eBooks, and other formats. If a format is not supported, the flexible configuration lets you extend and customize extraction yourself.

🐧

Small Footprint

Written in Go, a high-performance language, the single binary does it all. Install it on a Raspberry Pi, your laptop, a NAS, or a dedicated server.

Web Crawling & Scraping

For website archivers: grab a single page, or crawl and archive an entire site.

Works alongside existing tools

Designed not to compete with bookmark managers: keep using your favorite tools and add them as sources for GatherHub to ingest on a schedule. Currently supports popular tools such as Readeck, Linkding, Wallabag, and Linkwarden.

Connect Your Existing Tools

GatherHub integrates with your current workflow - no need to change how you work

🌐

Browser Bookmarks

Automatically scan and import from major browsers

Firefox, Chrome, Brave, Vivaldi, Chromium

Bookmark Managers

Connect to popular bookmarking and read-later services

Databases

Import from existing databases and data stores

SQLite, MySQL, PostgreSQL

Other Sources

Monitor RSS feeds and specialized data sources

RSS Feeds, WROLPI

Ingest

Scheduled ingestion from sources is only one option. You can also add content manually by uploading files or pasting in URLs.

URLs, File Upload, Bulk Import

Not a Bookmark Manager

GatherHub doesn't replace your tools - it enhances them by archiving the content they point to

Keep using what you love!

Out-of-the-Box Media Types

Specialized handling for every type of content you encounter

Streaming Videos

YouTube, Vimeo, Twitch, TikTok, and more via yt-dlp, with metadata extraction and SponsorBlock integration.

🌐

Web Pages

Full HTML archiving with JavaScript support via monolith or SingleFile, preserving complete page functionality.

Documents

PDF, DOCX, TXT, and other document formats with automatic text extraction and indexing.

E-books

EPUB, MOBI, AZW, and other e-book formats with metadata preservation and content extraction.

Git Repositories

Clone repositories and create optional ZIP archives for complete project preservation.

Archives

ZIP, RAR, 7z, tar, bzip2, and other compressed formats.

🎡

Media Files

MP3, MP4, images, and other media formats.

🧲

Torrents & Magnets

Download torrents and magnet links.

Maps & ZIMs

Offline maps, Wikipedia ZIM files, and other compressed knowledge archives for offline access to vast information.

Mobile Apps & APKs

Archive mobile applications, APK files, and software packages.

Custom Types

Define your own media types with custom tools and URL patterns. Completely configurable - what you download and how is entirely up to you.
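
To make the idea of routing by URL pattern concrete, here is a minimal sketch in Python. It is not GatherHub's configuration format: the `MEDIA_TYPES` table, the type names, the tool labels, the patterns, and the `classify` function are all hypothetical, chosen only to show how a URL might be mapped to a media type by regular expression, with a fallback when nothing matches.

```python
import re

# Hypothetical media-type table: name -> (tool label, URL patterns).
# Illustrative only; GatherHub's real configuration may look different.
MEDIA_TYPES = {
    "video": ("yt-dlp", [r"^https?://(www\.)?youtube\.com/watch",
                         r"^https?://vimeo\.com/\d+"]),
    "git": ("git", [r"^https?://github\.com/[^/]+/[^/]+/?$", r"\.git$"]),
    "document": ("download", [r"\.(pdf|docx|txt)$"]),
}
DEFAULT_TYPE = "webpage"  # assumed fallback when no pattern matches


def classify(url: str) -> str:
    """Return the first media type whose pattern matches the URL."""
    for name, (_tool, patterns) in MEDIA_TYPES.items():
        if any(re.search(p, url, re.IGNORECASE) for p in patterns):
            return name
    return DEFAULT_TYPE


print(classify("https://vimeo.com/12345"))
```

Types are checked in table order, so in a scheme like this more specific patterns should be listed before general ones.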

See GatherHub in Action

Explore the real interface and features that make content archiving effortless

Technical Capabilities

Built for power users and developers who need advanced functionality

REST API

Complete programmatic access with authentication, job management, bulk operations, and search endpoints.

Authentication, Job Management, Bulk Operations, Search API

Event Hooks

Run custom scripts on download events for automated processing, notifications, and workflow integration.

Pre/Post Download, Error Handling, Custom Scripts, JSON Context
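
As an illustration of what a hook script could look like, the Python sketch below turns a JSON context into a one-line log message. Everything specific here is an assumption rather than GatherHub's documented behavior: the field names (`event`, `url`, `path`, `error`), the event names, and the idea that the context is read from a stream are hypothetical. Check the project documentation for the real context format and how it is delivered.

```python
import io
import json


def describe_event(context: dict) -> str:
    """Turn a hook context into a one-line log message."""
    event = context.get("event", "unknown")  # hypothetical field name
    url = context.get("url", "")             # hypothetical field name
    if event == "post_download":
        return f"archived {url} -> {context.get('path', '?')}"
    if event == "error":
        return f"failed {url}: {context.get('error', 'no details')}"
    return f"ignored event {event!r}"


def run_hook(stream) -> str:
    """Read one JSON context from a stream and describe it."""
    return describe_event(json.load(stream))


# Demo with an in-memory stream; a real hook might pass sys.stdin instead.
sample = io.StringIO(
    '{"event": "post_download", "url": "https://example.com", '
    '"path": "/archive/example.html"}'
)
print(run_hook(sample))
```

Keeping the parsing (`run_hook`) separate from the decision logic (`describe_event`) makes the hook easy to test without running a real download.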

Content Extraction

Pluggable extractor architecture with support for internal, external, and chained processing pipelines.

Multiple Formats, OCR Support, Chain Processing, Custom Extractors
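
A chained pipeline can be pictured as a list of stages in which each stage receives the previous stage's output. The sketch below shows that pattern in Python. The stage functions and `run_chain` are hypothetical and deliberately crude; they are not GatherHub's extractor interface.

```python
import re
from typing import Callable, Iterable

# A stage takes text and returns transformed text.
Extractor = Callable[[str], str]


def strip_tags(text: str) -> str:
    """Crude HTML tag removal; a real extractor would use a proper parser."""
    return re.sub(r"<[^>]+>", " ", text)


def collapse_whitespace(text: str) -> str:
    """Squeeze runs of whitespace into single spaces."""
    return " ".join(text.split())


def run_chain(text: str, stages: Iterable[Extractor]) -> str:
    """Feed the output of each stage into the next one."""
    for stage in stages:
        text = stage(text)
    return text


html = "<h1>Hello</h1>\n<p>archived   world</p>"
print(run_chain(html, [strip_tags, collapse_whitespace]))
```

An external tool could be slotted into the same chain by wrapping it in a function with the same signature.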

Advanced Search

Faceted search with highlighting, field-specific queries, and comprehensive indexing of all content types.

Faceted Filtering, Content Highlighting, Field Search, Auto Indexing
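
For readers unfamiliar with faceted search, the toy Python sketch below shows its two main pieces: counting matches per facet value and narrowing results by a chosen facet, plus simple term highlighting. It uses plain substring matching over an in-memory list as a stand-in for a real index. The corpus, the field names, and the `search` and `highlight` functions are hypothetical and say nothing about how GatherHub implements search.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple

# Toy corpus; field names are illustrative.
DOCS = [
    {"title": "Deep Learning Research Paper", "type": "pdf",
     "text": "advanced machine learning algorithms for neural networks"},
    {"title": "AI Tutorial Video", "type": "video",
     "text": "introduction to machine learning algorithms and their applications"},
    {"title": "Conference Slide", "type": "image",
     "text": "comparison of sorting networks"},
]


def search(query: str, doc_type: Optional[str] = None) -> Tuple[List[dict], Counter]:
    """Return (hits, facet counts) for a case-insensitive phrase query."""
    q = query.lower()
    matches = [d for d in DOCS if q in d["text"].lower()]
    # Facet counts are taken before the type filter, so the caller can
    # show how many hits each type would give.
    facets = Counter(d["type"] for d in matches)
    if doc_type is not None:
        matches = [d for d in matches if d["type"] == doc_type]
    return matches, facets


def highlight(text: str, query: str) -> str:
    """Wrap each case-insensitive occurrence of the query in ** markers."""
    return re.sub(re.escape(query), lambda m: f"**{m.group(0)}**",
                  text, flags=re.IGNORECASE)


hits, facets = search("machine learning", doc_type="video")
for doc in hits:
    print(doc["title"], "|", highlight(doc["text"], "machine learning"))
```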

Speed & Portability

Written in Go, the binary is compiled for multiple architectures and is self-contained. The only external dependencies are the third-party tools you choose to use. We recommend a few, but you are free to adapt the setup to your needs.

Multi-platform, Single Binary, Concurrency, Light-weight

Get Started with GatherHub

Free, open source, and ready to transform your content management

Quick Installation

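# Download GatherHub: get the build for your platform from the latest release page
# (this URL is the release page itself, not a direct link to a binary)
# https://github.com/optionalsoftware/gatherhub/releases/latest

# Set up services
./gatherhub
cd deploy
./setup-services.sh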

Download GatherHub

Get the latest version and start archiving your content today.

View Documentation

Comprehensive guides for installation, configuration, and advanced usage.

Ready to Organize Your Digital Life?

Join thousands of users who have transformed their scattered bookmarks and content into a powerful, searchable personal archive.