GatherHub
Your Personal Internet Archiver
Save important content before it disappears forever. GatherHub automatically ingests and archives web pages, videos, documents, repositories, and just about anything else into your own searchable library. Whether it's for research, backup, or the end of the world, you'll have access to what matters most.

Your Personal Google
Search across everything you've ever saved - from web pages, spreadsheets & docs, PDFs & eBooks, to videos and images
Full-Text Search
Find any word or phrase across all your archived content instantly
OCR Magic
Extract and search text from images, screenshots, and scanned documents
Smart Filtering
Filter by file type, source, tags, date, and more with faceted search
Content Highlighting
See exactly where your search terms appear with intelligent highlighting
Why GatherHub?
Transform your scattered digital content into an organized, searchable personal archive that works offline.
Multi-Source Archiving
Automatically import and archive content from browser bookmarks, RSS feeds, databases, and more. Support for Firefox, Chrome, Readeck, Linkding, and many other sources.
Full-Text Search
Search across all your archived content with intelligent indexing. Find information in HTML pages, PDFs, documents, video metadata, and more with faceted filtering.
Unlimited Media Types
Handle any content type with specialized tools. Create custom media types for your unique needs. From streaming videos and git repositories to web pages and documents, GatherHub adapts to your content.
Web Interface
Modern dashboard to manage downloads, search content, and configure sources. Includes dark mode, bulk operations, and comprehensive job tracking.
REST API
Programmatic access for integration with other systems. Full API with authentication, job management, bulk operations, and search capabilities.
Intelligent Processing
Automatic content extraction, metadata preservation, and smart file organization. Event hooks for custom processing and automated workflows.
What Makes GatherHub Unique
Features you won't find anywhere else - designed for the ultimate archiving experience
Ingest Anything
Not just a downloader: you can also upload local and network files. What you archive and how you archive it is completely configurable, whether content comes from manual uploads, random links, third-party tools, or other sources.
Extensible with Event Hooks
Highly extensible; write custom event hooks in Python, Bash, Node.js, or any language. Trigger on events for unlimited automation possibilities, including chaining events together.
Offline-First Design
Everything works offline: search, documentation, uploading, tagging, the entire UI. Only internet downloading requires connectivity.
Integration Hub, Not Replacement
Designed NOT to be a bookmark manager. Use the tools you love and point GatherHub at them for seamless integration.
Highly Extensible Architecture
Custom tooling configuration, APIs, and event hooks let you integrate or chain GatherHub with other tools for unlimited possibilities.
Text Extraction
Extract searchable text from images, comic books, scanned documents, and screenshots, as well as Word docs, PDFs, eBooks, and other formats. If a format isn't supported, the flexible configuration lets you extend and customize extraction.
Small Footprint
Written in the highly performant Go language, the single binary does it all. Install on a Raspberry Pi, your laptop, a NAS, or a dedicated server.
Web Crawling & Scraping
For the web site archivers, grab a single page or crawl and archive an entire site.
Works alongside existing tools
Designed not to compete with bookmark managers: keep using your favorite tools and add them as sources for GatherHub to ingest on a schedule. Currently supports popular tools such as Readeck, Linkding, Wallabag, and Linkwarden.
Connect Your Existing Tools
GatherHub integrates with your current workflow - no need to change how you work
Browser Bookmarks
Automatically scan and import from major browsers
Bookmark Managers
Connect to popular bookmarking and read-later services
Databases
Import from existing databases and data stores
Other Sources
Monitor RSS feeds and specialized data sources
Ingest
Scheduled ingestion from sources is only one option. You can also add content manually by uploading files or pasting in URLs.
Not a Bookmark Manager
GatherHub doesn't replace your tools - it enhances them by archiving the content they point to
Out of the box Media Types
Specialized handling for every type of content you encounter
Streaming Videos
YouTube, Vimeo, Twitch, TikTok, and more using yt-dlp, with metadata extraction and SponsorBlock integration.
Web Pages
Full HTML archiving with JavaScript support via monolith or SingleFile, preserving complete page functionality.
Documents
PDF, DOCX, TXT, and other document formats with automatic text extraction and indexing.
E-books
EPUB, MOBI, AZW, and other e-book formats with metadata preservation and content extraction.
Git Repositories
Clone repositories and create optional ZIP archives for complete project preservation.
Archives
ZIP, RAR, 7z, Tar, Bz and other compressed formats.
Media Files
MP3, MP4, images, and other media formats.
Torrents & Magnets
Download torrents and magnet links.
Maps & ZIMs
Offline maps, Wikipedia ZIM files, and other compressed knowledge archives for offline access to vast information.
Mobile Apps & APKs
Archive mobile applications, APK files, and software packages.
Custom Types
Define your own media types with custom tools and URL patterns. Completely configurable - what you download and how is entirely up to you.
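As a rough illustration of how URL patterns might route a link to a media type and its tool, here is a minimal Python sketch. The `MEDIA_TYPES` table, its field names, and the regular expressions are hypothetical and are not GatherHub's actual configuration format; only the tool names (yt-dlp, monolith, git) come from the list above.

```python
import re

# Hypothetical media type table: (name, URL pattern, tool). First match wins.
MEDIA_TYPES = [
    ("git", re.compile(r"^https?://(github|gitlab)\.com/[^/]+/[^/]+/?$"), "git"),
    ("video", re.compile(r"^https?://(www\.)?(youtube\.com|vimeo\.com|twitch\.tv)/"), "yt-dlp"),
    ("document", re.compile(r"\.(pdf|docx|txt)$", re.IGNORECASE), "direct download"),
]

# Fallback when no pattern matches: treat the URL as a plain web page.
DEFAULT = ("webpage", "monolith")


def classify(url):
    """Return (media type, tool) for the first pattern that matches the URL."""
    for name, pattern, tool in MEDIA_TYPES:
        if pattern.search(url):
            return name, tool
    return DEFAULT
```

Order matters in a table like this: more specific patterns go first, and anything unmatched falls through to the web page default.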
See GatherHub in Action
Explore the real interface and features that make content archiving effortless
Technical Capabilities
Built for power users and developers who need advanced functionality
REST API
Complete programmatic access with authentication, job management, bulk operations, and search endpoints.
Event Hooks
Run custom scripts on download events for automated processing, notifications, and workflow integration.
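A hook can be any executable script. The Python sketch below assumes the event arrives as a JSON object with `event`, `url`, and `path` fields; that payload shape and the event name `download.completed` are illustrative assumptions, not GatherHub's documented interface.

```python
import json


def handle_event(event):
    """Build a one-line notification for a (hypothetical) download event."""
    kind = event.get("event", "unknown")
    url = event.get("url", "?")
    if kind == "download.completed":
        return f"archived {url} -> {event.get('path', '?')}"
    return f"ignored event {kind!r} for {url}"


def run(raw):
    """Parse one JSON event (assumed payload format) and handle it."""
    return handle_event(json.loads(raw))
```

A real hook script built on this would read its input from wherever the event data is delivered, for example `print(run(sys.stdin.read()))` if the payload came on standard input, and could then send the message to a notifier or trigger another job.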
Content Extraction
Pluggable extractor architecture with support for internal, external, and chained processing pipelines.
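One way to picture a chained pipeline is that each extractor receives the output of the previous one. The following Python sketch is illustrative only; the extractor functions are stand-ins invented for this example and do not reflect GatherHub's internal extractor interface.

```python
import html
import re
from functools import reduce


def strip_tags(text):
    """Replace HTML tags with spaces and decode entities (naive stand-in)."""
    return html.unescape(re.sub(r"<[^>]+>", " ", text))


def collapse_whitespace(text):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return " ".join(text.split())


def chain(extractors):
    """Compose extractors left to right into a single extractor."""
    steps = list(extractors)
    return lambda text: reduce(lambda acc, step: step(acc), steps, text)


# Each step feeds the next: raw HTML -> tag-free text -> normalized text.
pipeline = chain([strip_tags, collapse_whitespace])
```

Because every step has the same shape (text in, text out), an external tool wrapped in a function could be slotted into the same chain.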
Advanced Search
Faceted search with highlighting, field-specific queries, and comprehensive indexing of all content types.
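To make faceting and highlighting concrete, here is a small in-memory Python sketch. It is not GatherHub's search engine; the sample documents, field names, and `**` highlight markers are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical archived items; field names are invented for this example.
DOCS = [
    {"title": "Go concurrency notes", "type": "pdf", "text": "Goroutines and channels in Go."},
    {"title": "Archiving talk", "type": "video", "text": "Why offline archives matter."},
    {"title": "Channel patterns", "type": "webpage", "text": "Fan-in with channels."},
]


def search(query, docs, file_type=None):
    """Return (hits, facets): docs containing the query, plus counts per type.

    Facet counts are computed before the type filter is applied, so they
    show how many results each filter choice would return.
    """
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    matched = [d for d in docs if pattern.search(d["text"])]
    facets = Counter(d["type"] for d in matched)
    if file_type is not None:
        matched = [d for d in matched if d["type"] == file_type]
    return matched, facets


def highlight(query, text):
    """Wrap each case-insensitive occurrence of the query in ** markers."""
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return pattern.sub(lambda m: f"**{m.group(0)}**", text)
```

Searching `DOCS` for "channels" matches the PDF and the web page; passing `file_type="pdf"` narrows the hits to one while the facet counts still report both types.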
Speed & Portability
Written in Go, the binary is self-contained and compiled for multiple architectures. The only third-party tools required are the ones you choose to use; we recommend a few, but you are free to adapt the setup to your needs.
Get Started with GatherHub
Free, open source, and ready to transform your content management
Quick Installation
# Download GatherHub
wget https://github.com/optionalsoftware/gatherhub/releases/latest
# Setup services
./gatherhub
cd deploy
./setup-services.sh
View Documentation
Comprehensive guides for installation, configuration, and advanced usage.
Read Documentation
Ready to Organize Your Digital Life?
Join thousands of users who have transformed their scattered bookmarks and content into a powerful, searchable personal archive.