GatherHub supports a variety of media types for archiving out of the box, each with specialized handling to ensure optimal storage and preservation.
Media Type | Description | Storage Location | File Extensions |
---|---|---|---|
HTML | Complete web pages with resources. Pages are saved with all associated CSS, JavaScript, and images to ensure they can be viewed offline exactly as they appeared online. GatherHub supports both monolith and SingleFile for HTML archiving. | /downloads/html/ | .html, .htm |
Web Archive | Comprehensive website archives with automatic crawling, link rewriting, and navigation indexes. Creates complete offline-browsable copies of websites with discovered internal pages and resources. Includes detailed metadata, crawl statistics, and site structure preservation. | /downloads/web-archive/ | Directory structure with index.html and archived pages |
Image | Image files across various formats. High-resolution images are preserved in their original format to maintain quality. | /downloads/images/ | .jpg, .jpeg, .png, .gif, .webp, .svg |
Videos | Video content from various sources. Videos can be downloaded in their native format or optionally transcoded to a more storage-efficient format. | /downloads/video/ | .mp4, .webm, .mov, .avi, .mkv |
Audio | Audio files including music, podcasts, and other sound recordings. Audio is stored in its original format with metadata preserved. | /downloads/audio/ | .mp3, .wav, .ogg, .flac, .m4a |
Documents | Document files including PDFs, office documents, and text files. Documents are stored with their original structure and formatting. | /downloads/documents/ | .pdf, .docx, .xlsx, .pptx, .txt |
Books | E-books and publications in various formats. Books are stored with their metadata and cover images when available. | /downloads/books/ | .epub, .mobi, .azw, .pdf |
Streaming Videos | Videos from streaming platforms like YouTube, Vimeo, Twitch, Facebook, Instagram, Twitter, TikTok, and many others. Captures video with extensive metadata, thumbnails, video descriptions, subtitles, and more using yt-dlp. Features sponsorblock integration to remove ads, geo-bypass capabilities, and automatic quality selection. | /downloads/streaming-video/ | Various, typically .mp4, .webm, .mkv |
Torrents | Magnet links and torrent files are supported. | /downloads/torrents/ | Various |
Git | Git repositories for source code archiving. Repositories are cloned with full history and can be updated incrementally. | /downloads/git/ | .git |
Archive | Compressed archives of various formats. Archives are stored as-is without extraction unless specifically requested. | /downloads/archives/ | .zip, .tar.gz, .rar, .7z |
Map | Map data from various sources. Maps are stored as vector data when possible, or as image tiles otherwise. | /downloads/maps/ | .pbf, .mbtiles, .geojson |
ZIM | ZIM files for offline Wikipedia and other wiki content. Full encyclopedia archives in a compressed searchable format. | /downloads/zim/ | .zim |
GatherHub automatically detects the appropriate media type from the job's URL, using criteria such as the domain and the file extension.
If automatic detection doesn't yield the desired results, media types can be manually specified when adding jobs through the web interface or API.
For example, a URL is treated as a git repository when it matches both the domain gitlab.com AND the extension git.
If the URL points to an ordinary page on github.com, GatherHub will likely detect it as HTML and use that media_type.
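The domain-and-extension matching described above can be pictured with a small sketch. This is not GatherHub's actual implementation: the `RULES` table, the `detect_media_type` function, and the fallback to `html` are all assumptions made for illustration. The only behavior taken from the text is that a git match requires both the domain and the extension, and that a plain github.com page ends up as HTML.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical rule table (not GatherHub's real configuration).
# An empty set means "no constraint"; when both sets are filled in,
# the domain AND the extension must match.
RULES = [
    {"media_type": "git", "domains": {"gitlab.com"}, "extensions": {"git"}},
    {"media_type": "video", "domains": set(), "extensions": {"mp4", "webm", "mov", "avi", "mkv"}},
    {"media_type": "zim", "domains": set(), "extensions": {"zim"}},
]


def detect_media_type(url: str, default: str = "html") -> str:
    """Return the first media type whose rule matches the URL, else the default."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Extension of the URL path without the leading dot, e.g. "git" or "mp4".
    ext = PurePosixPath(parsed.path).suffix.lstrip(".").lower()

    for rule in RULES:
        domain_ok = not rule["domains"] or host in rule["domains"]
        ext_ok = not rule["extensions"] or ext in rule["extensions"]
        if domain_ok and ext_ok:
            return rule["media_type"]
    return default  # assumed fallback: treat anything unmatched as a web page


print(detect_media_type("https://gitlab.com/group/project.git"))
print(detect_media_type("https://github.com/user/repo"))
```

In this sketch the github.com URL has no extension and matches no rule, so it falls through to `html`. That is the kind of case where specifying the media type manually is the better option.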
All downloaded content is stored in subdirectories based on media type, making it easy to browse and manage your archive. The base storage path is configurable in the application settings.
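As a rough illustration of this layout, the sketch below maps a media type to its subdirectory under a configurable base path. The subdirectory names come from the table above. The `SUBDIRS` keys, the `storage_dir` function, and the `base` parameter are hypothetical names chosen for the example and may not match GatherHub's internal identifiers or settings.

```python
from pathlib import PurePosixPath

# Subdirectory names as listed in the media type table.
# The dictionary keys are assumed identifiers, used here only for illustration.
SUBDIRS = {
    "html": "html",
    "web-archive": "web-archive",
    "image": "images",
    "video": "video",
    "audio": "audio",
    "documents": "documents",
    "books": "books",
    "streaming-video": "streaming-video",
    "torrents": "torrents",
    "git": "git",
    "archive": "archives",
    "map": "maps",
    "zim": "zim",
}


def storage_dir(media_type: str, base: str = "/downloads") -> PurePosixPath:
    """Return the directory for a media type under the given base path."""
    if media_type not in SUBDIRS:
        raise ValueError(f"unknown media type: {media_type!r}")
    return PurePosixPath(base) / SUBDIRS[media_type]


print(storage_dir("image"))
print(storage_dir("zim", base="/srv/archive"))
```

Changing `base` moves every media type's directory at once, which mirrors the idea of a single configurable base storage path.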
GatherHub supports custom media type handlers through the plugin system. Custom handlers can be developed to support specialized media types or to modify the behavior of existing handlers. For more information, see the Event Hooks documentation.