Media Types

GatherHub supports a variety of media types for archiving out of the box, each with specialized handling to ensure optimal storage and preservation.

Note: Media types are automatically detected from URLs, but can be manually specified when adding jobs through the web interface.

HTML
Description: Complete web pages with resources. Pages are saved with all associated CSS, JavaScript, and images to ensure they can be viewed offline exactly as they appeared online. GatherHub supports both monolith and SingleFile for HTML archiving.
Storage: /downloads/html/
Extensions: .html, .htm
Web Archive
Description: Comprehensive website archives with automatic crawling, link rewriting, and navigation indexes. Creates complete offline-browsable copies of websites with discovered internal pages and resources. Includes detailed metadata, crawl statistics, and site structure preservation.
Storage: /downloads/web-archive/
Extensions: Directory structure with index.html and archived pages
Image
Description: Image files across various formats. High-resolution images are preserved in their original format to maintain quality.
Storage: /downloads/images/
Extensions: .jpg, .jpeg, .png, .gif, .webp, .svg
Videos
Description: Video content from various sources. Videos can be downloaded in their native format or optionally transcoded to a more storage-efficient format.
Storage: /downloads/video/
Extensions: .mp4, .webm, .mov, .avi, .mkv
Audio
Description: Audio files including music, podcasts, and other sound recordings. Audio is stored in its original format with metadata preserved.
Storage: /downloads/audio/
Extensions: .mp3, .wav, .ogg, .flac, .m4a
Documents
Description: Document files including PDFs, office documents, and text files. Documents are stored with their original structure and formatting.
Storage: /downloads/documents/
Extensions: .pdf, .docx, .xlsx, .pptx, .txt
Books
Description: E-books and publications in various formats. Books are stored with their metadata and cover images when available.
Storage: /downloads/books/
Extensions: .epub, .mobi, .azw, .pdf
Streaming Videos
Description: Videos from streaming platforms such as YouTube, Vimeo, Twitch, Facebook, Instagram, Twitter, TikTok, and many others. Uses yt-dlp to capture the video together with extensive metadata, thumbnails, descriptions, subtitles, and more. Features SponsorBlock integration to remove sponsored segments, geo-bypass capabilities, and automatic quality selection.
Storage: /downloads/streaming-video/
Extensions: Various, typically .mp4, .webm, .mkv
Torrents
Description: Magnet links and torrent files are supported.
Storage: /downloads/torrents/
Extensions: Various
Git
Description: Git repositories for source code archiving. Repositories are cloned with full history and can be updated incrementally.
Storage: /downloads/git/
Extensions: .git
Archive
Description: Compressed archives of various formats. Archives are stored as-is without extraction unless specifically requested.
Storage: /downloads/archives/
Extensions: .zip, .tar.gz, .rar, .7z
Map
Description: Map data from various sources. Maps are stored as vector data when possible, or as image tiles otherwise.
Storage: /downloads/maps/
Extensions: .pbf, .mbtiles, .geojson
ZIM
Description: ZIM files for offline Wikipedia and other wiki content. Full encyclopedia archives in a compressed searchable format.
Storage: /downloads/zim/
Extensions: .zim

Media Type Detection

GatherHub automatically detects the appropriate media type based on:

  • URL pattern (e.g., youtube.com URLs are treated as the Streaming Videos media type)
  • Content-Type HTTP header from the server response
  • File extension in the URL path
  • Initial content analysis for ambiguous cases

If automatic detection doesn't yield the desired results, media types can be manually specified when adding jobs through the web interface or API.
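The signals listed above can be combined in a simple fallback chain. The following is a minimal sketch, not GatherHub's actual implementation: it assumes the signals are tried in the order listed (the documentation does not state a priority), and the lookup tables and the `detect_media_type` function are hypothetical names chosen for illustration. Content analysis for ambiguous cases is not modeled.

```python
from typing import Optional
from urllib.parse import urlparse
import posixpath

# Hypothetical lookup tables for illustration only; GatherHub's real
# configuration may look quite different.
DOMAIN_TYPES = {"youtube.com": "Streaming Videos", "vimeo.com": "Streaming Videos"}
CONTENT_TYPE_PREFIXES = {
    "text/html": "HTML",
    "image/": "Image",
    "audio/": "Audio",
    "video/": "Videos",
}
EXTENSION_TYPES = {
    ".pdf": "Documents",
    ".epub": "Books",
    ".zim": "ZIM",
    ".zip": "Archive",
    ".mp3": "Audio",
    ".png": "Image",
}


def detect_media_type(url: str, content_type: Optional[str] = None) -> Optional[str]:
    """Return a media type name, or None if no signal matches."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    # 1. URL pattern: match the host or any of its subdomains.
    for domain, media_type in DOMAIN_TYPES.items():
        if host == domain or host.endswith("." + domain):
            return media_type

    # 2. Content-Type header, ignoring parameters such as "; charset=utf-8".
    if content_type:
        mime = content_type.split(";", 1)[0].strip().lower()
        for prefix, media_type in CONTENT_TYPE_PREFIXES.items():
            if mime.startswith(prefix):
                return media_type

    # 3. File extension in the URL path.
    extension = posixpath.splitext(parsed.path)[1].lower()
    if extension in EXTENSION_TYPES:
        return EXTENSION_TYPES[extension]

    # 4. Content analysis is out of scope for this sketch.
    return None
```

A caller that gets `None` back would fall through to manual selection, which mirrors the manual override described above.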

When a media_type configuration defines both domain patterns and extensions, GatherHub requires both to match. For example, the Git media type matches on the gitlab.com domain AND the .git extension before treating a URL as a repository. A URL that points to an ordinary page on github.com will likely be detected as HTML and handled with that media_type instead.
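The combined domain-and-extension rule can be illustrated with a short sketch. The `MediaTypeRule` class and its fields are hypothetical and do not reflect GatherHub's real configuration schema. The sketch also assumes that when only one of the two lists is configured, that list alone decides the match; the documentation only describes the case where both are present.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class MediaTypeRule:
    """Hypothetical stand-in for one media_type configuration entry."""
    name: str
    domains: list[str] = field(default_factory=list)
    extensions: list[str] = field(default_factory=list)

    def matches(self, url: str) -> bool:
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
        path = parsed.path.lower()

        domain_ok = any(host == d or host.endswith("." + d) for d in self.domains)
        extension_ok = any(path.endswith(e) for e in self.extensions)

        # Both configured: require both, as described for the Git example.
        if self.domains and self.extensions:
            return domain_ok and extension_ok
        # Assumption: a single configured list decides on its own.
        if self.domains:
            return domain_ok
        if self.extensions:
            return extension_ok
        return False


git_rule = MediaTypeRule("Git", domains=["gitlab.com"], extensions=[".git"])
```

With this rule, `git_rule.matches("https://gitlab.com/group/project.git")` is true, while a project page URL without the `.git` suffix does not match and would be left to other rules such as HTML.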

Storage Organization

All downloaded content is stored in subdirectories based on media type, making it easy to browse and manage your archive. The base storage path is configurable in the application settings.
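As an illustration of this layout, the sketch below maps each media type to its subdirectory under a configurable base path. The subdirectory names come from the storage locations listed on this page; the `storage_dir` helper and the `/downloads` default are assumptions made for the example, not part of GatherHub's API.

```python
from pathlib import PurePosixPath

# Subdirectory per media type, taken from the storage locations listed above.
STORAGE_SUBDIRS = {
    "HTML": "html",
    "Web Archive": "web-archive",
    "Image": "images",
    "Videos": "video",
    "Audio": "audio",
    "Documents": "documents",
    "Books": "books",
    "Streaming Videos": "streaming-video",
    "Torrents": "torrents",
    "Git": "git",
    "Archive": "archives",
    "Map": "maps",
    "ZIM": "zim",
}


def storage_dir(media_type: str, base: str = "/downloads") -> PurePosixPath:
    """Return the directory for a media type under the given base path.

    Hypothetical helper; raises KeyError for an unknown media type.
    """
    return PurePosixPath(base) / STORAGE_SUBDIRS[media_type]
```

Changing the base path in the settings would then correspond to passing a different `base` value, with the per-type subdirectories unchanged.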

Storage Consideration: Video and audio files can consume significant storage space. Consider enabling the storage quotas feature in the Settings page to prevent unintended storage exhaustion.
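To see which media types take up the most space before setting a quota, you can sum the file sizes in each subdirectory yourself. This is a standalone sketch that uses only the Python standard library and does not call GatherHub; the function names are hypothetical, and it assumes the base storage path is readable from where the script runs.

```python
import os
from pathlib import Path


def directory_size(path: Path) -> int:
    """Total size in bytes of all regular files below path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root) / name
            if not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def usage_by_media_type(base: Path) -> dict[str, int]:
    """Map each immediate subdirectory of base to its total size in bytes."""
    return {
        child.name: directory_size(child)
        for child in sorted(base.iterdir())
        if child.is_dir()
    }
```

For example, `usage_by_media_type(Path("/downloads"))` returns one entry per media type directory, which can be sorted by size to find the largest consumers.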

Custom Media Type Handlers

GatherHub supports custom media type handlers through the plugin system. Custom handlers can be developed to support specialized media types or to modify the behavior of existing handlers. For more information, see the Event Hooks documentation.
