Data Sources

GatherHub can automatically scan various data sources for URLs to download. This page explains how to configure and use different data sources to populate your download queue.

Types of Sources

GatherHub supports several types of data sources:

Source Type       | Description                                          | Common Use Cases
------------------+------------------------------------------------------+------------------------------------------
SQLite Databases  | SQLite database files containing URLs                | Browser bookmarks, custom URL collections
Browser Bookmarks | Direct integration with browser bookmark databases   | Firefox, Chrome bookmarks
Manual Entry      | URLs manually added through the web interface or API | Ad-hoc archiving, one-off downloads

Source Configuration

Data sources are configured in the config.toml file or through the Settings page in the web interface. In config.toml, each source is a [[sources]] entry:

[[sources]]
name = "Firefox Bookmarks"
type = "sqlite"
path = "./testDbs/places.sqlite"
table = "moz_bookmarks"
id_column = "id"
url_column = "url"
title_column = "title"
browser = "firefox"
profile_path = "~/.mozilla/firefox/default"

Each source is defined by the following parameters:

Required Parameters

  • name: A unique, descriptive name for the source
  • type: The type of source (sqlite, mysql, postgres)
  • path: The path to the database file or connection string
  • table: The database table containing the URLs
  • id_column: The column containing unique identifiers
  • url_column: The column containing the URLs to download

Optional Parameters

  • title_column: The column containing a title or description (if available)
  • browser: The browser type for browser sources (firefox, chrome)
  • profile_path: The path to the browser profile (for browser sources)
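
To illustrate the required/optional split above, the following Python sketch checks a source entry for missing required parameters. The helper name missing_required and the dict representation of a [[sources]] entry are assumptions made for illustration; this is not GatherHub's own validation code. The Chrome example later on this page omits the table and column parameters, so browser sources may be handled differently, and this sketch does not cover that case.

```python
# Hypothetical helper, not part of GatherHub. It only mirrors the
# "Required Parameters" list documented above.
REQUIRED_PARAMS = ("name", "type", "path", "table", "id_column", "url_column")


def missing_required(source: dict) -> list:
    """Return the required parameters that are absent or empty in `source`."""
    return [key for key in REQUIRED_PARAMS if not source.get(key)]


# A source entry as it might look after parsing one [[sources]] table.
firefox_source = {
    "name": "Firefox Bookmarks",
    "type": "sqlite",
    "path": "./testDbs/places.sqlite",
    "table": "moz_bookmarks",
    "id_column": "id",
    "url_column": "url",
    "title_column": "title",  # optional
}

print(missing_required(firefox_source))          # []
print(missing_required({"name": "Incomplete"}))  # the five other required keys
```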

Browser Integration

GatherHub has dedicated support for importing bookmarks from Firefox and Chrome/Chromium.

Firefox Bookmarks

Firefox stores bookmarks in a SQLite database called places.sqlite. To configure:

  1. Go to Settings > Sources
  2. Click "Add Source"
  3. Select "SQLite" as the source type
  4. Configure these parameters:
name = "Firefox Bookmarks"
type = "sqlite"
path = "~/.mozilla/firefox/XXXXXXXX.default/places.sqlite"
table = "moz_bookmarks"
id_column = "id"
url_column = "url"
title_column = "title"
browser = "firefox"

Note: The path varies by operating system and Firefox profile. Look for the places.sqlite file in your Firefox profile directory.
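
Because the profile directory name differs per installation, it can help to search for places.sqlite rather than guess the path. The sketch below lists candidate files under a profiles directory. The function name find_places_databases is hypothetical, and the default root shown is the Linux location used in the example above; other operating systems keep profiles elsewhere.

```python
from pathlib import Path


def find_places_databases(profiles_root: Path) -> list:
    """Return every places.sqlite found one level below `profiles_root`."""
    return sorted(profiles_root.glob("*/places.sqlite"))


# Linux location of Firefox profiles (assumption: adjust for your system).
for db_path in find_places_databases(Path.home() / ".mozilla" / "firefox"):
    print(db_path)
```

Any path printed this way can be used as the path value of the source.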

Chrome/Chromium Bookmarks

Chrome and Chromium store bookmarks in a JSON file rather than a SQLite database. GatherHub can still import them with a source configuration like this:

name = "Chrome Bookmarks"
type = "sqlite"
path = "~/.config/google-chrome/Default/Bookmarks"
browser = "chrome"

Note: Chrome bookmark paths vary by operating system:
  • Linux: ~/.config/google-chrome/Default/Bookmarks
  • macOS: ~/Library/Application Support/Google/Chrome/Default/Bookmarks
  • Windows: %LOCALAPPDATA%\Google\Chrome\User Data\Default\Bookmarks
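
To see which URLs a Chrome Bookmarks file contains before adding it as a source, a short script can walk the JSON directly. This is a minimal sketch based on the file's usual layout (a top-level "roots" object whose folders hold "children", with leaf nodes of type "url"). The function names are hypothetical, and the sketch does not describe how GatherHub itself reads the file.

```python
import json
from pathlib import Path


def iter_bookmark_urls(node: dict):
    """Yield (name, url) pairs from a bookmark node and all of its children."""
    if node.get("type") == "url":
        yield node.get("name", ""), node["url"]
    for child in node.get("children", []):
        yield from iter_bookmark_urls(child)


def load_chrome_bookmarks(path: Path) -> list:
    """Read a Chrome Bookmarks file and return all (name, url) pairs."""
    data = json.loads(path.read_text(encoding="utf-8"))
    pairs = []
    for root in data.get("roots", {}).values():
        if isinstance(root, dict):  # skip non-folder values stored under "roots"
            pairs.extend(iter_bookmark_urls(root))
    return pairs


# Tiny in-memory sample with the same shape as a real Bookmarks file.
sample = {"roots": {"bookmark_bar": {"type": "folder", "name": "Bookmarks bar", "children": [
    {"type": "url", "name": "Example", "url": "https://example.com/"},
]}}}
print(list(iter_bookmark_urls(sample["roots"]["bookmark_bar"])))
# [('Example', 'https://example.com/')]
```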

Custom SQLite Sources

You can also create custom SQLite sources by:

  1. Creating a SQLite database with a table containing URLs
  2. Configuring GatherHub to scan this database

Example schema for a custom source:

CREATE TABLE urls (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL,
    title TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
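
As an illustration of the first step, the following Python sketch creates a database with the schema above and adds one URL using the standard sqlite3 module. The helper names and the file name mentioned in the comment are placeholders, not anything GatherHub requires.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS urls (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL,
    title TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
"""


def create_url_database(path: str) -> sqlite3.Connection:
    """Open (or create) the database at `path` and ensure the urls table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_url(conn: sqlite3.Connection, url: str, title=None) -> int:
    """Insert one URL and return the id SQLite assigned to it."""
    cur = conn.execute("INSERT INTO urls (url, title) VALUES (?, ?)", (url, title))
    conn.commit()
    return cur.lastrowid


# ":memory:" keeps the demo self-contained; use a file such as "my_urls.db" in practice.
conn = create_url_database(":memory:")
add_url(conn, "https://example.com/article", "Example article")
print(conn.execute("SELECT id, url, title FROM urls").fetchall())
# [(1, 'https://example.com/article', 'Example article')]
```

A source entry for this database would then set table = "urls", id_column = "id", url_column = "url", and optionally title_column = "title".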

Scanning Sources

Once sources are configured, scan them to populate the download queue, either manually or on a schedule.

Manual Scanning

To manually scan sources:

  1. From the Dashboard, click "Scan Sources" in the Quick Actions section
  2. Alternatively, go to Settings > Sources and click "Scan Sources"

You can also use the API:

POST /api/scan
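
The endpoint can be called from a script, for example with Python's standard library. In this sketch the base URL (host and port) is an assumption and must be replaced with the address of your own instance. The response format is not documented on this page, so the sketch returns only the HTTP status code.

```python
import urllib.request

# Assumption: the address where the GatherHub web interface is reachable.
BASE_URL = "http://localhost:8080"


def build_scan_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the /api/scan endpoint."""
    url = base_url.rstrip("/") + "/api/scan"
    return urllib.request.Request(url, data=b"", method="POST")


def trigger_scan(base_url: str = BASE_URL, timeout: float = 30.0) -> int:
    """Send the scan request and return the HTTP status code."""
    with urllib.request.urlopen(build_scan_request(base_url), timeout=timeout) as resp:
        return resp.status


# Example (requires a running instance):
#     status = trigger_scan("http://localhost:8080")
```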

Automated Scanning

If scheduling is enabled, sources will be automatically scanned at the configured interval:

[scheduling]
enabled = true
interval_minutes = 60

How Scanning Works

When scanning sources, GatherHub:

  1. Connects to each configured source
  2. Queries for URLs that haven't already been added to the queue
  3. Adds new URLs as pending jobs
  4. Logs the results of the scanning operation

GatherHub keeps track of which URLs have already been imported to avoid duplicates.
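
The steps above can be sketched in Python as follows. This is a simplified illustration, not GatherHub's implementation: the jobs table, its columns, and the function name scan_source are hypothetical, and duplicates are detected here by comparing URLs, which may differ from how GatherHub tracks imported entries.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote a table or column name taken from configuration."""
    return '"' + name.replace('"', '""') + '"'


def scan_source(source_conn, queue_conn, table, id_column, url_column) -> int:
    """Add URLs from the source that are not yet queued; return how many were added."""
    # Steps 1-2: read candidate rows from the configured table and columns.
    rows = source_conn.execute(
        f"SELECT {quote_ident(id_column)}, {quote_ident(url_column)} "
        f"FROM {quote_ident(table)} WHERE {quote_ident(url_column)} IS NOT NULL"
    ).fetchall()
    known = {row[0] for row in queue_conn.execute("SELECT url FROM jobs")}
    added = 0
    for _source_id, url in rows:
        if url in known:
            continue  # already imported by an earlier scan
        # Step 3: queue the new URL as a pending job.
        queue_conn.execute("INSERT INTO jobs (url, status) VALUES (?, 'pending')", (url,))
        known.add(url)
        added += 1
    queue_conn.commit()
    return added  # Step 4: the caller can log this count


source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE urls (id INTEGER PRIMARY KEY, url TEXT NOT NULL)")
source.executemany("INSERT INTO urls (url) VALUES (?)",
                   [("https://a.example/",), ("https://b.example/",)])
queue = sqlite3.connect(":memory:")
queue.execute("CREATE TABLE jobs (url TEXT PRIMARY KEY, status TEXT NOT NULL)")

print(scan_source(source, queue, "urls", "id", "url"))  # 2
print(scan_source(source, queue, "urls", "id", "url"))  # 0
```

The second call adds nothing because both URLs are already in the queue, which mirrors the "No new URLs found" case.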

Troubleshooting

Source Connection Testing

You can test source connections through the web interface:

  1. Go to Settings > Sources
  2. Click "Test Connection" for the source

This will verify that GatherHub can connect to the database and read the required columns.
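
A similar check can be run by hand against a SQLite source. The sketch below is an assumption about what such a test involves, not GatherHub's actual code. It confirms that the table exists and then compares the configured column names with those reported by SQLite's PRAGMA table_info. The function name check_source is hypothetical.

```python
import sqlite3


def check_source(conn: sqlite3.Connection, table: str, columns: list) -> list:
    """Return a list of problems; an empty list means the table and columns exist."""
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type IN ('table', 'view') AND name = ?",
        (table,),
    ).fetchone()
    if exists is None:
        return [f"Table not found: {table}"]
    quoted = '"' + table.replace('"', '""') + '"'
    # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
    present = {row[1] for row in conn.execute(f"PRAGMA table_info({quoted})")}
    return [f"Column not found: {col}" for col in columns if col not in present]


# Demo on an in-memory database; for a real file, open it with sqlite3.connect(path).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE urls (id INTEGER PRIMARY KEY, url TEXT NOT NULL, title TEXT)")
print(check_source(demo, "urls", ["id", "url", "title"]))  # []
print(check_source(demo, "urls", ["id", "link"]))          # ['Column not found: link']
print(check_source(demo, "bookmarks", ["id"]))             # ['Table not found: bookmarks']
```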

Common Issues

Problem                 | Possible Cause                                | Solution
------------------------+-----------------------------------------------+------------------------------------------------------------
Database file not found | Incorrect path or missing file                | Verify the file path and ensure the file exists
Permission denied       | Insufficient permissions to read the database | Check file permissions and ensure GatherHub has read access
Table not found         | Incorrect table name                          | Verify the table name in the database
Column not found        | Incorrect column names                        | Verify the column names in the database
No new URLs found       | All URLs already imported                     | Add new bookmarks to the source and scan again

Checking Logs

Source scanning issues are logged to:

./data/logs/app.log

Look for entries containing "scan" or "source" to diagnose issues:

grep -i "scan\|source" data/logs/app.log
