Data Sources

GatherHub can scan configured data sources for URLs and add them to the download queue. This page explains how to configure each type of source and how to run scans.

Types of Sources

GatherHub supports several types of data sources:

Source Type	Description	Common Use Cases
SQLite Databases	SQLite database files containing URLs	Browser bookmarks, custom URL collections
Browser Bookmarks	Direct integration with browser bookmark databases	Firefox, Chrome bookmarks
Manual Entry	URLs manually added through the web interface or API	Ad-hoc archiving, one-off downloads

Source Configuration

Data sources are configured in the config.toml file or through the Settings page in the web interface:

[[sources]]
name = "Firefox Bookmarks"
type = "sqlite"
path = "./testDbs/places.sqlite"
table = "moz_bookmarks"
id_column = "id"
url_column = "url"
title_column = "title"
browser = "firefox"
profile_path = "~/.mozilla/firefox/default"

Each source entry accepts the following parameters:

Required Parameters

name: A unique, descriptive name for the source
type: The type of source (sqlite, mysql, postgres)
path: The path to the database file or connection string
table: The database table containing the URLs
id_column: The column containing unique identifiers
url_column: The column containing the URLs to download

Optional Parameters

title_column: The column containing a title or description (if available)
browser: The browser type for browser sources (firefox, chrome)
profile_path: The path to the browser profile (for browser sources)
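
The parameter lists above can be checked before a source is saved. The following is a minimal sketch and not GatherHub's own validation: the helper name missing_required and the use of a plain dict to represent one [[sources]] entry are assumptions made for illustration.

```python
# Hypothetical helper, not part of GatherHub. It checks one source entry
# (a plain dict, such as one element of the parsed [[sources]] array)
# against the required parameters listed on this page.
REQUIRED_KEYS = ("name", "type", "path", "table", "id_column", "url_column")


def missing_required(source: dict) -> list[str]:
    """Return the required keys that are absent or empty in `source`."""
    return [key for key in REQUIRED_KEYS if not source.get(key)]


# The Firefox example from this page, expressed as a dict.
firefox_source = {
    "name": "Firefox Bookmarks",
    "type": "sqlite",
    "path": "./testDbs/places.sqlite",
    "table": "moz_bookmarks",
    "id_column": "id",
    "url_column": "url",
    "title_column": "title",  # optional
    "browser": "firefox",  # optional
}
```

Calling missing_required(firefox_source) returns an empty list, and an entry without a table key is reported as missing "table". The Chrome example on this page omits the table and column keys, so this check may be stricter than what GatherHub enforces for browser sources.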

Browser Integration

GatherHub has special support for importing browser bookmarks:

Firefox Bookmarks

Firefox stores bookmarks in a SQLite database called places.sqlite. To configure:

Go to Settings > Sources
Click "Add Source"
Select "SQLite" as the source type
Configure these parameters:

name = "Firefox Bookmarks"
type = "sqlite"
path = "~/.mozilla/firefox/XXXXXXXX.default/places.sqlite"
table = "moz_bookmarks"
id_column = "id"
url_column = "url"
title_column = "title"
browser = "firefox"

Note: The path varies by operating system and Firefox profile. The XXXXXXXX part stands for a profile-specific prefix; the about:profiles page in Firefox shows the exact profile directory that contains places.sqlite.
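
To inspect what a Firefox source contains outside GatherHub, the database can be read directly. Two facts about Firefox's own schema matter here: a running Firefox can hold a lock on places.sqlite, and bookmark rows in moz_bookmarks reference their URL through the fk column, which points at moz_places.id. The sketch below copies the file and joins the two tables. The function name read_firefox_bookmarks is hypothetical, and the code does not show how GatherHub itself resolves the url column.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def read_firefox_bookmarks(places_path: str) -> list[tuple[int, str, str]]:
    """Return (bookmark id, title, url) rows from a copy of places.sqlite."""
    source = Path(places_path).expanduser()
    with tempfile.TemporaryDirectory() as tmp:
        # Work on a copy so a running Firefox instance does not block the read.
        copy_path = Path(tmp) / "places-copy.sqlite"
        shutil.copy2(source, copy_path)
        conn = sqlite3.connect(copy_path)
        try:
            rows = conn.execute(
                """
                SELECT b.id, COALESCE(b.title, ''), p.url
                FROM moz_bookmarks AS b
                JOIN moz_places AS p ON p.id = b.fk
                WHERE b.type = 1  -- 1 = bookmark; folders and separators are skipped
                ORDER BY b.id
                """
            ).fetchall()
        finally:
            conn.close()
    return rows
```

Only the main database file is copied, so changes that Firefox has not yet written out of its write-ahead log may be missing from the result.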

Chrome/Chromium Bookmarks

Chrome and Chromium store bookmarks in a JSON file rather than a SQLite database. GatherHub can still import them when the source is configured as follows:

name = "Chrome Bookmarks"
type = "sqlite"
path = "~/.config/google-chrome/Default/Bookmarks"
browser = "chrome"

Note: Chrome bookmark paths vary by operating system:

Linux: ~/.config/google-chrome/Default/Bookmarks
macOS: ~/Library/Application Support/Google/Chrome/Default/Bookmarks
Windows: %LOCALAPPDATA%\Google\Chrome\User Data\Default\Bookmarks
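
Because the Chrome Bookmarks file is JSON, its contents can be checked without a database client. In Chrome's format the top-level roots object holds folders such as bookmark_bar and other, and each node has a type of either url or folder. The following sketch walks that tree. The function names are hypothetical, and the code is not a description of how GatherHub parses the file.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_chrome_bookmarks(node: dict) -> Iterator[tuple[str, str]]:
    """Yield (name, url) for every bookmark at or below a node of the tree."""
    if node.get("type") == "url":
        yield node.get("name", ""), node["url"]
    for child in node.get("children", []):
        yield from iter_chrome_bookmarks(child)


def load_chrome_bookmarks(bookmarks_path: str) -> list[tuple[str, str]]:
    """Read a Chrome/Chromium Bookmarks file and return all (name, url) pairs."""
    text = Path(bookmarks_path).expanduser().read_text(encoding="utf-8")
    data = json.loads(text)
    results = []
    for root in data.get("roots", {}).values():
        # Some entries under "roots" may not be folder objects; skip those.
        if isinstance(root, dict):
            results.extend(iter_chrome_bookmarks(root))
    return results
```

For example, load_chrome_bookmarks("~/.config/google-chrome/Default/Bookmarks") lists the bookmarks of the default Linux profile.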

Custom SQLite Sources

You can also create custom SQLite sources by:

Creating a SQLite database with a table containing URLs
Configuring GatherHub to scan this database

Example schema for a custom source:

CREATE TABLE urls (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL,
    title TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
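
A database with this schema can be created and filled with Python's standard sqlite3 module. This is a minimal sketch: the function name create_url_database and any file name passed to it are illustrative assumptions, not GatherHub conventions.

```python
import sqlite3

# Same schema as the example above, made safe to run more than once.
SCHEMA = """
CREATE TABLE IF NOT EXISTS urls (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL,
    title TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
"""


def create_url_database(db_path: str, entries: list[tuple[str, str]]) -> int:
    """Create the urls table if needed, insert (url, title) pairs,
    and return the total number of rows in the table."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(SCHEMA)
        conn.executemany("INSERT INTO urls (url, title) VALUES (?, ?)", entries)
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM urls").fetchone()[0]
    finally:
        conn.close()
```

A source that reads this database would set table = "urls", id_column = "id", url_column = "url", and title_column = "title", with path pointing at the file passed as db_path.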

Scanning Sources

Once sources are configured, you can scan them to populate the download queue:

Manual Scanning

To manually scan sources:

From the Dashboard, click "Scan Sources" in the Quick Actions section
Alternatively, go to Settings > Sources and click "Scan Sources"

You can also use the API:

POST /api/scan
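
The endpoint can be called from a script. The sketch below uses only the Python standard library. The base URL is an assumption and must be replaced with the address of your own GatherHub instance; this page does not document request parameters, authentication, or the response format, so the code sends an empty body and returns only the HTTP status code.

```python
import urllib.request

# Assumption: replace with the address your GatherHub instance listens on.
BASE_URL = "http://localhost:8080"


def build_scan_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request to the scan endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/scan",
        data=b"",  # empty body; no request parameters are documented
        method="POST",
    )


def trigger_scan(base_url: str = BASE_URL, timeout: float = 30.0) -> int:
    """Send the scan request and return the HTTP status code."""
    request = build_scan_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping request construction separate from sending makes it possible to inspect the request without a running server.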

Automated Scanning

If scheduling is enabled, sources will be automatically scanned at the configured interval:

[scheduling]
enabled = true
interval_minutes = 60

How Scanning Works

When scanning sources, GatherHub:

Connects to each configured source
Queries for URLs that haven't already been added to the queue
Adds new URLs as pending jobs
Logs the results of the scanning operation

GatherHub keeps track of which URLs have already been imported to avoid duplicates.
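
The steps above can be sketched in a few lines. This illustrates the general technique (read URLs from the source, skip those already queued, add the rest as pending jobs) and is not GatherHub's implementation: the jobs table, its columns, and the function name scan_source are hypothetical.

```python
import sqlite3


def scan_source(source_conn: sqlite3.Connection,
                queue_conn: sqlite3.Connection,
                table: str, url_column: str) -> int:
    """Queue URLs from the source table that are not queued yet.

    Returns the number of newly added jobs.
    """
    # Hypothetical queue table; the URL doubles as the deduplication key.
    queue_conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " url TEXT PRIMARY KEY,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    added = 0
    # Table and column names come from the source configuration. They cannot
    # be bound as SQL parameters, so they are quoted as identifiers here.
    query = f'SELECT "{url_column}" FROM "{table}"'
    for (url,) in source_conn.execute(query):
        if not url:
            continue  # skip rows without a URL
        already_queued = queue_conn.execute(
            "SELECT 1 FROM jobs WHERE url = ?", (url,)
        ).fetchone()
        if already_queued is None:
            queue_conn.execute("INSERT INTO jobs (url) VALUES (?)", (url,))
            added += 1
    queue_conn.commit()
    return added
```

Running the function twice against an unchanged source adds nothing the second time, which matches the "No new URLs found" outcome described under Common Issues.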

Troubleshooting

Source Connection Testing

You can test source connections through the web interface:

Go to Settings > Sources
Click "Test Connection" for the source

This verifies that GatherHub can open the database and read the configured table and columns.

Common Issues

Problem	Possible Cause	Solution
Database file not found	Incorrect path or missing file	Verify the file path and ensure the file exists
Permission denied	Insufficient permissions to read the database	Check file permissions and ensure GatherHub has read access
Table not found	Incorrect table name	Verify the table name in the database
Column not found	Incorrect column names	Verify the column names in the database
No new URLs found	All URLs already imported	Add new bookmarks to the source and scan again
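
The first four rows of the table can be checked by hand with a short script. The sketch below is a stand-alone diagnostic and not the code behind "Test Connection"; the function name diagnose_source and its return strings are assumptions chosen to mirror the table.

```python
import os
import sqlite3
from pathlib import Path


def diagnose_source(path: str, table: str, columns: list[str]) -> str:
    """Return "OK" or a short description of the first problem found."""
    db_file = Path(os.path.expanduser(path))
    if not db_file.is_file():
        return "Database file not found"
    if not os.access(db_file, os.R_OK):
        return "Permission denied"
    # Open read-only so the check cannot modify the database.
    conn = sqlite3.connect(db_file.resolve().as_uri() + "?mode=ro", uri=True)
    try:
        quoted = table.replace('"', '""')
        # Each table_info row describes one column; index 1 is its name.
        found = {row[1] for row in conn.execute(f'PRAGMA table_info("{quoted}")')}
    except sqlite3.DatabaseError as exc:
        return f"Cannot read database: {exc}"
    finally:
        conn.close()
    if not found:
        return "Table not found"
    missing = [name for name in columns if name not in found]
    if missing:
        return "Column not found: " + ", ".join(missing)
    return "OK"
```

For the Firefox example on this page, the call would be diagnose_source("./testDbs/places.sqlite", "moz_bookmarks", ["id", "url", "title"]).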

Checking Logs

Source scanning issues are logged to:

./data/logs/app.log

Look for entries containing "scan" or "source" to diagnose issues:

grep -iE "scan|source" data/logs/app.log