GatherHub can automatically scan various data sources for URLs to download. This page explains how to configure and use different data sources to populate your download queue.
GatherHub supports several types of data sources:
| Source Type | Description | Common Use Cases |
|---|---|---|
| SQLite Databases | SQLite database files containing URLs | Browser bookmarks, custom URL collections |
| Browser Bookmarks | Direct integration with browser bookmark databases | Firefox, Chrome bookmarks |
| Manual Entry | URLs manually added through the web interface or API | Ad-hoc archiving, one-off downloads |
Data sources are configured in the `config.toml` file or through the Settings page in the web interface:
```toml
[[sources]]
name = "Firefox Bookmarks"
type = "sqlite"
path = "./testDbs/places.sqlite"
table = "moz_bookmarks"
id_column = "id"
url_column = "url"
title_column = "title"
browser = "firefox"
profile_path = "~/.mozilla/firefox/default"
```
Each source requires several parameters to be configured:
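In the examples on this page, the parameters are used as follows (not every parameter applies to every source type; the Chrome example below, for instance, omits the column settings):

| Parameter | Description |
|---|---|
| `name` | Human-readable name for the source |
| `type` | Source type, e.g. `sqlite` |
| `path` | Path to the source file |
| `table` | Database table containing the URLs |
| `id_column` | Column holding each row's unique ID |
| `url_column` | Column holding the URL |
| `title_column` | Column holding the page title |
| `browser` | Browser the source belongs to (`firefox` or `chrome`), enabling bookmark-specific handling |
| `profile_path` | Path to the browser profile directory |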
GatherHub has special support for importing browser bookmarks:
Firefox stores bookmarks in a SQLite database called `places.sqlite`. To configure:
name = "Firefox Bookmarks"
type = "sqlite"
path = "~/.mozilla/firefox/XXXXXXXX.default/places.sqlite"
table = "moz_bookmarks"
id_column = "id"
url_column = "url"
title_column = "title"
browser = "firefox"
Replace `XXXXXXXX.default` with the name of your Firefox profile directory, which contains the `places.sqlite` file.
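The profile directory name varies per installation; assuming the default Linux location used above, you can list the candidates with:

```bash
# Show each Firefox profile that contains a places.sqlite database
ls ~/.mozilla/firefox/*/places.sqlite
```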
Chrome/Chromium bookmarks are stored in a JSON file rather than a SQLite database, but GatherHub can still import them with the right source configuration:
name = "Chrome Bookmarks"
type = "sqlite"
path = "~/.config/google-chrome/Default/Bookmarks"
browser = "chrome"
The bookmarks file is found at:

- Linux: `~/.config/google-chrome/Default/Bookmarks`
- macOS: `~/Library/Application Support/Google/Chrome/Default/Bookmarks`
- Windows: `%LOCALAPPDATA%\Google\Chrome\User Data\Default\Bookmarks`
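If you point a source at the Windows location, note that backslashes are escape characters inside double-quoted TOML strings; single-quoted literal strings avoid the problem. The expanded path below is illustrative, since `%LOCALAPPDATA%` resolves differently per user:

```toml
[[sources]]
name = "Chrome Bookmarks"
type = "sqlite"
# Single-quoted TOML literal string: backslashes need no escaping
path = 'C:\Users\you\AppData\Local\Google\Chrome\User Data\Default\Bookmarks'
browser = "chrome"
```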
You can also create custom SQLite sources by building a database with a table that has, at minimum, an ID column and a URL column, then pointing a source configuration at it (see the sketch after the schema below). Example schema for a custom source:
```sql
CREATE TABLE urls (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL,
    title TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
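A matching source configuration for this schema might look like the following; the source name and database path are placeholders for your own database:

```toml
[[sources]]
name = "Custom URL Collection"
type = "sqlite"
path = "./data/custom_urls.sqlite"
table = "urls"
id_column = "id"
url_column = "url"
title_column = "title"
```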
Once sources are configured, you can scan them to populate the download queue:
To manually scan sources, trigger a scan from the web interface. You can also use the API:

```
POST /api/scan
```
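For example, with curl; the host and port are placeholders for wherever your GatherHub instance is listening:

```bash
# Ask GatherHub to scan all configured sources now
curl -X POST http://localhost:8080/api/scan
```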
If scheduling is enabled, sources will be automatically scanned at the configured interval:
```toml
[scheduling]
enabled = true
interval_minutes = 60
```
When scanning sources, GatherHub reads each configured source, extracts the URLs and titles, and adds new entries to the download queue. It keeps track of which URLs have already been imported to avoid duplicates.
You can test source connections through the web interface. This will verify that GatherHub can connect to the database and read the required columns.
| Problem | Possible Cause | Solution |
|---|---|---|
| Database file not found | Incorrect path or missing file | Verify the file path and ensure the file exists |
| Permission denied | Insufficient permissions to read the database | Check file permissions and ensure GatherHub has read access |
| Table not found | Incorrect table name | Verify the table name in the database |
| Column not found | Incorrect column names | Verify the column names in the database |
| No new URLs found | All URLs already imported | Add new bookmarks to the source and scan again |
Source scanning issues are logged to `./data/logs/app.log`.
Look for entries containing "scan" or "source" to diagnose issues:
```bash
grep -i "scan\|source" data/logs/app.log
```
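To watch for problems as a scan runs, you can also follow the log live:

```bash
# Stream new log entries, filtered to scan/source messages
tail -f data/logs/app.log | grep -i "scan\|source"
```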