GatherHub's scheduling system allows you to automate scanning and downloading operations, ensuring your content is archived regularly without manual intervention.
When enabled, the scheduling system automatically runs its scan and download operations at the configured interval, so you can maintain a consistent archiving workflow.
Scheduling is configured in the config.toml file and can be adjusted through the Settings page:
[scheduling]
enabled = true
interval_minutes = 60 # How often to run the scheduled operations
You can also configure auto-cleaning operations:
[auto_clean]
enabled = true
retry_failed = true # Whether to retry failed jobs during cleaning
max_retries = 3 # Maximum number of retry attempts
clean_after_days = 30 # Remove jobs older than this many days
To use the scheduling system, you need to run GatherHub in daemon mode:
./gatherhub --daemon
You can combine this with other modes:
./gatherhub --daemon --web --api
This starts GatherHub in the background, where it will run scheduled operations according to your configuration.
The interval_minutes setting controls how frequently GatherHub runs scheduled operations.
Consider the following when choosing an interval:
| Interval | Best For | Considerations |
|---|---|---|
| 15-30 minutes | Frequently updated sources | Higher system load, more immediate archiving |
| 60 minutes (default) | Most general use cases | Good balance between timeliness and system load |
| 6-12 hours | Infrequently updated sources | Lower system load, less frequent archiving |
| 24 hours | Daily archiving | Lowest system load, once-daily archiving |
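To make the load trade-off in the table concrete, the sketch below converts an interval_minutes value into the number of scheduled runs per day. The helper name `runs_per_day` is hypothetical; the arithmetic only assumes that one run starts every interval_minutes minutes.

```python
def runs_per_day(interval_minutes: int) -> float:
    """Number of scheduled runs in 24 hours for a given interval."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    return 24 * 60 / interval_minutes


# Intervals taken from the table above, expressed in minutes.
for minutes in (15, 30, 60, 360, 720, 1440):
    print(f"{minutes:>5} min -> {runs_per_day(minutes):g} runs/day")
```

A 15-minute interval means 96 runs per day, compared with 24 at the 60-minute default, which is why shorter intervals put more load on the system.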
The auto-clean feature helps maintain your database by managing old and failed jobs:
With enabled = true and clean_after_days = 0, cleanup of completed jobs is disabled, while failed jobs can still be cleaned up.
Setting clean_after_days too low might remove jobs you still want.
Auto-clean removes job records from the database only; downloaded files that were successfully archived remain.
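One way to read the auto-clean settings is as a per-job decision. The sketch below is an interpretation, not GatherHub's implementation: the `Job` record, the `clean_action` function, and the status names are hypothetical, and two behaviors are assumptions: that the age limit applies to completed jobs only, and that a failed job is removed once it can no longer be retried.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AutoClean:
    # Field names mirror the [auto_clean] keys in config.toml.
    enabled: bool = True
    retry_failed: bool = True
    max_retries: int = 3
    clean_after_days: int = 30


@dataclass
class Job:
    # Hypothetical job record; GatherHub's real schema may differ.
    status: str          # "completed" or "failed"
    retries: int
    finished_at: datetime


def clean_action(job: Job, cfg: AutoClean, now: datetime) -> str:
    """Return "retry", "remove", or "keep" for a single job."""
    if not cfg.enabled:
        return "keep"
    if job.status == "failed":
        if cfg.retry_failed and job.retries < cfg.max_retries:
            return "retry"
        # Assumption: "cleaned up" means the record is removed.
        return "remove"
    # clean_after_days = 0 disables cleanup of completed jobs.
    too_old = now - job.finished_at > timedelta(days=cfg.clean_after_days)
    if cfg.clean_after_days > 0 and too_old:
        return "remove"
    return "keep"


now = datetime(2025, 4, 22)
old_job = Job("completed", 0, datetime(2025, 3, 1))
print(clean_action(old_job, AutoClean(), now))
print(clean_action(old_job, AutoClean(clean_after_days=0), now))
```

With the default 30-day limit, the 52-day-old completed job is removed; with clean_after_days = 0 it is kept.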
You can monitor scheduled operations in several ways:
Scheduled operations are logged to the application log file:
./data/logs/app.log
Look for entries like:
2025/04/22 15:30:00 main.go:500: INFO: Starting scheduled operations
2025/04/22 15:30:05 main.go:515: INFO: Scanned sources, found 12 new jobs
2025/04/22 15:30:42 main.go:530: INFO: Processed 8 downloads (5 success, 3 failed)
2025/04/22 15:30:45 main.go:545: INFO: Auto-clean removed 15 old jobs
2025/04/22 15:30:45 main.go:550: INFO: Scheduled operations completed
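Log lines in this shape can be parsed for monitoring. The sketch below assumes every entry follows the exact layout of the sample lines above (timestamp, source file and line, level, message); the function names `parse_line` and `download_counts` are hypothetical helpers, not part of GatherHub.

```python
import re
from datetime import datetime

# Layout assumed from the sample entries: "DATE TIME file:line: LEVEL: message"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<file>[^:\s]+):(?P<line>\d+): "
    r"(?P<level>[A-Z]+): (?P<msg>.*)$"
)

# Matches the download summary message shown in the sample log.
DOWNLOADS = re.compile(r"Processed (\d+) downloads \((\d+) success, (\d+) failed\)")


def parse_line(line: str):
    """Split one log line into its fields, or return None if it does not match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    return {
        "time": datetime.strptime(m["ts"], "%Y/%m/%d %H:%M:%S"),
        "source": f"{m['file']}:{m['line']}",
        "level": m["level"],
        "message": m["msg"],
    }


def download_counts(message: str):
    """Return (total, success, failed) from a download summary, else None."""
    m = DOWNLOADS.search(message)
    return tuple(int(g) for g in m.groups()) if m else None


entry = parse_line(
    "2025/04/22 15:30:42 main.go:530: INFO: Processed 8 downloads (5 success, 3 failed)"
)
print(entry["time"], entry["level"], download_counts(entry["message"]))
```

A failed count above zero in the summary line is a natural trigger for a closer look at the failed jobs.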
On the Dashboard, you'll see:
You can query the system status through the API:
GET /api/stats
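A small client for this endpoint might look like the sketch below. Only the path /api/stats comes from this page; the base URL http://localhost:8080 is a placeholder, and the response being JSON is an assumption to verify against your installation.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def stats_url(base_url: str) -> str:
    """Build the stats endpoint URL from a base URL, with or without a trailing slash."""
    return urljoin(base_url.rstrip("/") + "/", "api/stats")


def fetch_stats(base_url: str, timeout: float = 5.0):
    """GET /api/stats and decode the body, assuming it is JSON."""
    with urlopen(stats_url(base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (placeholder address; requires a running GatherHub API):
#   stats = fetch_stats("http://localhost:8080")
print(stats_url("http://localhost:8080"))
```

Polling this endpoint from a cron job or monitoring script is one way to watch scheduled runs without reading the log file.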
Even with scheduling enabled, you can still manually trigger operations from the Dashboard.
These manual actions do not affect the scheduling system; the next scheduled run will still occur at the configured interval.