Scheduling

GatherHub's scheduling system allows you to automate scanning and downloading operations, ensuring your content is archived regularly without manual intervention.

Scheduling Overview

When enabled, the scheduling system will automatically:

Scan configured sources for new content
Process pending downloads
Retry failed downloads (if retry_failed is enabled in [auto_clean])
Clean up old and failed jobs (if [auto_clean] is enabled)

These operations run once every interval_minutes minutes, so content is archived on a consistent, predictable schedule.
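The cycle described above can be sketched as a simple loop. This is a minimal illustration under assumptions, not GatherHub's implementation: the step names, the actions hooks, and the gating of the retry and clean-up steps on the [auto_clean] settings are hypothetical.

```python
import time
from typing import Callable, Dict, List


def scheduled_steps(auto_clean_enabled: bool, retry_failed: bool) -> List[str]:
    """Return the operations one scheduled run performs, in order.

    Assumption: retry and clean-up are gated by the [auto_clean] settings;
    GatherHub's real ordering and conditions may differ.
    """
    steps = ["scan_sources", "process_downloads"]
    if auto_clean_enabled and retry_failed:
        steps.append("retry_failed")
    if auto_clean_enabled:
        steps.append("clean_up")
    return steps


def run_daemon(
    interval_minutes: int,
    actions: Dict[str, Callable[[], None]],  # hypothetical hook per step name
    auto_clean_enabled: bool,
    retry_failed: bool,
    max_runs: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run the scheduled steps max_runs times, one interval apart."""
    runs = 0
    while runs < max_runs:
        for step in scheduled_steps(auto_clean_enabled, retry_failed):
            actions[step]()
        runs += 1
        if runs < max_runs:
            sleep(interval_minutes * 60)  # wait one interval before the next run
    return runs
```

Passing sleep as a parameter lets you exercise the loop without actually waiting a full interval between runs.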

Configuration

Scheduling is configured in the [scheduling] section of config.toml and can also be adjusted from the Settings page:

[scheduling]
enabled = true
interval_minutes = 60  # How often to run the scheduled operations

Automatic cleanup of old and failed jobs is configured in the [auto_clean] section:

[auto_clean]
enabled = true
retry_failed = true    # Whether to retry failed jobs during cleaning
max_retries = 3        # Maximum number of retry attempts
clean_after_days = 30  # Remove completed jobs older than this many days (0 = keep them)

Running in Daemon Mode

To use the scheduling system, you need to run GatherHub in daemon mode:

./gatherhub --daemon

You can combine this with other modes:

./gatherhub --daemon --web --api

This starts GatherHub in the background, where it will run scheduled operations according to your configuration.

Note: If you want the daemon to keep running after you log out and to start again after a reboot, consider setting up GatherHub as a system service. Refer to the System Service section of the installation documentation.

Scheduling Options

Interval Setting

The interval_minutes setting controls how frequently GatherHub runs scheduled operations. Consider the following when choosing an interval:

Interval (interval_minutes)	Best For	Considerations
15-30 minutes (15-30)	Frequently updated sources	Higher system load, more immediate archiving
60 minutes (60, default)	Most general use cases	Good balance between timeliness and system load
6-12 hours (360-720)	Infrequently updated sources	Lower system load, less frequent archiving
24 hours (1440)	Daily archiving	Lowest system load, one run per day
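Because interval_minutes is in minutes, it helps to convert each row into the number of scheduled runs per day: 1440 minutes divided by the interval. At 15 minutes that is 96 runs a day; at 1440 minutes it is a single run.

```python
def runs_per_day(interval_minutes: int) -> float:
    """Number of scheduled runs in 24 hours for a given interval_minutes value."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    return 24 * 60 / interval_minutes


# Print the run count for each interval mentioned in the table.
for minutes in (15, 30, 60, 360, 720, 1440):
    print(f"interval_minutes = {minutes:>4}: {runs_per_day(minutes):g} runs per day")
```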

Auto-Clean Settings

The auto-clean feature helps maintain your database by managing old and failed jobs:

enabled: Turn auto-clean on or off
retry_failed: Attempt to download failed jobs again before cleaning
max_retries: How many times to attempt downloading a failed job before giving up
clean_after_days: Remove completed jobs older than this many days

Tip: To clean up only failed jobs and keep every completed job, set enabled = true and clean_after_days = 0. A value of 0 disables removal of completed jobs; failed jobs are still cleaned up.

Warning: Setting clean_after_days too low can remove job records you still need. Auto-clean deletes job records from the database; files that were already downloaded successfully remain in your archive, but their job history is lost.
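The settings above can be read as a per-job decision. The following sketch is one interpretation of this section, not GatherHub's code: the Job record and its status values are hypothetical, and it assumes that failed jobs which cannot be retried are removed regardless of age (consistent with the tip about clean_after_days = 0), while pending jobs are never touched.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Job:
    status: str            # hypothetical values: "pending", "completed", "failed"
    retries: int           # retry attempts made so far
    finished_at: datetime


def auto_clean_action(job: Job, now: datetime, enabled: bool, retry_failed: bool,
                      max_retries: int, clean_after_days: int) -> str:
    """Return "retry", "remove" or "keep" for one job."""
    if not enabled:
        return "keep"
    if job.status == "failed":
        if retry_failed and job.retries < max_retries:
            return "retry"
        return "remove"  # assumed: failed jobs are cleaned even when clean_after_days = 0
    if job.status == "completed":
        too_old = now - job.finished_at > timedelta(days=clean_after_days)
        if clean_after_days > 0 and too_old:
            return "remove"
        return "keep"    # 0 disables removal of completed jobs
    return "keep"        # pending jobs are left alone
```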

Monitoring Scheduled Operations

You can monitor scheduled operations in several ways:

Log Files

Scheduled operations are logged to the application log file:

./data/logs/app.log

Look for entries like:

2025/04/22 15:30:00 main.go:500: INFO: Starting scheduled operations
2025/04/22 15:30:05 main.go:515: INFO: Scanned sources, found 12 new jobs
2025/04/22 15:30:42 main.go:530: INFO: Processed 8 downloads (5 success, 3 failed)
2025/04/22 15:30:45 main.go:545: INFO: Auto-clean removed 15 old jobs
2025/04/22 15:30:45 main.go:550: INFO: Scheduled operations completed
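To pull the latest download summary out of the log automatically, a small parser can match lines in the format shown above. This sketch assumes every line follows the date, time, source location, level, message layout of these examples and that the "Processed N downloads (S success, F failed)" wording stays the same; adjust the patterns if your log differs.

```python
import re
from datetime import datetime
from typing import Iterable, Optional, Tuple

# date time file:line: LEVEL: message (layout taken from the sample entries)
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (\S+?):(\d+): (\w+): (.*)$")
PROCESSED_RE = re.compile(r"Processed (\d+) downloads \((\d+) success, (\d+) failed\)")


def parse_line(line: str) -> Optional[Tuple[datetime, str, str]]:
    """Return (timestamp, level, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
    return ts, m.group(4), m.group(5)


def last_download_summary(lines: Iterable[str]) -> Optional[Tuple[int, int, int]]:
    """Return (processed, succeeded, failed) from the last matching entry."""
    result = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        m = PROCESSED_RE.search(parsed[2])
        if m:
            result = (int(m.group(1)), int(m.group(2)), int(m.group(3)))
    return result
```

For example, pass open("./data/logs/app.log", encoding="utf-8") to last_download_summary; for the sample entries above it returns (8, 5, 3).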

Web Interface

On the Dashboard, you'll see:

The last time scheduled operations ran
The next scheduled run time
Statistics on recent operations

API

You can query the system status through the API:

GET /api/stats
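For scripts or external monitoring, the endpoint can be queried with only the Python standard library. This is a sketch under assumptions: http://localhost:8080 is a placeholder address (use the host and port your instance actually listens on), and because this page does not document the response fields, the function returns the decoded JSON as-is rather than assuming particular keys. If your instance requires authentication, you will need to add whatever credentials it expects.

```python
import json
import urllib.request


def fetch_stats(base_url: str = "http://localhost:8080", timeout: float = 5.0) -> dict:
    """GET <base_url>/api/stats and return the decoded JSON body.

    The default base_url is a placeholder assumption, not a documented value.
    """
    url = base_url.rstrip("/") + "/api/stats"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, print(json.dumps(fetch_stats(), indent=2)) shows whatever statistics your instance reports.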

Manual Override

Even with scheduling enabled, you can still manually trigger operations from the Dashboard:

Click "Scan Sources" to scan for new content immediately
Click "Process Downloads" to start downloading pending jobs
Click "Retry Failed" to attempt failed downloads again
Click "Clean Up" to remove old jobs based on your configuration

These manual actions do not affect the scheduling system; the next scheduled run will still occur at the configured interval.
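The rule that manual actions leave the schedule alone can be expressed as a small calculation: the next scheduled run is derived only from the most recent scheduled run, and manual runs are ignored. This is an illustrative sketch, not GatherHub's code; the (timestamp, kind) run records are a hypothetical representation.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def next_scheduled_run(runs: Iterable[Tuple[datetime, str]], interval_minutes: int) -> datetime:
    """Return when the next scheduled run is due.

    runs holds hypothetical (timestamp, kind) records, where kind is
    "scheduled" or "manual". Manual runs are skipped on purpose, so
    triggering an action from the Dashboard never moves the schedule.
    """
    scheduled = [ts for ts, kind in runs if kind == "scheduled"]
    if not scheduled:
        raise ValueError("no scheduled run has been recorded yet")
    return max(scheduled) + timedelta(minutes=interval_minutes)
```

With a scheduled run at 15:30, a manual scan at 15:50 and a 60-minute interval, the next scheduled run is still 16:30.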