Git Repository Handling

GatherHub has special support for downloading and archiving Git repositories. This page details how Git repositories are handled, security considerations, and troubleshooting steps.

Repository Detection

The following URL patterns are automatically detected as Git repositories:

Pattern                    Example
GitHub repositories        https://github.com/username/repository
GitLab repositories        https://gitlab.com/username/repository
Bitbucket repositories     https://bitbucket.org/username/repository
Any URL ending with .git   https://example.org/repo.git

Note: Repository URLs on the hosts listed above are detected even when they do not end with .git. For example, https://github.com/username/repository is recognized as a Git repository.
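
The detection rules above can be sketched as a small shell function. This is only an illustration, not GatherHub's actual matcher: the name is_git_repo_url is hypothetical, and the patterns are a simplification that also accepts deeper paths such as an issues page under a repository.

```shell
# Hypothetical helper (not part of GatherHub): succeeds when a URL matches
# one of the patterns from the table above.
is_git_repo_url() {
    case "$1" in
        # Any URL ending with .git
        *.git) return 0 ;;
        # Known hosts in the form https://host/username/repository
        https://github.com/*/*|https://gitlab.com/*/*|https://bitbucket.org/*/*)
            return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_git_repo_url "https://github.com/username/repository" && echo "git"` prints git, while a plain web page URL on another host makes the function return a non-zero status.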

Credential-Free Operations

For security reasons, GatherHub is designed to operate without using Git credentials. This means:

  • Only public repositories can be cloned
  • Private repositories requiring authentication will fail to download
  • SSH keys are not used for Git operations

The following security measures are implemented for Git operations:

Security Measures

  1. Environment Variables
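    • GIT_CONFIG_NOGLOBAL=1: Intended to prevent git from reading the global configuration. Recent Git releases do not appear to honor this variable; GIT_CONFIG_GLOBAL=/dev/null (Git 2.32 and later) is the documented way to get the same effect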
    • GIT_TERMINAL_PROMPT=0: Prevents git from prompting for credentials
    • GIT_ASKPASS=/bin/echo: Replaces the interactive password prompt with a program that returns immediately, so authentication fails instead of waiting for input
  2. Command Arguments
    • -c credential.helper=: Clears any configured credential helpers for this invocation only
    • --depth=1: Creates a shallow clone to minimize data transfer
    • --shallow-submodules: Also creates shallow clones of submodules
    • --single-branch: Clones only the default branch
    • --filter=blob:none: Requests a partial clone that omits file contents (blobs) until they are needed, which reduces the initial transfer
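
Put together, the variables and arguments listed above correspond roughly to the invocation below. This is a sketch rather than GatherHub's actual code: the function name clone_public_repo and the GIT_BIN override are assumptions, the latter added only so the command line can be inspected without touching the network.

```shell
# Hypothetical wrapper (not GatherHub's own code).
# $1 = repository URL, $2 = destination directory.
# GIT_BIN is an assumed override; it defaults to the real git binary.
clone_public_repo() {
    # The three variables apply only to this one git process.
    GIT_CONFIG_NOGLOBAL=1 \
    GIT_TERMINAL_PROMPT=0 \
    GIT_ASKPASS=/bin/echo \
    "${GIT_BIN:-git}" -c credential.helper= clone \
        --depth=1 --shallow-submodules --single-branch --filter=blob:none \
        "$1" "$2"
}
```

Running it as `GIT_BIN=echo clone_public_repo https://github.com/user/project ./downloads/git/user_project` prints the arguments that would be passed to git instead of cloning anything.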

Repository Storage

Git repositories are stored in the directory specified by the storage.by_type.git configuration setting, which defaults to ./downloads/git/.

Directory Structure

When a repository is cloned, it's saved in a directory named according to its source:

[storage.base_path]/[storage.by_type.git]/[username]_[repository]

For example, a GitHub repository at https://github.com/user/project would be saved as:

./downloads/git/user_project/
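
The naming rule can be illustrated with a short shell function. It is a sketch under assumptions: repo_storage_dir is a hypothetical name, GIT_STORAGE_DIR stands in for the storage.by_type.git setting, and only URLs of the form https://host/username/repository are handled.

```shell
# Hypothetical helper (not part of GatherHub): maps a repository URL to the
# directory it would be stored in. GIT_STORAGE_DIR is an assumed stand-in
# for the storage.by_type.git setting.
repo_storage_dir() {
    base=${GIT_STORAGE_DIR:-./downloads/git}
    rel=${1#*://*/}        # drop scheme and host, leaving username/repository
    rel=${rel%/}           # drop a trailing slash, if any
    rel=${rel%.git}        # drop a trailing .git, if any
    owner=${rel%%/*}       # first path segment
    repo=${rel#*/}         # everything after the first segment
    repo=${repo%%/*}       # keep only the second path segment
    printf '%s/%s_%s\n' "${base%/}" "$owner" "$repo"
}
```

With the default setting, `repo_storage_dir https://github.com/user/project` prints ./downloads/git/user_project.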

ZIP Archive Creation

After a Git repository is successfully cloned, GatherHub automatically creates a ZIP archive for easy download through the web interface. This is handled by the gatherhub_zip_git_repos.sh hook script.

The ZIP Process

  1. After a successful Git clone, the post-download hook is triggered
  2. The script creates a ZIP archive of the repository, excluding hidden files and directories such as .git
  3. The ZIP file is stored alongside the repository under the same name plus a .zip extension
  4. The web interface provides a download link to this ZIP file

Important: ZIP archive creation requires that the zip command is installed on your system and that the hook script is properly configured.

Hook Implementation

The ZIP creation is implemented as a post-download hook. You can find this script at:

data/hooks/gatherhub_zip_git_repos.sh

This script is automatically registered to run after Git repository downloads. It will:

  1. Check if the downloaded content is a Git repository (media_type == "git")
  2. Verify the repository directory exists
  3. Create a ZIP archive with the same name as the repository directory
  4. Exclude hidden files and directories such as .git from the archive
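
The four steps can be sketched as follows. This is not the contents of gatherhub_zip_git_repos.sh: how the real hook receives the media type and repository path is not documented here, so the function zip_git_repo and its two positional arguments are assumptions.

```shell
# Hypothetical hook body (the real script may receive its inputs differently).
# $1 = media type, $2 = path to the cloned repository directory.
zip_git_repo() {
    media_type=$1
    repo_dir=${2%/}

    # Step 1: only act on Git downloads.
    [ "$media_type" = "git" ] || return 0
    # Step 2: the repository directory must exist.
    [ -d "$repo_dir" ] || { echo "missing directory: $repo_dir" >&2; return 1; }
    command -v zip >/dev/null 2>&1 || { echo "zip is not installed" >&2; return 1; }

    parent=$(dirname "$repo_dir")
    name=$(basename "$repo_dir")
    # Steps 3 and 4: run from the parent directory so archive paths start with
    # the repository name; -x '*/.*' skips hidden entries such as name/.git
    # and everything below them. Any stale archive is removed first.
    (cd "$parent" && rm -f "$name.zip" && zip -qr "$name.zip" "$name" -x '*/.*')
}
```

Calling `zip_git_repo git ./downloads/git/user_project` would leave user_project.zip next to the repository directory, matching the layout described above.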

Troubleshooting

Stuck Downloads

If Git downloads get stuck in the "downloading" state, the usual cause is that git is waiting for authentication, which the credential-free design never provides.

To recover from stuck downloads:

  1. Use the unstuck jobs tool:

    ./data/tools/unstuck_jobs.sh

    This will reset any stuck jobs to "failed" status so they can be retried.

  2. You can also use the web interface to reset stuck jobs by clicking the "Reset Stuck Jobs" button.

  3. Or use the API endpoint:

    curl -X POST http://localhost:5000/api/reset-stuck

Common Issues

Problem                 Possible Cause                                                 Solution
Authentication failed   Attempted to clone a private repository                        Only use public repositories with GatherHub
Permission denied       Insufficient permissions to write to the downloads directory   Ensure the user running GatherHub has write permissions to the downloads directory
No ZIP file created     Missing zip command or hook script not executed                Verify that zip is installed and check the hooks configuration

Checking Logs

To diagnose git-related issues, check the error log:

grep -i git data/logs/error.log

For authentication issues specifically:

grep -i credential data/logs/error.log

File Verification

Before marking a Git repository download as complete, GatherHub performs these verification steps:

  1. Verifies the repository directory exists
  2. Checks that the .git directory exists within the repository (confirming it's a valid git repo)
  3. For ZIP files, verifies the archive was successfully created
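
The same checks can be reproduced by hand when diagnosing a download. The function below is a sketch, not GatherHub's own verification code: the name verify_git_download is hypothetical, and it assumes the ZIP archive sits next to the repository directory as described under ZIP Archive Creation.

```shell
# Hypothetical check (not GatherHub's own code). $1 = repository directory.
verify_git_download() {
    repo_dir=${1%/}
    # 1. The repository directory exists.
    [ -d "$repo_dir" ]      || { echo "repository directory missing" >&2; return 1; }
    # 2. It contains a .git directory, so it is a Git repository.
    [ -d "$repo_dir/.git" ] || { echo ".git directory missing" >&2; return 1; }
    # 3. A non-empty ZIP archive exists alongside the repository.
    [ -s "$repo_dir.zip" ]  || { echo "ZIP archive missing or empty" >&2; return 1; }
}
```

For example, `verify_git_download ./downloads/git/user_project && echo "ok"` prints ok only when all three checks pass. A stricter variant could also test the archive's integrity with `unzip -t`.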

Manual Testing

To manually test Git repository handling:

  1. Add a public repository URL to the system (via web interface or API)
  2. Monitor the download process (check logs or web interface)
  3. Verify the repository was cloned correctly (check the repository directory)
  4. Verify the ZIP file was created (check for a .zip file alongside the repository)

Warning: For private repositories, the system will properly fail and log an error message rather than hang indefinitely.