5 Commits
v0.2 ... main

Author SHA1 Message Date
ae347d7506 Add repo_owner for where the migrated repos should go to
All checks were successful
Cargo Build & Test / Rust project - latest (1.90) (push) Successful in 2m11s
Generated with Claude 4.5 Sonnet (Gemini 2.5 Pro was stubborn about
wanting to refactor the world even when it was unrelated). Prompt was;
```
Given the attached rust code, and example toml configuration file;
... SNIP ...

Can you add the ability to specify the owner of the migrated repo, such
that if not provided then to use the user who owns the API key, or if
provided then to use said owner? I think gitea refers to that as the
"repo_owner" in its Swagger-based API docs? Note, this would be one
configuration for all the repos in the config toml file, not on a
per-repo basis.
```

which didn't work since it tried to fetch a uid which doesn't exist for
organizations, so let's prompt it to fix that and give it a helping hand;
```
Ah shoot, if I am using a migration to a new organization which the
user who owns the API key has permissions to modify, then I am getting
a 401 return code. Did you assume the target will always be a user
rather than also being an organization? Also, keep in mind, I think
Gitea's migration API wants a user string, rather than a user ID, if
that's the case then I think we can remove the entire
`get_user_id_by_username()` function?
```
2025-10-06 19:17:28 -04:00
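The owner selection this commit describes can be sketched roughly as below. This is a minimal, standard-library-only illustration rather than the code from the diff: `resolve_owner` and `authenticated_login` are hypothetical names, and treating a blank `repo_owner` the same as a missing one is an added assumption.

```rust
/// Picks the owner that migrated repositories are created under.
///
/// `configured` mirrors the optional `repo_owner` config key, and
/// `authenticated_login` is the login of the user who owns the API key.
/// Both parameter names are hypothetical.
fn resolve_owner<'a>(configured: Option<&'a str>, authenticated_login: &'a str) -> &'a str {
    match configured.map(str::trim) {
        // Assumption: a blank value is treated the same as a missing one.
        Some(owner) if !owner.is_empty() => owner,
        _ => authenticated_login,
    }
}

fn main() {
    // An organization name is passed through as a plain string,
    // so no user ID lookup is involved.
    println!("{}", resolve_owner(Some("mirror_org"), "hak8or"));
    println!("{}", resolve_owner(None, "hak8or"));
}
```

Because the result is only ever a name, the same path works whether the target is a user or an organization.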
fdb7cf7a4a Some human touches (make it clearer this is vibe coded, better env vars, etc.)
All checks were successful
Cargo Build & Test / Rust project - latest (1.90) (push) Successful in 1m51s
2025-09-22 20:40:17 -04:00
3497cbaa6e Upload example config toml 2025-09-22 20:33:36 -04:00
129d67bc8b Don't use the same API key for other organizations we are pulling from
```EOF
For the organizations list, I am trying to use my test instance, but getting the following in the logs;
```
2025-09-23T00:12:38.638052Z  INFO gitea_mirror: Fetching repositories from organization: https://gitea.hak8or.com/mirrors
2025-09-23T00:12:38.638081Z  INFO fetch_org_repos{org_url="https://gitea.hak8or.com/mirrors"}: gitea_mirror: Querying API endpoint: https://gitea.hak8or.com/api/v1/users/mirrors/repos
2025-09-23T00:12:38.653694Z ERROR gitea_mirror: Failed to fetch repos from https://gitea.hak8or.com/mirrors: HTTP status client error (401 Unauthorized) for url (https://gitea.hak8or.com/api/v1/users/mirrors/repos?page=1)
2025-09-23T00:12:38.653713Z  INFO gitea_mirror: Gitea mirror process completed.
```

I don't have a user with that key for the instance. Can you add the ability to provide an api key to each organization entry in the toml config? At the same time, is it possible to get a list of all repos from an organization without needing to use an api key? If so, when no api key is provided, can you use that?
```EOF
2025-09-22 20:29:14 -04:00
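As a rough sketch of what this commit asks for, the two helpers below derive the listing endpoint seen in the log from an organization URL and only produce an `Authorization` header when a key is configured. The function names are hypothetical, the sketch covers only Gitea-style URLs (not the separate GitHub API host), and whether a keyless request succeeds depends on the source instance allowing anonymous reads.

```rust
/// Builds the repository-listing endpoint for an organization or user page URL,
/// e.g. "https://gitea.hak8or.com/mirrors" ->
/// "https://gitea.hak8or.com/api/v1/users/mirrors/repos".
/// Hypothetical helper; returns None when the URL has no owner segment.
fn org_repos_endpoint(org_url: &str) -> Option<String> {
    let trimmed = org_url.trim_end_matches('/');
    // Split only at the last '/', so an owner name that also appears in the
    // host name cannot corrupt the base URL.
    let (base, name) = trimmed.rsplit_once('/')?;
    // Reject URLs without a path segment, e.g. "https://gitea.hak8or.com".
    if name.is_empty() || !base.contains("://") {
        return None;
    }
    Some(format!("{base}/api/v1/users/{name}/repos"))
}

/// Returns the header name and value only when an API key is configured,
/// so organizations without a key are queried anonymously.
fn auth_header(api_key: Option<&str>) -> Option<(&'static str, String)> {
    api_key.map(|key| ("Authorization", format!("token {key}")))
}

fn main() {
    println!("{:?}", org_repos_endpoint("https://gitea.hak8or.com/mirrors"));
    println!("{:?}", auth_header(None));
    println!("{:?}", auth_header(Some("example_key")));
}
```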
f732535db2 Re-create this with the canvas option in Gemini 2.5 Pro web chat
```EOF
Create a very minimal and simple tool written in rust which takes in a list of git URLs, and using the gitea api checks if the remote is already mirrored, and if not, then create a repo migration to gitea. I want to basically create a script which can be used to ensure a list of git repos are mirrored to a gitea server.

 The script should take in some command line arguments for;
  - an option to do a dry run, meaning do the check if the repo has to be mirrored, but do not initiate the actual migration
 - path to a TOML configuration file (also can be supplied via an ENV variable)

 The configuration file would have the following information;
   - an API key to be used when talking to the gitea instance we are migrating to
  - the url of the above gitea instance
  - a list of git URLs including an optional rename of the repo name
  - a list of URLs of another git server (gitea, if the API is the same then github, gitlab, etc) that includes the organization name or username. You would clone all repos under that organization/username. For example "https://github.com/hak8or" would be all repos owned by hak8or.

Example toml file;
```
gitea_url = "https://gitmirror.hak8or.com"

api_key = "api_key_goes_here"

repos = [
	{ url = "https://gitea.hak8or.com/hak8or/gitea_mirror.git" },
	{ rename = "cool_rename", url = "https://gitea.hak8or.com/hak8or/gitea_mirror.git" },
	{ rename = "cool_another_rename", url = "https://gitea.hak8or.com/hak8or/gitea_mirror.git" },
	{ rename = "rusty_rust", url = "https://github.com/rust-lang/rust.git" },
]
```

Ensure the script is as minimal as possible, do not use libraries if you can avoid them (except clap for CLI arguments, tracing for logging, actix for async and web interactions, reqwest for actual queries, and serde_json for json, or whatever else is commonly used in rust). I will be invoking this tool with a systemd timer.
```EOF
2025-09-22 20:28:35 -04:00
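The check-then-migrate behaviour requested in this prompt, including the dry-run option, reduces to a small decision table. The sketch below is illustrative only: `Action` and `plan` are hypothetical names, and the existence check itself (an API call in the real tool) is represented by a plain boolean.

```rust
/// What the tool should do for one repository. Hypothetical type.
#[derive(Debug, PartialEq)]
enum Action {
    /// The repository is already mirrored; nothing to do.
    Skip,
    /// A migration is needed, but dry-run mode only reports it.
    ReportOnly,
    /// A migration is needed and should be started.
    Migrate,
}

/// Decides the action from the existence check and the dry-run flag.
fn plan(already_mirrored: bool, dry_run: bool) -> Action {
    match (already_mirrored, dry_run) {
        (true, _) => Action::Skip,
        (false, true) => Action::ReportOnly,
        (false, false) => Action::Migrate,
    }
}

fn main() {
    for (exists, dry_run) in [(true, false), (false, true), (false, false)] {
        println!("exists={exists} dry_run={dry_run} -> {:?}", plan(exists, dry_run));
    }
}
```

Keeping the decision separate from the HTTP calls means a dry run still performs the check and differs only in the final step.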
4 changed files with 304 additions and 189 deletions

20
Cargo.lock generated

@@ -67,12 +67,6 @@ dependencies = [
 "windows-sys 0.60.2",
 ]
-[[package]]
-name = "anyhow"
-version = "1.0.100"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "a23eb6b1614318a8071c9b2521f36b424b2c83db5eb3a0fead4a6c0809af6e61"
 [[package]]
 name = "atomic-waker"
 version = "1.1.2"
@@ -352,7 +346,6 @@ checksum = "07e28edb80900c19c28f1072f2e8aeca7fa06b23cd4169cefe1af5aa3260783f"
 name = "gitea_mirror"
 version = "0.1.0"
 dependencies = [
-"anyhow",
 "clap",
 "reqwest",
 "serde",
@@ -361,7 +354,6 @@ dependencies = [
 "toml",
 "tracing",
 "tracing-subscriber",
-"url",
 ]
 [[package]]
@@ -1091,9 +1083,9 @@ dependencies = [
 [[package]]
 name = "serde"
-version = "1.0.225"
+version = "1.0.226"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "fd6c24dee235d0da097043389623fb913daddf92c76e9f5a1db88607a0bcbd1d"
+checksum = "0dca6411025b24b60bfa7ec1fe1f8e710ac09782dca409ee8237ba74b51295fd"
 dependencies = [
 "serde_core",
 "serde_derive",
@@ -1101,18 +1093,18 @@ dependencies = [
 [[package]]
 name = "serde_core"
-version = "1.0.225"
+version = "1.0.226"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "659356f9a0cb1e529b24c01e43ad2bdf520ec4ceaf83047b83ddcc2251f96383"
+checksum = "ba2ba63999edb9dac981fb34b3e5c0d111a69b0924e253ed29d83f7c99e966a4"
 dependencies = [
 "serde_derive",
 ]
 [[package]]
 name = "serde_derive"
-version = "1.0.225"
+version = "1.0.226"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "0ea936adf78b1f766949a4977b91d2f5595825bd6ec079aa9543ad2685fc4516"
+checksum = "8db53ae22f34573731bafa1db20f04027b2d25e02d8205921b569171699cdb33"
 dependencies = [
 "proc-macro2",
 "quote",


@@ -4,13 +4,11 @@ version = "0.1.0"
 edition = "2024"
 [dependencies]
-anyhow = "1.0.100"
-clap = { version = "4.5", features = ["derive", "env"] }
-reqwest = { version = "0.12.23", features = ["json"] }
+clap = { version = "4.0", features = ["derive", "env"] }
+tokio = { version = "1", features = ["full"] }
+reqwest = { version = "0.12", features = ["json"] }
 serde = { version = "1.0", features = ["derive"] }
 serde_json = "1.0"
-tokio = { version = "1.35", features = ["full"] }
-toml = "0.9.7"
+toml = "0.9"
 tracing = "0.1"
 tracing-subscriber = "0.3"
-url = "2.5.7"

21
example.toml Normal file

@@ -0,0 +1,21 @@
# The base URL of your Gitea instance
gitea_url = "https://gitmirror.hak8or.com"
# Your Gitea API key (generate one from User Settings -> Applications)
api_key = "API_KEY_GOES_HERE"
# Optional: specify the owner username for all migrated repos
# If not specified, uses the user who owns the API key
repo_owner = "mirror_org"
# A list of remote git repositories to mirror.
repos = [
{ url = "https://gitea.hak8or.com/hak8or/gitea_mirror.git" },
{ rename = "cool_rename", url = "https://gitea.hak8or.com/hak8or/gitea_mirror.git" },
{ rename = "cool_another_rename", url = "https://gitea.hak8or.com/hak8or/gitea_mirror.git" },
{ url = "https://github.com/justcallmekoko/ESP32Marauder" }
]
organizations = [
{ url = "https://gitea.hak8or.com/mirrors" },
]


@@ -1,229 +1,333 @@
use anyhow::{Context, Result};
use clap::Parser;
use reqwest::header::{ACCEPT, AUTHORIZATION, CONTENT_TYPE, USER_AGENT};
use serde::{Deserialize, Serialize};
use std::collections::HashSet;
use serde::Deserialize;
use std::fs;
use std::path::PathBuf;
use tracing::{error, info, warn};
use url::Url;
use std::path::{Path, PathBuf};
use tracing::{Level, error, info, instrument, warn};
use tracing_subscriber;
// --- Structs (Unchanged) ---
// Represents the command-line arguments.
#[derive(Parser, Debug)]
#[command(name = "gitea-mirror")]
#[command(
author,
version,
about = "A simple tool to ensure git repositories are mirrored to Gitea."
about = "Ensures Git repositories are mirrored to Gitea, generated with Gemini 2.5 Web Canvas"
)]
struct Cli {
#[arg(short, long, env = "GITEA_MIRROR_CONFIG")]
#[clap(author, version, about, long_about = None)]
struct Args {
/// Path to the TOML configuration file.
#[clap(short, long, value_parser, env = "GITEA_MIRROR_CONFIG_FILEPATH")]
config: PathBuf,
#[arg(long)]
/// Perform a dry run without creating any migrations.
#[clap(short, long, default_value_t = false)]
dry_run: bool,
}
#[derive(Deserialize, Debug)]
struct RepoToMirror {
// Represents a single repository entry in the config file.
#[derive(Deserialize, Debug, Clone)]
struct RepoConfig {
url: String,
rename: Option<String>,
}
// Represents a single organization entry in the config file.
#[derive(Deserialize, Debug, Clone)]
struct OrgConfig {
url: String,
api_key: Option<String>,
}
// Represents the main structure of the TOML configuration file.
#[derive(Deserialize, Debug)]
struct Config {
gitea_url: String,
api_key: String,
repos: Vec<RepoToMirror>,
repos: Option<Vec<RepoConfig>>,
organizations: Option<Vec<OrgConfig>>,
repo_owner: Option<String>, // Optional owner username/org for all migrated repos
}
// --- Gitea API Structs (Corrected) ---
// Represents the payload for creating a migration in Gitea.
#[derive(serde::Serialize, Debug)]
struct MigrateRepoPayload<'a> {
clone_addr: &'a str,
repo_name: &'a str,
repo_owner: &'a str, // Username or organization name
mirror: bool,
private: bool,
description: &'a str,
}
// Represents a user as returned by the Gitea API.
#[derive(Deserialize, Debug)]
struct GiteaUser {
id: i64,
login: String,
}
// **MODIFIED**: This struct now includes `name` and the correct `mirror_url` field.
#[derive(Deserialize, Debug)]
struct GiteaRepo {
name: String,
mirror: bool,
mirror_url: Option<String>, // The original source URL of the mirror
}
#[derive(Serialize, Debug)]
struct MigrationRequest<'a> {
clone_addr: &'a str,
uid: i64,
repo_name: &'a str,
mirror: bool,
private: bool,
description: String,
}
/// Entry point of the application.
#[tokio::main]
async fn main() -> Result<()> {
tracing_subscriber::fmt::init();
let cli = Cli::parse();
async fn main() -> Result<(), Box<dyn std::error::Error>> {
// Initialize the tracing subscriber for logging.
tracing_subscriber::fmt().with_max_level(Level::INFO).init();
let config_content = fs::read_to_string(&cli.config)
.with_context(|| format!("Failed to read config file at {:?}", cli.config))?;
let config: Config =
toml::from_str(&config_content).context("Failed to parse TOML configuration")?;
// Parse command-line arguments or get config path from environment variable.
let args = Args::parse();
if cli.dry_run {
info!("Performing a dry run. No migrations will be created.");
info!("Starting Gitea mirror process. Dry run: {}", args.dry_run);
// Read and parse the configuration file.
let config = load_config(&args.config)?;
let http_client = reqwest::Client::new();
// Determine the owner (either from repo_owner or authenticated user)
let owner_name = if let Some(owner) = &config.repo_owner {
info!("Using specified repo_owner: {}", owner);
owner.clone()
} else {
info!("No repo_owner specified, fetching authenticated user");
get_authenticated_username(&http_client, &config.gitea_url, &config.api_key).await?
};
info!("Using owner '{}' for all migrated repositories", owner_name);
// Process repositories from the static list.
if let Some(repos) = &config.repos {
for repo_config in repos {
process_repo(
&repo_config.url,
repo_config.rename.as_deref(),
&owner_name,
&http_client,
&config,
args.dry_run,
)
.await?;
}
}
let mut headers = reqwest::header::HeaderMap::new();
headers.insert(ACCEPT, "application/json".parse()?);
headers.insert(CONTENT_TYPE, "application/json".parse()?);
headers.insert(USER_AGENT, "gitea-mirror-tool/0.1.0".parse()?);
headers.insert(AUTHORIZATION, format!("token {}", config.api_key).parse()?);
let client = reqwest::Client::builder()
.default_headers(headers)
.build()?;
// Process repositories from the organizations/users list.
if let Some(org_configs) = &config.organizations {
for org_config in org_configs {
info!(
"Fetching repositories from organization: {}",
org_config.url
);
match fetch_org_repos(&http_client, &org_config.url, org_config.api_key.as_deref())
.await
{
Ok(repo_urls) => {
info!(
"Found {} repositories for {}",
repo_urls.len(),
org_config.url
);
for url in repo_urls {
process_repo(
&url,
None, // No rename support for orgs
&owner_name,
&http_client,
&config,
args.dry_run,
)
.await?;
}
}
Err(e) => error!("Failed to fetch repos from {}: {}", org_config.url, e),
}
}
}
info!("🔗 Connecting to Gitea instance at {}", config.gitea_url);
info!("Gitea mirror process completed.");
Ok(())
}
let user_url = format!("{}/api/v1/user", config.gitea_url);
let user = client
.get(&user_url)
/// Loads and parses the TOML configuration file.
#[instrument(skip(path))]
fn load_config(path: &Path) -> Result<Config, Box<dyn std::error::Error>> {
info!("Loading configuration from: {:?}", path);
let content = fs::read_to_string(path)?;
let config: Config = toml::from_str(&content)?;
Ok(config)
}
/// Fetches the authenticated user's login name from Gitea.
#[instrument(skip(http_client, gitea_url, api_key))]
async fn get_authenticated_username(
http_client: &reqwest::Client,
gitea_url: &str,
api_key: &str,
) -> Result<String, reqwest::Error> {
let url = format!("{}/api/v1/user", gitea_url);
let user: GiteaUser = http_client
.get(&url)
.header("Authorization", format!("token {}", api_key))
.send()
.await?
.error_for_status()?
.json::<GiteaUser>()
.await
.context("Failed to get Gitea user info. Check your API key and Gitea URL.")?;
info!("Authenticated as user '{}' (ID: {})", user.login, user.id);
.json()
.await?;
info!("Authenticated as user: {}", user.login);
Ok(user.login)
}
// **MODIFIED**: We now build two sets: one for source URLs and one for existing repo names.
info!("🔍 Fetching all existing repositories to build a local cache...");
let mut existing_mirror_sources: HashSet<String> = HashSet::new();
let mut existing_repo_names: HashSet<String> = HashSet::new();
/// Checks if a repository already exists in Gitea for the user.
#[instrument(skip(http_client, gitea_url, api_key))]
async fn repo_exists(
http_client: &reqwest::Client,
gitea_url: &str,
api_key: &str,
repo_name: &str,
) -> Result<bool, reqwest::Error> {
let url = format!("{}/api/v1/repos/search", gitea_url);
let response: serde_json::Value = http_client
.get(&url)
.query(&[("q", repo_name), ("limit", "1")])
.header("Authorization", format!("token {}", api_key))
.send()
.await?
.error_for_status()?
.json()
.await?;
if let Some(data) = response.get("data").and_then(|d| d.as_array()) {
for repo in data {
if let Some(name) = repo.get("name").and_then(|n| n.as_str()) {
if name.eq_ignore_ascii_case(repo_name) {
return Ok(true);
}
}
}
}
Ok(false)
}
/// Creates a mirror migration in Gitea.
#[instrument(skip(http_client, config, payload))]
async fn create_migration(
http_client: &reqwest::Client,
config: &Config,
payload: &MigrateRepoPayload<'_>,
) -> Result<(), reqwest::Error> {
let url = format!("{}/api/v1/repos/migrate", config.gitea_url);
http_client
.post(&url)
.header("Authorization", format!("token {}", config.api_key))
.json(payload)
.send()
.await?
.error_for_status()?;
Ok(())
}
/// Fetches all repository clone URLs from a given Gitea/GitHub organization/user page.
#[instrument(skip(http_client, api_key))]
async fn fetch_org_repos(
http_client: &reqwest::Client,
org_url: &str,
api_key: Option<&str>,
) -> Result<Vec<String>, Box<dyn std::error::Error>> {
// This is a simplified fetcher. It assumes Gitea API compatibility.
// For GitHub, you might need a different base URL and auth method.
let api_url = if org_url.contains("github.com") {
let parts: Vec<&str> = org_url.trim_end_matches('/').split('/').collect();
let user_or_org = parts.last().ok_or("Invalid GitHub URL")?;
format!("https://api.github.com/users/{}/repos", user_or_org)
} else {
// Assuming Gitea-like URL structure
let parts: Vec<&str> = org_url.trim_end_matches('/').split('/').collect();
let user_or_org = parts.last().ok_or("Invalid Gitea URL")?;
format!(
"{}s/{}/repos",
org_url.replace(user_or_org, &format!("api/v1/user")),
user_or_org
)
};
info!("Querying API endpoint: {}", api_url);
let mut repos: Vec<String> = Vec::new();
let mut page = 1;
loop {
let repos_url = format!("{}/api/v1/user/repos", config.gitea_url);
let repos_on_page = client
.get(&repos_url)
.query(&[("limit", "50"), ("page", &page.to_string())])
let mut request_builder = http_client
.get(&api_url)
.query(&[("page", page.to_string())])
// For GitHub, a User-Agent is required.
.header("User-Agent", "gitea-mirror-rust-client");
if let Some(key) = api_key {
request_builder = request_builder.header("Authorization", format!("token {}", key));
}
let response: Vec<serde_json::Value> = request_builder
.send()
.await?
.error_for_status()?
.json::<Vec<GiteaRepo>>()
.await
.context("Failed to fetch a page of existing repositories.")?;
.json()
.await?;
if repos_on_page.is_empty() {
break;
if response.is_empty() {
break; // No more pages
}
for repo in repos_on_page {
// Add the name of EVERY repo to prevent any name collisions.
existing_repo_names.insert(repo.name);
// If it's a mirror, store its ORIGINAL source URL for an exact match.
if repo.mirror {
if let Some(mirror_url) = repo.mirror_url {
existing_mirror_sources.insert(mirror_url);
}
for repo in response {
if let Some(clone_url) = repo.get("clone_url").and_then(|u| u.as_str()) {
repos.push(clone_url.to_string());
}
}
page += 1;
}
info!(
"Found {} existing repositories and {} configured mirrors.",
existing_repo_names.len(),
existing_mirror_sources.len()
);
Ok(repos)
}
// **MODIFIED**: The main checking logic is now much more robust.
for repo_config in &config.repos {
let url_to_mirror = &repo_config.url;
/// Core logic to process a single repository.
#[instrument(skip(owner_name, http_client, config, dry_run))]
async fn process_repo(
repo_url: &str,
rename: Option<&str>,
owner_name: &str,
http_client: &reqwest::Client,
config: &Config,
dry_run: bool,
) -> Result<(), Box<dyn std::error::Error>> {
let repo_name = match rename {
Some(name) => name,
None => extract_repo_name(repo_url).ok_or("Could not extract repo name from URL")?,
};
// CHECK 1: Has this exact source URL already been mirrored?
if existing_mirror_sources.contains(url_to_mirror) {
info!(
"Mirror for source URL '{}' already exists. Skipping.",
url_to_mirror
);
continue;
}
info!("Processing repo '{}' -> '{}'", repo_url, repo_name);
// Determine the target name for the new repository.
let target_repo_name = match &repo_config.rename {
Some(name) => name.clone(),
None => get_repo_name_from_url(url_to_mirror).with_context(|| {
format!("Could not parse repo name from URL: {}", url_to_mirror)
})?,
};
// CHECK 2: Will creating this mirror cause a name collision?
if existing_repo_names.contains(&target_repo_name) {
warn!(
"Cannot create mirror for '{}'. A repository named '{}' already exists. Skipping.",
url_to_mirror, target_repo_name
);
continue;
}
// If both checks pass, we are clear to create the migration.
info!(
"Mirror for '{}' not found and name '{}' is available. Needs creation.",
url_to_mirror, target_repo_name
);
if cli.dry_run {
warn!(
"--dry-run enabled, skipping migration for '{}'.",
url_to_mirror
);
continue;
}
let migration_payload = MigrationRequest {
clone_addr: url_to_mirror,
uid: user.id,
repo_name: &target_repo_name,
mirror: true,
private: false,
description: format!("Mirror of {}", url_to_mirror),
};
info!(
"🚀 Creating migration for '{}' as new repo '{}'...",
url_to_mirror, target_repo_name
);
let migrate_url = format!("{}/api/v1/repos/migrate", config.gitea_url);
let response = client
.post(&migrate_url)
.json(&migration_payload)
.send()
.await?;
if response.status().is_success() {
info!("Successfully initiated migration for '{}'.", url_to_mirror);
if repo_exists(http_client, &config.gitea_url, &config.api_key, repo_name).await? {
info!("Repo '{}' already exists. Skipping.", repo_name);
} else {
warn!("Repo '{}' does not exist. Migration needed.", repo_name);
if !dry_run {
info!("Initiating migration for '{}'...", repo_name);
let payload = MigrateRepoPayload {
clone_addr: repo_url,
repo_name,
repo_owner: owner_name,
mirror: true,
private: false, // Defaulting to public, change if needed
description: "",
};
if let Err(e) = create_migration(http_client, config, &payload).await {
error!("Failed to create migration for '{}': {}", repo_name, e);
} else {
info!("Successfully started migration for '{}'.", repo_name);
}
} else {
let status = response.status();
let error_body = response
.text()
.await
.unwrap_or_else(|_| "Could not read error body".to_string());
error!(
"Failed to create migration for '{}'. Status: {}. Body: {}",
url_to_mirror, status, error_body
info!(
"Dry run enabled. Skipping actual migration for '{}'.",
repo_name
);
}
}
info!("All tasks completed.");
Ok(())
}
fn get_repo_name_from_url(git_url: &str) -> Option<String> {
Url::parse(git_url)
.ok()
.and_then(|url| url.path_segments()?.last().map(|s| s.to_string()))
.map(|name| name.strip_suffix(".git").unwrap_or(&name).to_string())
/// Extracts a repository name from a git URL (e.g., "https://.../repo.git" -> "repo").
fn extract_repo_name(url: &str) -> Option<&str> {
url.split('/').last().map(|s| s.trim_end_matches(".git"))
}
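The `extract_repo_name` helper above takes the last `/`-separated segment and trims `.git` with `trim_end_matches`, which removes repeated suffixes and yields an empty name when the URL ends in a slash. A slightly stricter variant might look like the sketch below; `repo_name_from_url` is a hypothetical name, and rejecting empty names is an added assumption rather than behaviour taken from the diff.

```rust
/// Extracts a repository name from a git URL,
/// e.g. "https://.../repo.git" -> "repo".
/// Hypothetical alternative to `extract_repo_name`.
fn repo_name_from_url(url: &str) -> Option<&str> {
    // Ignore trailing slashes so ".../repo.git/" still resolves to "repo".
    let trimmed = url.trim_end_matches('/');
    let last = trimmed.rsplit('/').next()?;
    // Remove at most one ".git" suffix.
    let name = last.strip_suffix(".git").unwrap_or(last);
    // Assumption: an empty name is treated as a failure, not a valid repo name.
    if name.is_empty() { None } else { Some(name) }
}

fn main() {
    let urls = [
        "https://gitea.hak8or.com/hak8or/gitea_mirror.git",
        "https://github.com/justcallmekoko/ESP32Marauder",
        "https://github.com/rust-lang/rust.git/",
    ];
    for url in urls {
        println!("{url} -> {:?}", repo_name_from_url(url));
    }
}
```

This variant does not validate that the input is a well-formed URL; it only makes the name extraction less surprising for the inputs shown in the example configuration.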