Summary

Official Hugging Face Hub Python library documentation for uploading files to repositories. Covers upload_file(), upload_folder(), upload_large_folder(), CLI (hf upload), non-blocking uploads (run_as_future), scheduled uploads (CommitScheduler), and low-level create_commit() operations.

Key Points

  • upload_file(): upload a single file; specify path_or_fileobj, path_in_repo, repo_id, repo_type (model/dataset/space)
  • upload_folder(): upload entire folder; respects .gitignore; supports allow_patterns, ignore_patterns, delete_patterns; creates a single commit
  • upload_large_folder(): resumable, multi-threaded, resilient upload for large datasets; splits into many small tasks with local caching; multiple commits
  • CLI: hf upload [repo_id] [local_path] [path_in_repo]; hf upload-large-folder for large datasets
  • Non-blocking: run_as_future=True returns a Future object; background uploads are queued and run in order
  • CommitScheduler: push data to Hub periodically from local folder; designed for append-only streaming data (training logs, user feedback); uses scheduler.lock for thread-safety
  • create_commit(): low-level API; supports CommitOperationAdd, CommitOperationDelete, CommitOperationCopy
  • hf_xet: new chunk-based deduplication storage (enabled by default in huggingface_hub ≥ 0.32.0); faster uploads; set HF_XET_HIGH_PERFORMANCE=1 for maximum throughput
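
The main calls listed above can be sketched together as follows. The repo id `username/my-model`, the local paths, and the pattern values are placeholders. The helper `preview_selection` is hypothetical and not part of huggingface_hub: it only approximates locally, assuming fnmatch-style matching, which files allow_patterns and ignore_patterns would keep. Nothing is uploaded unless `push_examples` is called with huggingface_hub installed and a valid token.

```python
from fnmatch import fnmatch


def preview_selection(paths, allow_patterns=None, ignore_patterns=None):
    """Hypothetical local preview of allow/ignore filtering (fnmatch-style assumption)."""
    kept = []
    for path in paths:
        # Skip files that match none of the allow patterns.
        if allow_patterns is not None and not any(fnmatch(path, p) for p in allow_patterns):
            continue
        # Skip files that match any of the ignore patterns.
        if ignore_patterns is not None and any(fnmatch(path, p) for p in ignore_patterns):
            continue
        kept.append(path)
    return kept


def push_examples(repo_id="username/my-model", folder="./outputs"):
    """Requires huggingface_hub, network access, and a logged-in token."""
    from huggingface_hub import HfApi  # imported lazily so the preview works offline

    api = HfApi()

    # Upload a single file.
    api.upload_file(
        path_or_fileobj=f"{folder}/README.md",
        path_in_repo="README.md",
        repo_id=repo_id,
        repo_type="model",
    )

    # Upload a whole folder as one commit, filtered by patterns.
    api.upload_folder(
        folder_path=folder,
        repo_id=repo_id,
        repo_type="model",
        allow_patterns=["*.safetensors", "*.json"],
        ignore_patterns=["*.tmp"],
    )

    # Non-blocking variant: the call returns a Future immediately.
    future = api.upload_folder(folder_path=folder, repo_id=repo_id, run_as_future=True)
    return future.result()  # blocks until the background upload finishes
```

Running `preview_selection` on a file listing before calling `upload_folder` is a cheap way to sanity-check the patterns, though the library's own filtering remains the source of truth.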

Insights

CommitScheduler is the pattern to know for ML training pipelines: instead of polluting the git history with thousands of small commits, schedule periodic batch commits from a local folder. The preupload_lfs_files() + create_commit() pattern matters for large sharded uploads: pre-uploading each shard to S3 as it is generated, then making one final commit, avoids OOM issues from holding all shards in memory at once. The hf_xet storage system (Rust-based, chunk-level deduplication) is a significant improvement for large model repositories where many versions share most of their weights.
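
Both patterns above can be sketched as below. This is a minimal illustration under stated assumptions: `iter_shards` is a hypothetical generator standing in for real shard production, the file name `metrics.jsonl` and the 10-minute interval are placeholders, and the two upload functions need huggingface_hub, network access, and a token, so they are defined here but not called.

```python
import json
from pathlib import Path


def iter_shards(num_shards, shard_bytes=1024):
    """Hypothetical shard generator: yields (name, content) one shard at a time."""
    for i in range(num_shards):
        yield f"shard_{i}_of_{num_shards}.bin", bytes([i % 256]) * shard_bytes


def upload_shards(repo_id, shards):
    """Pre-upload each shard as it is produced, then record all of them in one commit."""
    from huggingface_hub import CommitOperationAdd, create_commit, preupload_lfs_files

    operations = []
    for name, content in shards:
        addition = CommitOperationAdd(path_in_repo=name, path_or_fileobj=content)
        preupload_lfs_files(repo_id, additions=[addition])  # shard content is sent here
        operations.append(addition)
    # A single commit references every pre-uploaded shard.
    return create_commit(repo_id, operations=operations, commit_message="Add all shards")


def log_records(repo_id, folder, records, every_minutes=10):
    """Append records to a local JSON Lines file that CommitScheduler pushes periodically."""
    from huggingface_hub import CommitScheduler

    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    scheduler = CommitScheduler(
        repo_id=repo_id,
        repo_type="dataset",
        folder_path=folder,
        every=every_minutes,
    )
    log_file = folder / "metrics.jsonl"
    for record in records:
        # Hold the scheduler's lock so a background push never sees a half-written line.
        with scheduler.lock:
            with log_file.open("a") as f:
                f.write(json.dumps(record) + "\n")
    return scheduler
```

Because `upload_shards` consumes a generator, only one shard's content needs to be produced at a time, which is the point of pre-uploading before the final commit.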

Connections

Raw Excerpt

Sharing your files and work is an important aspect of the Hub. The huggingface_hub offers several options for uploading your files to the Hub. You can use these functions independently or integrate them into your library.