# Foot Calib Pos Image Processor
This project uses the TvCalib library to calculate the homography matrix for a football pitch image. This matrix allows mapping image points onto a standard 2D representation of the pitch (minimap).
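As a minimal sketch of what "mapping image points onto the minimap" means, the snippet below applies a 3x3 homography to pixel coordinates with plain NumPy. The function name `project_points` and the matrix `H_example` are illustrative assumptions, not part of this project's code, and the matrix is a made-up value rather than a real calibration result.

```python
import numpy as np

def project_points(H: np.ndarray, points_xy: np.ndarray) -> np.ndarray:
    """Map (N, 2) image points to minimap coordinates with a 3x3 homography.

    Assumes H maps image pixels to minimap pixels.
    """
    pts = np.asarray(points_xy, dtype=np.float64).reshape(-1, 2)
    # Lift to homogeneous coordinates: (x, y) -> (x, y, 1).
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])
    mapped = homog @ H.T
    # Divide by the third coordinate to return to 2D.
    return mapped[:, :2] / mapped[:, 2:3]

# Illustrative matrix only: scale by 0.5, then shift by (10, 20).
H_example = np.array([[0.5, 0.0, 10.0],
                      [0.0, 0.5, 20.0],
                      [0.0, 0.0, 1.0]])
feet = np.array([[100.0, 200.0]])
print(project_points(H_example, feet))  # the point lands at (60, 120)
```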
The project also includes a pose estimation step (ViTPose) to detect players and calculate the average color of their torso.
**The main result is a minimap where each player is represented by their colored skeleton, drawn at a *dynamically* reduced scale around their projected position on the pitch.**
The position is determined by projecting a reference point (feet/bottom of bbox) using the homography. The skeleton is then drawn using its relative coordinates (original image), scaled, and translated. The scale used depends on the **player's vertical position (Y) on the minimap** (higher on the minimap = smaller) and a base factor adjustable via the `--target_avg_scale` option.
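The scale-and-translate step can be sketched as follows. This is an illustration under stated assumptions, not the project's actual implementation: the linear modulation in `dynamic_scale`, its `min_factor`/`max_factor` parameters, and the helper `place_skeleton` are all hypothetical. The only behavior taken from the description above is that players higher on the minimap (smaller Y) are drawn smaller.

```python
import numpy as np

def dynamic_scale(y_minimap, minimap_height, base_scale,
                  min_factor=0.5, max_factor=1.5):
    """Hypothetical linear modulation: smaller near the top, larger near the bottom."""
    t = float(np.clip(y_minimap / minimap_height, 0.0, 1.0))
    return base_scale * (min_factor + (max_factor - min_factor) * t)

def place_skeleton(keypoints_xy, ref_point_img, ref_point_minimap, scale):
    """Scale keypoints relative to the reference point, then move them to the minimap."""
    kp = np.asarray(keypoints_xy, dtype=np.float64)
    # Coordinates relative to the reference point (feet / bottom of bbox).
    rel = kp - np.asarray(ref_point_img, dtype=np.float64)
    return np.asarray(ref_point_minimap, dtype=np.float64) + scale * rel

# Two keypoints (head, feet) in image pixels; the feet are the reference point.
keypoints = [[100, 100], [100, 200]]
scale = dynamic_scale(y_minimap=120.0, minimap_height=400.0, base_scale=0.35)
print(place_skeleton(keypoints, (100, 200), (60, 120), scale))
```

With this choice, the reference point always lands exactly on its projected position, and only the rest of the skeleton shrinks or grows around it.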
It is also possible to visualize a minimap with the projection of the original image for comparison.
## Features
* Homography calculation from a single image via TvCalib.
* Person detection (RT-DETR) and pose estimation (ViTPose).
* Calculation of the filtered average torso color for each player.
* Projection of each player's reference point (feet/bbox) onto the minimap.
* Generation of a minimap with the **original skeletons (colored, dynamically scaled based on projected Y position, and offset) drawn around the projected point**.
* (Optional) Generation of a minimap with the projected original image.
* Possibility to save the calculated homography matrix.
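The exact filter used for the torso color is not described here. As one possible interpretation, the sketch below averages only the pixels closest to the median color of the torso region, which reduces the influence of grass or other players. The function `torso_color`, its `keep_fraction` parameter, and the bounding-box torso region are assumptions made for illustration.

```python
import numpy as np

def torso_color(image_bgr, shoulders_hips_xy, keep_fraction=0.5):
    """Average color of the torso box, ignoring pixels far from the median color."""
    pts = np.asarray(shoulders_hips_xy, dtype=np.float64)
    h, w = image_bgr.shape[:2]
    # Axis-aligned box around the shoulder and hip keypoints, clipped to the image.
    x0, y0 = np.floor(pts.min(axis=0)).astype(int)
    x1, y1 = np.ceil(pts.max(axis=0)).astype(int)
    x0, x1 = np.clip([x0, x1], 0, w)
    y0, y1 = np.clip([y0, y1], 0, h)
    patch = image_bgr[y0:y1, x0:x1].reshape(-1, 3).astype(np.float64)
    if patch.size == 0:
        return None  # keypoints fell outside the image
    # Keep the fraction of pixels nearest to the median color.
    dist = np.linalg.norm(patch - np.median(patch, axis=0), axis=1)
    n_keep = max(1, int(len(patch) * keep_fraction))
    return patch[np.argsort(dist)[:n_keep]].mean(axis=0)

# Synthetic test image: a uniform jersey color with a stripe of green inside the torso box.
img = np.full((100, 100, 3), (10, 20, 200), dtype=np.uint8)
img[20:25, 20:40] = (0, 255, 0)
color = torso_color(img, [[20, 20], [40, 20], [20, 60], [40, 60]])
print(color)
```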
## Project Structure
```
.
├── .git/ # Git metadata
├── .venv/ # Python virtual environment (recommended)
├── common/ # Common Python modules (potentially)
├── data/ # Data (input images, etc.)
├── models/
│   └── segmentation/
│       └── train_59.pt # Pre-trained segmentation model (TO DOWNLOAD)
├── tvcalib/ # Source code of the TvCalib library (or a fork/adaptation)
│   └── infer/
│       └── module.py # Main module for TvCalib inference
├── .gitignore # Files ignored by Git
├── main.py # Main script entry point
├── requirements.txt # Python dependencies file
├── visualizer.py # Module for generating visualization minimaps
├── pose_estimator.py # Module for pose estimation and player data extraction
└── README.md # This file
```
## Installation
1. **Clone the repository:**
```powershell
git clone <repository-url>
cd Foot_calib_pos_image_processor
```
2. **Create a virtual environment (recommended):**
```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
```
3. **Install dependencies:**
```powershell
pip install -r requirements.txt
```
*(Make sure to install PyTorch with appropriate CUDA support if needed.)*
4. **Download the segmentation model:**
Place `train_59.pt` in `models/segmentation/`.
5. **(Automatic) Download detection/pose models:**
The RT-DETR and ViTPose models will be downloaded automatically.
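Since the segmentation model must be placed manually, a small pre-flight check can catch a missing file before running the script. The helper `missing_model_files` below is a hypothetical convenience, not something shipped with this project; only the path `models/segmentation/train_59.pt` comes from the steps above.

```python
from pathlib import Path

def missing_model_files(project_root=".",
                        required=("models/segmentation/train_59.pt",)):
    """Return the required model paths that are not present under project_root."""
    root = Path(project_root)
    return [rel for rel in required if not (root / rel).is_file()]

missing = missing_model_files()
if missing:
    print("Missing model files:", ", ".join(missing))
else:
    print("All required model files are in place.")
```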
## Usage
Run the `main.py` script providing the path to the image:
```powershell
python main.py path/to/your/image.jpg [OPTIONS]
```
**Options:**
* `image_path`: Path to the input image (required).
* `--output_homography PATH.npy`: Saves the calculated homography matrix.
* `--optim_steps NUMBER`: Number of optimization steps for calibration (default: 500; the original README example used 1000).
* `--target_avg_scale FLOAT`: **Target average** scale factor for drawing skeletons (default: 0.35). The script attempts to adjust the internal base scale so that the resulting average scale (after inverse dynamic modulation) is close to this value.
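One way to picture the `--target_avg_scale` adjustment: if each player's final scale is the base scale multiplied by a position-dependent factor, the base scale that hits the target average is the target divided by the mean factor. The sketch below assumes this multiplicative model; the function `estimate_base_scale` and the factor values are hypothetical, and the script's actual procedure may differ.

```python
import numpy as np

def estimate_base_scale(modulation_factors, target_avg_scale=0.35):
    """Base scale such that mean(base * factors) equals the target average."""
    factors = np.asarray(modulation_factors, dtype=np.float64)
    if factors.size == 0:
        # No players detected: fall back to the target itself.
        return target_avg_scale
    return target_avg_scale / factors.mean()

# Hypothetical per-player factors (players lower on the minimap get larger factors).
factors = np.array([0.6, 0.8, 1.0, 1.4])
base = estimate_base_scale(factors, target_avg_scale=0.35)
final_avg = float((base * factors).mean())
print(f"base scale: {base:.4f}, final average scale: {final_avg:.4f}")
```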
**Example:**
```powershell
# Simple usage (target average size 0.35)
python main.py data/img3.png
# Aim for larger skeletons on average (target 0.5)
python main.py data/img2.png --target_avg_scale 0.5
```
The script will display:
* Time taken and homography matrix.
* Estimated internal base scale.
* Requested TARGET average scale.
* ACTUALLY applied FINAL average scale.
* Window: **Minimap with Original Projection**.
* Window: **Minimap with Offset Skeletons** (dynamically scaled inversely, targeting the average scale).
* Press any key to close.
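A matrix saved with `--output_homography` can be reloaded later with NumPy. The sketch below assumes the file is a plain 3x3 array written in the standard `.npy` format, which the option's `PATH.npy` form suggests but which is not confirmed here. It uses an identity matrix and a temporary file as stand-ins.

```python
import tempfile
from pathlib import Path

import numpy as np

H = np.eye(3)  # placeholder for a matrix produced by the script

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "homography.npy"  # hypothetical file name
    np.save(path, H)
    H_loaded = np.load(path)

# Sanity check before reusing the matrix elsewhere.
print(H_loaded.shape, np.array_equal(H, H_loaded))
```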
## Key Dependencies
* PyTorch, OpenCV, NumPy, PyTorch Lightning
* SoccerNet, Kornia, Hugging Face Transformers, Pillow