# Foot Calib Pos Image Processor

This project uses the TvCalib library to calculate the homography matrix for a football pitch image. This matrix allows mapping image points onto a standard 2D representation of the pitch (minimap).
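
As a minimal sketch of what this mapping involves, the snippet below applies a 3x3 homography to pixel coordinates with plain NumPy. It assumes the matrix maps image pixels to minimap coordinates, as described above; `project_points` and `H_example` are hypothetical names chosen for illustration and are not part of TvCalib or this project's code.

```python
import numpy as np

def project_points(H: np.ndarray, points_xy: np.ndarray) -> np.ndarray:
    """Apply a 3x3 homography to an (N, 2) array of pixel coordinates."""
    pts = np.asarray(points_xy, dtype=np.float64)
    ones = np.ones((pts.shape[0], 1))
    homogeneous = np.hstack([pts, ones])      # (N, 3) homogeneous coordinates
    mapped = homogeneous @ H.T                # (N, 3) after the projective transform
    return mapped[:, :2] / mapped[:, 2:3]     # divide by w to return to 2D

# Hypothetical matrix for illustration only: scale by 0.5, then shift by (10, 20).
H_example = np.array([[0.5, 0.0, 10.0],
                      [0.0, 0.5, 20.0],
                      [0.0, 0.0, 1.0]])
feet = np.array([[100.0, 200.0], [640.0, 360.0]])
minimap_xy = project_points(H_example, feet)
print(minimap_xy)
```

A real calibration matrix has a non-trivial last row, which is why the division by the third coordinate matters.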

The project also includes a pose estimation step (ViTPose) to detect players and calculate the average color of their torso.

**The main result is a minimap where each player is represented by their colored skeleton, drawn at a *dynamically* reduced scale around their projected position on the pitch.**
Each player's position is obtained by projecting a reference point (the feet, or the bottom of the bounding box) through the homography. The skeleton is then drawn from its relative coordinates in the original image, scaled, and translated to the projected position. The scale depends on the **player's vertical position (Y) on the minimap** (higher on the minimap = smaller) and on a base factor adjustable via the `--target_avg_scale` option.
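
The placement described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the linear ramp in `dynamic_scale`, the `min_factor` parameter, and both function names are hypothetical, and the real script may modulate the scale differently.

```python
import numpy as np

def dynamic_scale(y_minimap, minimap_height, base_scale, min_factor=0.5):
    """Assumed linear ramp: min_factor * base_scale at the top (y = 0),
    base_scale at the bottom (y = minimap_height)."""
    t = np.clip(y_minimap / float(minimap_height), 0.0, 1.0)
    return base_scale * (min_factor + (1.0 - min_factor) * t)

def place_skeleton(keypoints_xy, ref_point_xy, projected_xy, scale):
    """Express keypoints relative to the reference point, scale them,
    then translate them to the projected minimap position."""
    rel = np.asarray(keypoints_xy, dtype=float) - np.asarray(ref_point_xy, dtype=float)
    return rel * scale + np.asarray(projected_xy, dtype=float)

# Two keypoints (head, feet) of a player whose feet project to (400, 300)
# on a minimap that is 600 pixels high.
keypoints = np.array([[100.0, 50.0], [100.0, 150.0]])
scale = dynamic_scale(y_minimap=300.0, minimap_height=600, base_scale=0.4)
drawn = place_skeleton(keypoints, ref_point_xy=(100.0, 150.0),
                       projected_xy=(400.0, 300.0), scale=scale)
print(scale, drawn)
```

Because the reference point maps onto the projected position, the feet stay anchored on the pitch while the rest of the skeleton shrinks or grows around them.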

It is also possible to visualize a minimap with the projection of the original image for comparison.

## Features

*   Homography calculation from a single image via TvCalib.
*   Person detection (RT-DETR) and pose estimation (ViTPose).
*   Calculation of the filtered average torso color for each player.
*   Projection of each player's reference point (feet/bbox) onto the minimap.
*   Generation of a minimap with the **original skeletons (colored, dynamically scaled based on projected Y position, and offset) drawn around the projected point**.
*   (Optional) Generation of a minimap with the projected original image.
*   Optional saving of the calculated homography matrix (`--output_homography`).
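
To illustrate the torso color step, here is a hedged sketch using NumPy only. It assumes COCO-ordered keypoints (indices 5, 6, 11, 12 for shoulders and hips), and it assumes that "filtered" means discarding the darkest and brightest pixels before averaging. The README does not specify the actual filter, so `torso_mean_color` and its `low`/`high` percentiles are hypothetical.

```python
import numpy as np

TORSO_IDS = (5, 6, 11, 12)  # COCO order: left/right shoulder, left/right hip

def torso_mean_color(image, keypoints_xy, low=10, high=90):
    """Average the color inside the torso bounding box, ignoring pixels whose
    brightness lies outside the [low, high] percentile range (assumed filter)."""
    pts = np.asarray(keypoints_xy, dtype=float)[list(TORSO_IDS)]
    h, w = image.shape[:2]
    x0, y0 = np.floor(pts.min(axis=0)).astype(int)
    x1, y1 = np.ceil(pts.max(axis=0)).astype(int)
    x0, x1 = np.clip([x0, x1], 0, w)
    y0, y1 = np.clip([y0, y1], 0, h)
    patch = image[y0:y1, x0:x1].reshape(-1, 3).astype(float)
    if patch.size == 0:
        return None  # torso box is empty or outside the image
    brightness = patch.mean(axis=1)
    lo, hi = np.percentile(brightness, [low, high])
    kept = patch[(brightness >= lo) & (brightness <= hi)]
    return tuple(int(round(c)) for c in kept.mean(axis=0))

# Synthetic example: a uniformly colored image and a 17-keypoint skeleton.
image = np.zeros((100, 100, 3), dtype=np.uint8)
image[:] = (10, 20, 200)
kps = np.zeros((17, 2))
kps[5], kps[6] = (30, 20), (60, 20)     # shoulders
kps[11], kps[12] = (30, 70), (60, 70)   # hips
color = torso_mean_color(image, kps)
print(color)
```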

## Project Structure

```
.
├── .git/               # Git metadata
├── .venv/              # Python virtual environment (recommended)
├── common/             # Common Python modules (potentially)
├── data/               # Data (input images, etc.)
├── models/
│   └── segmentation/
│       └── train_59.pt # Pre-trained segmentation model (TO DOWNLOAD)
├── tvcalib/            # Source code of the TvCalib library (or a fork/adaptation)
│   └── infer/
│       └── module.py   # Main module for TvCalib inference
├── .gitignore          # Files ignored by Git
├── main.py             # Main script entry point
├── requirements.txt    # Python dependencies file
├── visualizer.py       # Module for generating visualization minimaps
├── pose_estimator.py   # Module for pose estimation and player data extraction
└── README.md           # This file
```

## Installation

1.  **Clone the repository:**
    ```powershell
    git clone <repository-url>
    cd Foot_calib_pos_image_processor
    ```

2.  **Create a virtual environment (recommended):**
    ```powershell
    python -m venv .venv
    .\.venv\Scripts\Activate.ps1
    ```

3.  **Install dependencies:**
    ```powershell
    pip install -r requirements.txt
    ```
    *(Make sure to install PyTorch with appropriate CUDA support if needed.)*

4.  **Download the segmentation model:**
    Obtain the pre-trained `train_59.pt` file and place it in `models/segmentation/`.

5.  **(Automatic) Download detection/pose models:**
    The RT-DETR and ViTPose models will be downloaded automatically.

## Usage

Run `main.py` with the path to the input image:

```powershell
python main.py path/to/your/image.jpg [OPTIONS]
```

**Options:**

*   `image_path`: Path to the input image (required).
*   `--output_homography PATH.npy`: Saves the calculated homography matrix.
*   `--optim_steps NUMBER`: Number of optimization steps for calibration (default: 500; the original README example used 1000).
*   `--target_avg_scale FLOAT`: **Target average** scale factor for drawing skeletons (default: 0.35). The script adjusts its internal base scale so that the average scale actually applied, after the inverse dynamic modulation based on projected Y position, is close to this value.
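
One way to picture the `--target_avg_scale` adjustment: if each player's final scale is the base scale multiplied by a position-dependent factor, the base scale that hits a target average follows from a single division. The sketch below assumes exactly that multiplicative model. `estimate_base_scale` and the example factors are hypothetical, and the script itself may estimate the base scale differently.

```python
import numpy as np

def estimate_base_scale(modulation_factors, target_avg_scale=0.35):
    """Pick the base scale so that mean(base * factor_i) equals the target.
    Assumes final scale = base scale * per-player factor."""
    factors = np.asarray(modulation_factors, dtype=float)
    if factors.size == 0:
        return target_avg_scale  # no players detected: fall back to the target
    return target_avg_scale / factors.mean()

# Hypothetical per-player factors (smaller for players higher on the minimap).
factors = np.array([0.5, 0.75, 1.0])
base = estimate_base_scale(factors, target_avg_scale=0.35)
final_scales = base * factors
print(base, final_scales, final_scales.mean())
```

Under this model the requested target and the final applied average coincide; any gap between the two values in the script's output would come from details not captured here, such as clamping.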

**Examples:**

```powershell
# Simple usage (target average size 0.35)
python main.py data/img3.png

# Aim for larger skeletons on average (target 0.5)
python main.py data/img2.png --target_avg_scale 0.5
```
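
A matrix saved with `--output_homography` can presumably be reloaded for later use. The sketch below assumes the file is a standard NumPy `.npy` array holding a 3x3 matrix, which the `.npy` extension suggests but the README does not confirm. `load_homography` is a hypothetical helper.

```python
import numpy as np

def load_homography(path):
    """Load a homography saved as a .npy file and check its shape."""
    H = np.load(path)
    if H.shape != (3, 3):
        raise ValueError(f"expected a 3x3 matrix, got shape {H.shape}")
    return H.astype(np.float64)

def project_point(H, x, y):
    """Map one image point through H and return minimap (x, y)."""
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w
```

For example, after running `python main.py data/img3.png --output_homography H.npy`, calling `project_point(load_homography("H.npy"), x, y)` would map an image pixel to the minimap, assuming the saved matrix has the image-to-minimap direction described earlier.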

The script prints to the console:
*   Time taken and the homography matrix.
*   Estimated internal base scale.
*   Requested target average scale.
*   Final average scale actually applied.

It then opens two windows:
*   **Minimap with Original Projection**.
*   **Minimap with Offset Skeletons** (dynamically scaled inversely, targeting the average scale).

Press any key to close the windows.

## Key Dependencies

*   PyTorch, OpenCV, NumPy, PyTorch Lightning
*   SoccerNet, Kornia, Hugging Face Transformers, Pillow