Vittorio Pippi committed · Commit 9b27178 · 1 Parent(s): e99f49b

Include the YAML metadata

Files changed (1): README.md (+50, -38)

README.md CHANGED

---
language:
- "en"
tags:
- vae
- convolutional
- diffusers
- generative
license: "mit"
datasets:
- font-square
metrics:
- MAE
- KL
- CER
library_name: diffusers
---

# Emuru Convolutional VAE

## Model Description

This repository hosts the **Emuru Convolutional VAE**, described in our paper. The model features a convolutional encoder and decoder, each with four layers. The output channels for these layers are 32, 64, 128, and 256, respectively. The encoder downsamples an input RGB image \( I \in \mathbb{R}^{3 \times W \times H} \) to a latent representation with a single channel and spatial dimensions \( h \times w \) (where \( h = H/8 \) and \( w = W/8 \)). This design compresses the style information in the image, allowing a lightweight Transformer Decoder to efficiently process the latent features.
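
As a quick check of the compression factor described above, the minimal sketch below (the 1 × 3 × 64 × 256 dummy input is an arbitrary choice, not from the paper) encodes a random tensor and prints the latent shape. With a single latent channel and 8× spatial downsampling, a 64 × 256 input should yield a 1 × 1 × 8 × 32 latent.

```python
import torch
from diffusers import AutoencoderKL

model = AutoencoderKL.from_pretrained("vpippi/emuru_vae")

# Dummy RGB batch; height and width are assumed to be multiples of 8.
dummy = torch.randn(1, 3, 64, 256)

with torch.no_grad():
    latents = model.encode(dummy).latent_dist.sample()

# Expected for the configuration described above: torch.Size([1, 1, 8, 32])
print(latents.shape)
```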

### Training Details

- **Writer Identification:** A ResNet with 6 blocks, trained until achieving 60% accuracy on a synthetic dataset.
- **Handwritten Text Recognition (HTR):** A Transformer Encoder-Decoder trained until reaching a Character Error Rate (CER) of 0.25 on the synthetic dataset (see the short CER sketch below).
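
For reference, the Character Error Rate above is the character-level Levenshtein (edit) distance between the predicted and reference transcriptions, divided by the reference length. A minimal sketch of the metric (illustrative only, not the authors' evaluation code):

```python
def character_error_rate(prediction: str, reference: str) -> float:
    """Levenshtein distance between the strings, divided by the reference length."""
    m, n = len(prediction), len(reference)
    # dp[j] holds the edit distance between prediction[:i] and reference[:j].
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = min(
                dp[j] + 1,      # deletion
                dp[j - 1] + 1,  # insertion
                prev + (prediction[i - 1] != reference[j - 1]),  # substitution
            )
            prev, dp[j] = dp[j], cur
    return dp[n] / max(n, 1)

print(character_error_rate("hand writing", "handwriting"))  # 0.0909...
```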

## Usage

You can load the pre-trained Emuru VAE using Diffusers’ `AutoencoderKL` interface with a single line of code:

```python
from diffusers import AutoencoderKL
model = AutoencoderKL.from_pretrained("vpippi/emuru_vae")
```
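
The snippet above loads the model on CPU in full precision. If you want GPU inference (an optional step, not shown in the original example), the usual PyTorch/Diffusers pattern applies; remember to move input tensors to the same device before calling `encode` or `decode`:

```python
import torch
from diffusers import AutoencoderKL

model = AutoencoderKL.from_pretrained("vpippi/emuru_vae")
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()  # Inference only, so switch to eval mode.
```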

Below is an example code snippet that demonstrates how to load an image directly from a URL, process it, encode it into the latent space, decode it back to image space, and save the reconstructed image.

### Code Example

```python
from diffusers import AutoencoderKL
import torch
from torchvision.transforms.functional import to_tensor, to_pil_image
from PIL import Image
import requests
from io import BytesIO

# Load the pre-trained Emuru VAE from Hugging Face Hub.
model = AutoencoderKL.from_pretrained("vpippi/emuru_vae")

# Function to load and preprocess an RGB image from a URL:
# Fetches the image via requests, converts it to RGB, and transforms it to a tensor normalized to [0, 1].
def preprocess_image_from_url(url):
    response = requests.get(url)
    image = Image.open(BytesIO(response.content)).convert("RGB")
    image_tensor = to_tensor(image).unsqueeze(0)  # Add batch dimension.
    return image_tensor

# Function to postprocess a tensor back to a PIL image for visualization:
# Clamps the tensor to [0, 1] and converts it to a PIL image.
def postprocess_tensor(tensor):
    tensor = torch.clamp(tensor, 0, 1).squeeze(0)  # Remove batch dimension.
    return to_pil_image(tensor)

# Example URL of the image.
image_url = "https://aimagelab.ing.unimore.it/imagelab/uploadedImages/000883.jpg"
input_image = preprocess_image_from_url(image_url)

# Encode the image to the latent space.
# The encode() method returns an object with a 'latent_dist' attribute.
with torch.no_grad():
    latents = model.encode(input_image).latent_dist.sample()

# Decode the latents back to image space.
with torch.no_grad():
    reconstructed = model.decode(latents).sample

# Convert the reconstructed tensor back to a PIL image.
reconstructed_image = postprocess_tensor(reconstructed)

# Save the reconstructed image to disk.
reconstructed_image.save("reconstructed_image.png")
```
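
The metadata above lists MAE among the metrics. Continuing from the variables defined in the example (and assuming the input height and width are multiples of 8, so the reconstruction has the same shape as the input), a rough reconstruction-quality number can be computed directly:

```python
# Mean absolute error between the input and its reconstruction, both in [0, 1].
with torch.no_grad():
    mae = (reconstructed.clamp(0, 1) - input_image).abs().mean().item()
print(f"Reconstruction MAE: {mae:.4f}")
```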

## Additional Information

If you'd like to test with images hosted directly on the Hugging Face Hub, consider:
- **Including sample images in your repository:** Place them in a folder (e.g., `samples/`) and reference them directly.
- **Using the `huggingface_hub` API:** For example:

```python
from huggingface_hub import hf_hub_download
from PIL import Image

# Replace 'vpippi/emuru_vae' and 'samples/example_image.jpg' with your details.
image_path = hf_hub_download(repo_id="vpippi/emuru_vae", filename="samples/example_image.jpg")
sample_image = Image.open(image_path).convert("RGB")
sample_image.show()
```
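
If your test image is stored on local disk rather than behind a URL, only the preprocessing step changes; the encode/decode round trip from the Code Example above works unchanged. A minimal variant (the path below is a placeholder):

```python
from PIL import Image
from torchvision.transforms.functional import to_tensor

# Load an RGB image from disk and normalize it to [0, 1].
def preprocess_image(image_path):
    image = Image.open(image_path).convert("RGB")
    return to_tensor(image).unsqueeze(0)  # Add batch dimension.

input_image = preprocess_image("samples/example_image.jpg")  # Replace with your image path.
```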