Spaces: Running on Zero
amlpai04 committed
Commit · 15a60a2
1 Parent(s): 5eb89c4
Norg support?
Browse files:
- app/content/NOR/htrflow/htrflow_col1.md +18 -0
- app/content/NOR/htrflow/htrflow_col2.md +23 -0
- app/content/NOR/htrflow/htrflow_row1.md +3 -0
- app/content/NOR/htrflow/htrflow_tab1.md +7 -0
- app/content/NOR/htrflow/htrflow_tab2.md +7 -0
- app/content/NOR/htrflow/htrflow_tab3.md +7 -0
- app/content/NOR/htrflow/htrflow_tab4.md +7 -0
- app/main.py +14 -4
- app/tabs/adv_htrflow_tab.py +2 -1
- app/tabs/htrflow_tab.py +1 -1
- pyproject.toml +2 -2
- uv.lock +0 -0
app/content/NOR/htrflow/htrflow_col1.md
ADDED
@@ -0,0 +1,18 @@
+### Introduction
+
+The Swedish National Archives (Riksarkivet) presents a demonstration pipeline for HTR (Handwritten Text Recognition). The pipeline consists of two instance segmentation models: one trained to segment text regions in images of running-text documents, and another trained to segment text lines within those regions. The text lines are then transcribed by a text recognition model trained on a large dataset of Swedish handwriting from the 17th to the 19th century.
+
+### Usage
+
+It is important to stress that this application is intended primarily for demonstration purposes. The goal is to showcase our pipeline for transcribing historical running-text documents, not to run the pipeline in large-scale production.
+**Note**: In the future we will optimize the code for a production scenario with multi-GPU, batch inference, but this work is still in progress. <br>
+
+For a glimpse of the upcoming features we are working on:
+
+- Navigate to > **Overview** > **Changelog & Roadmap**.
+
+### Limitations
+
+The demo, hosted on Huggingface and assigned a T4 GPU, can only handle two user submissions at a time. If you experience long wait times or an unresponsive application, this is why. In the future we plan to host this solution ourselves, with a better server for an improved user experience, optimized code, and several model options. Exciting developments are underway!
+
+It is also important to note that the models work on running text, not text in tabular format.
app/content/NOR/htrflow/htrflow_col2.md
ADDED
@@ -0,0 +1,23 @@
+## Source Code
+
+Please fork and leave a star on Github if you like it! The code for this project can be found here:
+
+- [Github](https://github.com/Riksarkivet/HTRFLOW)
+
+**Note**: We will package all of the code for mass HTR (batch inference on a multi-GPU setup) in the future, but the code is still a work in progress.
+
+## Models
+
+The models used in this demo are very much a work in progress; as more data and new architectures become available, they will be retrained and reevaluated. For more information about the models, please refer to their model cards on Huggingface.
+
+- [Riksarkivet/rtmdet_regions](https://huggingface.co/Riksarkivet/rtmdet_regions)
+- [Riksarkivet/rtmdet_lines](https://huggingface.co/Riksarkivet/rtmdet_lines)
+- [Riksarkivet/satrn_htr](https://huggingface.co/Riksarkivet/satrn_htr)
+
+## Datasets
+
+Training and test sets created by the Swedish National Archives will be released here:
+
+- [Riksarkivet/placeholder_region_segmentation](https://huggingface.co/datasets/Riksarkivet/placeholder_region_segmentation)
+- [Riksarkivet/placeholder_line_segmentation](https://huggingface.co/datasets/Riksarkivet/placeholder_line_segmentation)
+- [Riksarkivet/placeholder_htr](https://huggingface.co/datasets/Riksarkivet/placeholder_htr)
app/content/NOR/htrflow/htrflow_row1.md
ADDED
@@ -0,0 +1,3 @@
+## The Pipeline in Overview
+
+The steps of the pipeline are as follows:
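The chaining of the stages described in the tabs below can be sketched as follows (an illustrative sketch only; the function and parameter names are placeholders, not the actual HTRFLOW API):

```python
from typing import Callable, List


def run_pipeline(
    image,
    binarize: Callable,
    segment_regions: Callable,
    segment_lines: Callable,
    recognize: Callable,
) -> List[str]:
    """Chain the four demo stages: binarize the page, segment it into
    text regions, segment each region into text lines, and transcribe
    each line with the text recognition model."""
    page = binarize(image)
    transcription = []
    for region in segment_regions(page):
        for line in segment_lines(region):
            transcription.append(recognize(line))
    return transcription
```

Each stage is passed in as a callable, so the same skeleton works whether the stages are the demo's models or stand-ins during testing.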
app/content/NOR/htrflow/htrflow_tab1.md
ADDED
@@ -0,0 +1,7 @@
+### Binarization
+
+The reason for binarizing the images before processing them is that we want the models to generalize as well as possible. By training on only binarized images, and by binarizing images before running them through the pipeline, we bring the target domain closer to the training domain and reduce the negative effects of background variation, background noise, etc. on the final results. The pipeline implements a simple adaptive thresholding algorithm for binarization.
+
+<figure>
+<img src="https://github.com/Borg93/htr_gradio_file_placeholder/blob/main/app_project_bin.png?raw=true" alt="HTR_tool" style="width:70%; display: block; margin-left: auto; margin-right:auto;" >
+</figure>
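The adaptive-thresholding idea can be sketched in plain NumPy (an illustrative re-implementation, not the pipeline's actual code): each pixel is compared against the mean of its local neighbourhood, which an integral image lets us compute in constant time per pixel.

```python
import numpy as np


def adaptive_threshold(gray: np.ndarray, block: int = 31, c: float = 10.0) -> np.ndarray:
    """Binarize a grayscale image: a pixel becomes ink (0) when it is
    darker than the mean of its block x block neighbourhood minus c."""
    pad = block // 2
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    # Integral image with a zero row/column prepended for easy window sums.
    ii = np.cumsum(np.cumsum(padded, axis=0), axis=1)
    ii = np.pad(ii, ((1, 0), (1, 0)))
    h, w = gray.shape
    y, x = np.ogrid[0:h, 0:w]
    window_sum = (
        ii[y + block, x + block] - ii[y, x + block] - ii[y + block, x] + ii[y, x]
    )
    local_mean = window_sum / (block * block)
    return np.where(gray < local_mean - c, 0, 255).astype(np.uint8)
```

Comparing against a *local* mean (rather than one global threshold) is what makes the method robust to uneven lighting and background stains, which is exactly the background variation the paragraph above mentions.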
app/content/NOR/htrflow/htrflow_tab2.md
ADDED
@@ -0,0 +1,7 @@
+### Text-region segmentation
+
+To facilitate the text-line segmentation process, it is advantageous to segment the image into text regions beforehand. This initial step offers several benefits: it reduces variation in line spacing, eliminates blank areas on the page, establishes a clear reading order, and distinguishes marginalia from the main text. The segmentation model used in this process predicts both bounding boxes and masks, but only the masks are used for the line and region segmentation tasks. An essential post-processing step checks for regions that are contained within other regions: only the containing region is retained, while the contained region is discarded. This keeps the final text regions free of overlapping or redundant areas, so no duplicate text regions are sent to the text recognition model.
+
+<figure>
+<img src="https://github.com/Borg93/htr_gradio_file_placeholder/blob/main/app_project_region.png?raw=true" alt="HTR_tool" style="width:70%; display: block; margin-left: auto; margin-right:auto;" >
+</figure>
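The containment check described above can be sketched as a simple mask filter (an illustrative sketch, not the pipeline's actual implementation): a mask whose area lies almost entirely inside a larger mask is discarded.

```python
import numpy as np


def drop_contained_regions(masks, thresh: float = 0.9):
    """Keep only masks that are not (mostly) contained in a larger mask.

    A mask `a` counts as contained in `b` when at least `thresh` of
    a's area overlaps b and b is the larger of the two.
    """
    kept = []
    for i, a in enumerate(masks):
        area_a = a.sum()
        if area_a == 0:
            continue  # skip empty predictions
        contained = any(
            b.sum() > area_a
            and np.logical_and(a, b).sum() / area_a >= thresh
            for j, b in enumerate(masks)
            if j != i
        )
        if not contained:
            kept.append(a)
    return kept
```

The `thresh` tolerance (a parameter chosen here for illustration) allows a region to be dropped even when its mask pokes slightly outside its container, which is common with pixel-level predictions.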
app/content/NOR/htrflow/htrflow_tab3.md
ADDED
@@ -0,0 +1,7 @@
+### Text-line segmentation
+
+This is also an instance segmentation model, trained to extract text lines from the cropped text regions. The same post-processing as in the text-region segmentation step is applied in the text-line segmentation step.
+
+<figure>
+<img src="https://github.com/Borg93/htr_gradio_file_placeholder/blob/main/app_project_line.png?raw=true" alt="HTR_tool" style="width:70%; display: block; margin-left: auto; margin-right:auto;" >
+</figure>
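Each predicted line mask is then typically turned into an image crop for the recognition model. A minimal sketch of that step, cropping to the mask's bounding box (illustrative only; the actual pipeline may crop differently):

```python
import numpy as np


def crop_from_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Crop `image` to the bounding box of the True pixels in `mask`."""
    ys, xs = np.nonzero(mask)
    return image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```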
app/content/NOR/htrflow/htrflow_tab4.md
ADDED
@@ -0,0 +1,7 @@
+### Text Recognition
+
+The text recognition model was trained on approximately one million handwritten text-line images ranging from the 17th to the 19th century. See the model card for detailed evaluation results and results from some fine-tuning experiments.
+
+<figure>
+<img src="https://github.com/Borg93/htr_gradio_file_placeholder/blob/main/app_project_htr.png?raw=true" alt="HTR_tool" style="width:70%; display: block; margin-left: auto; margin-right:auto;" >
+</figure>
app/main.py
CHANGED
@@ -23,12 +23,19 @@ LANG_CHOICES = ["ENG", "SWE"]
 
 with gr.Blocks(title="HTRflow", theme=theme, css=css) as demo:
     with gr.Row():
-        local_language = gr.BrowserState(
+        local_language = gr.BrowserState(
+            default_value="ENG", storage_key="selected_language"
+        )
         main_language = gr.State(value="ENG")
 
         with gr.Column(scale=1):
             language_selector = gr.Dropdown(
-                choices=LANG_CHOICES,
+                choices=LANG_CHOICES,
+                value="ENG",
+                container=False,
+                min_width=50,
+                scale=0,
+                elem_id="langdropdown",
             )
 
         with gr.Column(scale=2):
@@ -52,7 +59,10 @@ with gr.Blocks(title="HTRflow", theme=theme, css=css) as demo:
     with gr.Tab(label="Data Explorer") as tab_data_explorer:
         data_explorer.render()
 
-    @demo.load(
+    @demo.load(
+        inputs=[local_language],
+        outputs=[language_selector, main_language, overview_language],
+    )
     def load_language(saved_values):
         return (saved_values,) * 3
 
@@ -86,4 +96,4 @@ with gr.Blocks(title="HTRflow", theme=theme, css=css) as demo:
 demo.queue()
 
 if __name__ == "__main__":
-    demo.launch(server_name="0.0.0.0", server_port=
+    demo.launch(server_name="0.0.0.0", server_port=7861, enable_monitoring=False)
app/tabs/adv_htrflow_tab.py
CHANGED
@@ -9,6 +9,7 @@ with gr.Blocks() as adv_htrflow_pipeline:
     # TODO: For the viewer we should be able to select from the output of the model what for values we want to
     # TODO: add batch predictions here..
     # TODO: add load from s3, local, hf daasets( however everything could go through hf_datasets).
+    # TODO: send a crop from the user
 
     image_mask = gr.ImageMask(interactive=True)
 
@@ -22,7 +23,7 @@ with gr.Blocks() as adv_htrflow_pipeline:
                 interactive=True,
             )
             test = gr.Dropdown(  # TODO: This should be a dropdown to decide input image or mask or s3 or local path
-                ["Upload", "Draw", "s3", "local"],
+                ["Upload", "Draw", "s3", "local", "crop"],
                 value="Upload",
                 multiselect=False,
                 label="Upload method",
app/tabs/htrflow_tab.py
CHANGED
@@ -141,7 +141,7 @@ with gr.Blocks() as htrflow_pipeline:
     with gr.Accordion(label="Pipeline", open=False):
         with gr.Row() as simple_pipeline:
             with gr.Column():
-                with gr.Row():
+                with gr.Row():  # TODO: use dynamic rendering instead to make it more clean: https://www.youtube.com/watch?v=WhAMvOEOWJw&ab_channel=HuggingFace
                     simple_segment_model = gr.Textbox(
                         "model1", label="Segmentation", info="Info about the Segmentation model"
                     )
pyproject.toml
CHANGED
@@ -17,9 +17,9 @@ classifiers = [
 requires-python = ">=3.10,<3.13"
 
 dependencies = [
-    "torch==2.0.1",
+    # "torch==2.0.1",
     "htrflow==0.1.3",
-    "gradio>=5.
+    "gradio>=5.11.0",
     "datasets>=3.2.0",
     "pandas>=2.2.3",
     "jinja2>=3.1.4",
uv.lock
CHANGED
The diff for this file is too large to render.