
	---
	library_name: transformers
	license: apache-2.0
	datasets:
	- ds4sd/DocLayNet
	pipeline_tag: image-segmentation
	---

	# DETR-layout-detection

	We present cmarkea/detr-layout-detection, a model that extracts layout elements (Text, Picture, Caption, Footnote, etc.) from an image of a document page.
	It is a fine-tuned version of [detr-resnet-50](https://huggingface.co/facebook/detr-resnet-50) trained on the [DocLayNet](https://huggingface.co/datasets/ds4sd/DocLayNet)
	dataset. The model jointly predicts masks and bounding boxes for document objects, which makes it well suited to preprocessing document corpora for ingestion into an
	open-domain question answering (ODQA) system.

	The model detects 11 entity classes: Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text, and Title.
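
For downstream filtering it can help to have the 11 class names available as a constant. The sketch below is only an illustration: `LAYOUT_CLASSES` is copied from the list above, while the `TEXT_LIKE` grouping and the `is_text_like` helper are hypothetical and not part of the model or of `transformers`. The model's actual id-to-label mapping should be read from `model.config.id2label` rather than inferred from the ordering used here.

```python
# The 11 layout classes listed in this model card, in alphabetical order.
# This is NOT necessarily the model's internal id order.
LAYOUT_CLASSES = (
    "Caption", "Footnote", "Formula", "List-item", "Page-footer",
    "Page-header", "Picture", "Section-header", "Table", "Text", "Title",
)

# Hypothetical grouping: classes whose content is usually running text.
TEXT_LIKE = frozenset(
    {"Caption", "Footnote", "List-item", "Section-header", "Text", "Title"}
)


def is_text_like(label: str) -> bool:
    """Return True if `label` is one of the text-bearing layout classes."""
    if label not in LAYOUT_CLASSES:
        raise ValueError(f"unknown layout class: {label!r}")
    return label in TEXT_LIKE
```

A grouping like this is one possible way to route text regions to OCR and the remaining regions (Picture, Table, Formula) to other handling. The split itself is an assumption, not something prescribed by the model.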

	## Performance

	## Direct Use

	```python
	import torch
	from PIL import Image
	from transformers import AutoImageProcessor
	from transformers.models.detr import DetrForSegmentation

	# Placeholder path: replace with your own document page image.
	img = Image.open("page.png").convert("RGB")

	img_proc = AutoImageProcessor.from_pretrained(
	    "ArkeaIAF/detr-layout-detection"
	)
	model = DetrForSegmentation.from_pretrained(
	    "ArkeaIAF/detr-layout-detection"
	)

	with torch.inference_mode():
	    inputs = img_proc(img, return_tensors='pt')
	    output = model(**inputs)

	threshold = 0.4

	# PIL reports (width, height); the post-processors expect (height, width).
	target_sizes = [img.size[::-1]]

	segmentation_mask = img_proc.post_process_segmentation(
	    output,
	    threshold=threshold,
	    target_sizes=target_sizes
	)

	bbox_pred = img_proc.post_process_object_detection(
	    output,
	    threshold=threshold,
	    target_sizes=target_sizes
	)
	```
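
The list returned by `post_process_object_detection` holds one dict per image with `scores`, `labels`, and `boxes` entries, where each box is `[x_min, y_min, x_max, y_max]` in pixels. A minimal sketch of turning one such dict into plain, sorted records follows. `to_regions` is a hypothetical helper, the top-to-bottom sort is only a rough stand-in for reading order, and the `example_id2label` mapping is made up for illustration (the real one lives in `model.config.id2label`).

```python
def _as_list(values):
    """Convert a tensor/array (anything with .tolist()) or a sequence to a list."""
    return values.tolist() if hasattr(values, "tolist") else list(values)


def to_regions(detection, id2label):
    """Turn one per-image detection dict into a list of plain-Python records.

    `detection` is expected to have "scores", "labels" and "boxes" entries,
    with boxes given as [x_min, y_min, x_max, y_max] in pixels.
    """
    scores = _as_list(detection["scores"])
    labels = _as_list(detection["labels"])
    boxes = _as_list(detection["boxes"])
    regions = [
        {
            "label": id2label[int(label)],
            "score": float(score),
            "box": [float(v) for v in box],
        }
        for score, label, box in zip(scores, labels, boxes)
    ]
    # Rough reading order: top edge first, then left edge.
    regions.sort(key=lambda r: (r["box"][1], r["box"][0]))
    return regions


# Illustrative input with made-up values (plain lists stand in for tensors).
example = {
    "scores": [0.91, 0.78],
    "labels": [1, 0],
    "boxes": [[40.0, 300.0, 560.0, 420.0], [40.0, 50.0, 560.0, 90.0]],
}
example_id2label = {0: "Title", 1: "Text"}  # hypothetical mapping
regions = to_regions(example, example_id2label)
```

With the objects from the snippet above, the call would be `to_regions(bbox_pred[0], model.config.id2label)`.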

	### Citation

	```
	@online{DeDetrLay,
	  AUTHOR = {Cyrile Delestre},
	  URL = {https://huggingface.co/cmarkea/detr-base-layout-detection},
	  YEAR = {2024},
	  KEYWORDS = {Image Processing ; Transformers ; Layout},
	}
	```