---
license: cc-by-nc-4.0
---
# Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D

Official model weights for the `Locate-3D` models and the `3D-JEPA` encoders.

## Locate 3D

Locate 3D is a model for localizing objects in 3D scenes from referring expressions like “the small coffee table between the sofa and the lamp.” Locate 3D sets a new state of the art on standard referential grounding benchmarks and showcases robust generalization capabilities. Notably, Locate 3D operates directly on sensor observation streams (posed RGB-D frames), enabling real-world deployment on robots and AR devices.
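
Concretely, each observation in that stream is a posed RGB-D frame. As a purely illustrative sketch of what such a frame carries (the container name, field names, and shapes below are assumptions, not the repository's actual data schema):

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PosedRGBDFrame:
    """One element of a posed RGB-D sensor stream (hypothetical layout)."""

    rgb: np.ndarray         # (H, W, 3) uint8 color image
    depth: np.ndarray       # (H, W) float32 depth map, in meters
    intrinsics: np.ndarray  # (3, 3) camera intrinsics matrix
    pose: np.ndarray        # (4, 4) camera-to-world transform
```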

## 3D-JEPA

3D-JEPA, a novel self-supervised learning (SSL) algorithm for sensor point clouds, is key to Locate 3D. It takes as input a 3D point cloud featurized using 2D foundation models (CLIP, DINO); masked prediction in latent space is then employed as a pretext task for self-supervised learning of contextualized point cloud features. Once trained, the 3D-JEPA encoder is fine-tuned alongside a language-conditioned decoder to jointly predict 3D masks and bounding boxes.
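
To make the pretext task concrete, here is a minimal sketch of masked prediction in latent space over a featurized point cloud. Everything in it is an assumption in the style of JEPA-family methods (the linear stand-in encoders, the feature dimensions, the mean-pooled context, and the EMA target update), not the released 3D-JEPA architecture:

```python
import torch
import torch.nn.functional as F

# Illustrative stand-ins for the real networks: anything mapping per-point
# features (N, C) to latents (N, D) fits these roles.
FEAT_DIM, LATENT_DIM = 768, 256
context_encoder = torch.nn.Linear(FEAT_DIM, LATENT_DIM)  # sees visible points only
target_encoder = torch.nn.Linear(FEAT_DIM, LATENT_DIM)   # sees all points, no grad
predictor = torch.nn.Linear(LATENT_DIM, LATENT_DIM)      # fills in masked latents


def masked_latent_loss(point_feats: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """point_feats: (N, FEAT_DIM) features lifted from 2D models (CLIP, DINO).
    mask: (N,) bool, True where a point is hidden from the context encoder."""
    with torch.no_grad():
        targets = target_encoder(point_feats)        # latent target for every point
    context = context_encoder(point_feats[~mask])    # encode only the visible points
    summary = context.mean(dim=0, keepdim=True)      # crude global context summary
    preds = predictor(summary).expand(int(mask.sum()), -1)
    return F.mse_loss(preds, targets[mask])          # regress latents, not raw points


@torch.no_grad()
def update_target_encoder(momentum: float = 0.996) -> None:
    # JEPA-style target: an exponential moving average of the context encoder,
    # which keeps the latent regression targets stable during training.
    for t, c in zip(target_encoder.parameters(), context_encoder.parameters()):
        t.mul_(momentum).add_(c, alpha=1.0 - momentum)
```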

## Models

- **Locate-3D**: Locate-3D model trained on public referential grounding datasets
- **Locate-3D+**: Locate-3D model trained on public referential grounding datasets and the newly released Locate 3D Dataset
- **3D-JEPA**: Pre-trained SSL encoder for 3D understanding

## How to Use

For detailed instructions on how to load the encoder and integrate it into your downstream task, please refer to our [GitHub repository](https://github.com/facebookresearch/locate-3d). A minimal way to fetch the checkpoint files themselves is sketched below.
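
The weights can be pulled with the standard `huggingface_hub` client. The repo id below is an assumption (use the id shown at the top of this model page if it differs), and the actual loading entry points live in the GitHub repository:

```python
from huggingface_hub import snapshot_download

# Download every file in the model repo to the local Hugging Face cache and
# return the local directory path.
checkpoint_dir = snapshot_download(repo_id="facebook/locate-3d")  # assumed repo id
print(checkpoint_dir)  # hand this path to the loaders from the GitHub repo
```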

## License

The majority of `locate-3d` is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: Pointcept is licensed under the MIT license.