diff --git a/.asset/model_explan1.PNG b/.asset/model_explan1.PNG
new file mode 100644
index 0000000..3ee0a08
Binary files /dev/null and b/.asset/model_explan1.PNG differ
diff --git a/.asset/model_explan2.PNG b/.asset/model_explan2.PNG
new file mode 100644
index 0000000..c1b9a16
Binary files /dev/null and b/.asset/model_explan2.PNG differ
diff --git a/README.md b/README.md
index a7a4ec5..18664fb 100644
--- a/README.md
+++ b/README.md
@@ -57,7 +57,14 @@ Marrying Grounding DINO
 gd_gligen
-
+## :star: Explanation/Tips for Grounding DINO Inputs and Outputs
+- Grounding DINO accepts an `(image, text)` pair as input.
+- It outputs `900` object boxes by default. Each box has a similarity score for every input word.
+- By default, we keep the boxes whose highest similarity is higher than `box_threshold`.
+- We extract the words whose similarities are higher than `text_threshold` as the predicted labels.
+- If you want to obtain objects of a certain phrase, like `dogs` in the sentence `two dogs with a stick.`, you can select the boxes with the highest text similarity to `dogs` as the final outputs.
+![model_explain1](.asset/model_explan1.PNG)
+![model_explain2](.asset/model_explan2.PNG)
 ## :label: TODO
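The thresholding and phrase-selection tips in the README section above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's code: the function names `filter_by_thresholds` and `select_by_word` are hypothetical, the `(cx, cy, w, h)` box format is assumed, similarity scores are assumed to already lie in [0, 1], and the text is split into whitespace-separated words, whereas the real model scores the tokens produced by its text tokenizer. The threshold values and scores in the example are made up for illustration.

```python
from typing import List, Tuple

# Hypothetical box format (cx, cy, w, h), normalized to [0, 1].
Box = Tuple[float, float, float, float]


def filter_by_thresholds(
    boxes: List[Box],
    sims: List[List[float]],
    words: List[str],
    box_threshold: float,
    text_threshold: float,
) -> List[Tuple[Box, str, float]]:
    """Keep boxes whose highest word similarity exceeds box_threshold and
    label each kept box with every word whose similarity exceeds text_threshold.

    sims[i][j] is the (assumed [0, 1]) similarity of box i to word j.
    """
    kept = []
    for box, row in zip(boxes, sims):
        best = max(row)
        if best <= box_threshold:
            continue  # no word matches this box well enough: drop it
        label = " ".join(w for w, s in zip(words, row) if s > text_threshold)
        kept.append((box, label, best))
    return kept


def select_by_word(
    boxes: List[Box],
    sims: List[List[float]],
    words: List[str],
    target: str,
    box_threshold: float,
) -> List[Tuple[Box, float]]:
    """Score every box by its similarity to one target word (its first
    occurrence in `words`), keep scores above box_threshold, best first."""
    j = words.index(target)  # raises ValueError if target is not in the text
    scored = [(box, row[j]) for box, row in zip(boxes, sims) if row[j] > box_threshold]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy example for the prompt "two dogs with a stick." (made-up scores, 3 boxes instead of 900).
words = ["two", "dogs", "with", "a", "stick"]
boxes = [(0.3, 0.5, 0.2, 0.4), (0.7, 0.5, 0.2, 0.4), (0.5, 0.9, 0.6, 0.1)]
sims = [
    [0.05, 0.80, 0.02, 0.01, 0.10],  # box 0: mostly "dogs"
    [0.04, 0.65, 0.03, 0.01, 0.30],  # box 1: "dogs", with some "stick"
    [0.02, 0.10, 0.05, 0.02, 0.55],  # box 2: mostly "stick"
]

for box, label, score in filter_by_thresholds(boxes, sims, words, 0.35, 0.25):
    print(box, label, score)  # labels: "dogs", "dogs stick", "stick"

for box, score in select_by_word(boxes, sims, words, "dogs", 0.35):
    print(box, score)  # box 0 (0.8), then box 1 (0.65); box 2 is dropped
```

Box 1 shows why the last tip can matter: thresholding alone labels it `dogs stick`, while ranking by the `dogs` column keeps it as a dog candidate and discards the stick-only box.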