1-800-BAD-CODE committed on
Commit 3e20b04 · 1 parent: e62dcf1

Update README.md

Files changed (1)
  1. README.md +46 -0
README.md CHANGED
@@ -1,3 +1,49 @@
  ---
  license: apache-2.0
+ language:
+ - ar
+ - bn
+ - de
+ - en
+ - es
+ - et
+ - fi
+ - fr
+ - hi
+ - id
+ - is
+ - it
+ - ja
+ - lt
+ - lv
+ - ko
+ - nl
+ - no
+ - pl
+ - pt
+ - ru
+ - tr
+ - sv
+ - uk
+ - zh
  ---
+
+ # Model Overview
+ This model performs sentence boundary detection (SBD) in 25 languages.
+
+ The model accepts arbitrarily long, punctuated text as input and outputs the constituent sentences of that text.
35
+
36
+ # Model Architecture
37
+ This is a data-driven approach to SBD.
38
+
39
+ Input texts are encoded with a SentencePiece model, then encoded with a BERT-style encoder, then projected to sentence boundary probabilities via a linear layer.
40
+
41
+ For each input token `t`, this model predicts the probability that `t` is the final token of a sentence (i.e., a sentence boundary).
+
+ # Example Usage
+
+ This model has been exported to ONNX alongside a SentencePiece tokenizer.
+
+ ```bash
+
+ ```
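The README states that the model outputs one boundary probability per token but does not say how those probabilities become sentences. Below is a minimal Python sketch of that last step. It assumes a fixed decision threshold of 0.5 and SentencePiece's `▁` (U+2581) word-start marker. The threshold value, `split_sentences`, and `_detokenize` are hypothetical and are not part of this model's release.

```python
from typing import List, Sequence

# Hypothetical default: the README does not specify a decision threshold.
BOUNDARY_THRESHOLD = 0.5


def split_sentences(
    pieces: Sequence[str],
    probs: Sequence[float],
    threshold: float = BOUNDARY_THRESHOLD,
) -> List[str]:
    """Group SentencePiece pieces into sentences using per-piece boundary probabilities.

    pieces: subword pieces for the input, in order.
    probs:  probability that each piece is the final token of a sentence.
    """
    if len(pieces) != len(probs):
        raise ValueError("pieces and probs must have the same length")
    sentences: List[str] = []
    current: List[str] = []
    for piece, p in zip(pieces, probs):
        current.append(piece)
        # A piece at or above the threshold is treated as the last token of a sentence.
        if p >= threshold:
            sentences.append(_detokenize(current))
            current = []
    # Pieces after the last predicted boundary still form a final sentence.
    if current:
        sentences.append(_detokenize(current))
    return sentences


def _detokenize(pieces: Sequence[str]) -> str:
    # Simplified detokenizer (assumption): SentencePiece marks word starts with
    # U+2581, so map it back to a space and trim the leading one.
    return "".join(pieces).replace("\u2581", " ").strip()
```

For pieces `["▁Hello", "▁world", ".", "▁How", "▁are", "▁you", "?"]` with probabilities `[0.01, 0.02, 0.97, 0.01, 0.00, 0.03, 0.99]`, this returns `["Hello world.", "How are you?"]`. Splitting immediately after the piece that reaches the threshold follows the README's definition of a boundary as the final token of a sentence. In practice, the tokenizer's own decoding method is likely a safer detokenizer than the string replacement used here.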
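The usage block in the README is empty, and the exported graph's input and output names are not documented here. The following sketch is hypothetical. It assumes three things about the graph: its first input takes int64 token ids shaped `[batch, seq_len]`, its first output holds one boundary probability per token, and it adds no extra start or end positions. `predict_boundary_probs` is a hypothetical helper. It takes an already-built `onnxruntime.InferenceSession` and `sentencepiece.SentencePieceProcessor`, which keeps those assumptions in one place and allows the logic to be tested without the model files.

```python
from typing import Any, List, Tuple

import numpy as np


def predict_boundary_probs(session: Any, tokenizer: Any, text: str) -> Tuple[List[str], List[float]]:
    """Return SentencePiece pieces for `text` and one boundary probability per piece.

    session:   an onnxruntime.InferenceSession for the exported model.
    tokenizer: a sentencepiece.SentencePieceProcessor for the bundled tokenizer.
    """
    pieces: List[str] = tokenizer.encode(text, out_type=str)
    ids: List[int] = tokenizer.encode(text, out_type=int)
    # Assumption: the first graph input takes token ids shaped [batch, seq_len].
    # The name is read from the graph rather than guessed.
    input_name = session.get_inputs()[0].name
    feed = {input_name: np.array([ids], dtype=np.int64)}
    # run(None, ...) returns every graph output; assume the first holds the probabilities.
    outputs = session.run(None, feed)
    probs = np.asarray(outputs[0], dtype=np.float64).reshape(-1)
    if probs.shape[0] != len(pieces):
        # Guard for the "one probability per piece" assumption (e.g. extra BOS/EOS slots).
        raise ValueError(f"expected {len(pieces)} probabilities, got {probs.shape[0]}")
    return pieces, probs.tolist()


# Example wiring (file names are hypothetical):
#   import onnxruntime as ort
#   import sentencepiece as spm
#   session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
#   tokenizer = spm.SentencePieceProcessor(model_file="tokenizer.model")
#   pieces, probs = predict_boundary_probs(session, tokenizer, "Hello world. How are you?")
```

If the graph does add start or end positions, the length check raises an error instead of silently misaligning probabilities with pieces. Inspecting `session.get_inputs()` and `session.get_outputs()` on the real model is the way to confirm or correct these assumptions.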