public abstract class Segment extends Object
| Constructor and Description |
|---|
| `Segment()` Constructs a segmenter. |
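Because `Segment` is abstract, `Segment()` only runs as part of constructing a concrete subclass, and that subclass must implement `segSentence(char[] sentence)`. The sketch below illustrates this template-method split, assuming that the public `seg` entry point hands the characters to `segSentence`. `SketchSegment`, `WhitespaceSegment`, and the use of plain `String` tokens instead of `Term` are hypothetical simplifications, not HanLP's code. The whitespace rule only stands in for a real segmentation algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for an abstract segmenter (not HanLP's Segment).
abstract class SketchSegment {
    // Public entry point: converts the text to characters and delegates to the subclass hook.
    public List<String> seg(String text) {
        return segSentence(text.toCharArray());
    }

    // The single method a concrete segmenter must supply.
    protected abstract List<String> segSentence(char[] sentence);
}

// Toy subclass: splits on whitespace, standing in for a real segmentation algorithm.
class WhitespaceSegment extends SketchSegment {
    @Override
    protected List<String> segSentence(char[] sentence) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : sentence) {
            if (Character.isWhitespace(c)) {
                // A separator closes the word collected so far, if any.
                if (current.length() > 0) {
                    words.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        // Flush the last word when the text does not end with whitespace.
        if (current.length() > 0) {
            words.add(current.toString());
        }
        return words;
    }
}
```

Callers only ever see `seg`. Swapping in a different algorithm means writing another subclass, while the entry point stays unchanged.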
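| Modifier and Type | Method and Description |
|---|---|
| `protected static List<AtomNode>` | `atomSegment(char[] charArray, int start, int end)` Atomic segmentation. |
| `protected static List<Vertex>` | `combineByCustomDictionary(List<Vertex> vertexList)` Merges the coarse segmentation result using the custom dictionary. |
| `protected static List<Vertex>` | `combineByCustomDictionary(List<Vertex> vertexList, WordNet wordNetAll)` Merges the coarse segmentation result using the custom dictionary and collects the custom-dictionary words into the full word graph. |
| `Segment` | `enableAllNamedEntityRecognize(boolean enable)` Whether to enable all named entity recognition. |
| `Segment` | `enableCustomDictionary(boolean enable)` Whether to enable the custom dictionary. |
| `Segment` | `enableIndexMode(boolean enable)` Sets index mode. |
| `Segment` | `enableJapaneseNameRecognize(boolean enable)` Whether to enable Japanese person name recognition. |
| `Segment` | `enableMultithreading(boolean enable)` Enables multithreading. |
| `Segment` | `enableMultithreading(int threadNumber)` Enables multithreading. |
| `Segment` | `enableNameRecognize(boolean enable)` Enables person name recognition. |
| `Segment` | `enableNumberQuantifierRecognize(boolean enable)` Whether to enable numeral and quantifier recognition, i.e. [二, 十, 一] => [二十一], [十, 九, 元] => [十九元]. |
| `Segment` | `enableOffset(boolean enable)` Whether to enable offset computation (`Term.offset` is computed only when this is enabled). |
| `Segment` | `enableOrganizationRecognize(boolean enable)` Enables organization name recognition. |
| `Segment` | `enablePartOfSpeechTagging(boolean enable)` Enables part-of-speech tagging. |
| `Segment` | `enablePlaceRecognize(boolean enable)` Enables place name recognition. |
| `Segment` | `enableTranslatedNameRecognize(boolean enable)` Whether to enable transliterated person name recognition. |
| `protected void` | `mergeNumberQuantifier(List<Vertex> termList, WordNet wordNetAll, Config config)` Merges numbers. |
| `protected static List<AtomNode>` | `quickAtomSegment(char[] charArray, int start, int end)` Fast atomic segmentation, intended to replace the original, slower method. |
| `List<Term>` | `seg(char[] text)` Segments text. |
| `List<Term>` | `seg(String text)` Segments text. This method is thread-safe. |
| `List<List<Term>>` | `seg2sentence(String text)` Segments text and splits it into sentences, returning the result sentence by sentence. |
| `protected abstract List<Term>` | `segSentence(char[] sentence)` Segments a single sentence. |
| `protected static List<AtomNode>` | `simpleAtomSegment(char[] charArray, int start, int end)` Simple atomic segmentation: puts all characters together as a single word. |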
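The three atomic segmentation helpers above all work on the half-open range `charArray[start, end)`: `start` is included, `end` is not. The sketch below shows one plausible grouping rule, assumed here for illustration only. Runs of ASCII digits and runs of ASCII letters stay together, and every other character becomes its own atom. A `simpleAtomSegment` counterpart returns the whole range as one atom, as its summary describes. `AtomSketch` and the `String` atoms are hypothetical. HanLP's `AtomNode` and its actual character classes may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; HanLP's real AtomNode also records a character type.
final class AtomSketch {
    private AtomSketch() {}

    // Character classes assumed by this sketch only.
    private static int charClass(char c) {
        if (c >= '0' && c <= '9') return 1;                               // ASCII digit
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) return 2;  // ASCII letter
        return 0;                                                         // other: one atom per char
    }

    // Splits charArray[start, end) into atoms (start inclusive, end exclusive).
    static List<String> atomSegment(char[] charArray, int start, int end) {
        List<String> atoms = new ArrayList<>();
        int i = start;
        while (i < end) {
            int cls = charClass(charArray[i]);
            int j = i + 1;
            if (cls != 0) {
                // Extend the run while the character class stays the same.
                while (j < end && charClass(charArray[j]) == cls) {
                    j++;
                }
            }
            atoms.add(new String(charArray, i, j - i));
            i = j;
        }
        return atoms;
    }

    // Simple variant: the whole range becomes a single atom (empty range gives no atoms).
    static List<String> simpleAtomSegment(char[] charArray, int start, int end) {
        List<String> atoms = new ArrayList<>();
        if (start < end) {
            atoms.add(new String(charArray, start, end - start));
        }
        return atoms;
    }
}
```

For `"ab12 x"`, the full range yields the atoms `ab`, `12`, a single space, and `x`. The range `[1, 3)` yields only `b` and `1`, because index 3 is excluded.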
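The numeral and quantifier example in the summary, [二, 十, 一] => [二十一] and [十, 九, 元] => [十九元], can be reproduced with a single token-merging pass. This is a minimal sketch under stated assumptions, not how `mergeNumberQuantifier` is implemented. Numerals are recognized from a fixed character list, and quantifiers come from a three-entry hypothetical set (元, 个, 岁). A real segmenter would consult its dictionaries and part-of-speech tags instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of numeral/quantifier merging; not HanLP's implementation.
final class NumberQuantifierSketch {
    private NumberQuantifierSketch() {}

    // Chinese numeral characters recognized by this sketch (assumption).
    private static final String NUMERALS = "零一二三四五六七八九十百千万亿";
    // Tiny hypothetical quantifier list; a real system would use its dictionary.
    private static final Set<String> QUANTIFIERS = new HashSet<>(Arrays.asList("元", "个", "岁"));

    // A token counts as a numeral only if every character is in NUMERALS.
    private static boolean isNumeral(String token) {
        if (token.isEmpty()) return false;
        for (char c : token.toCharArray()) {
            if (NUMERALS.indexOf(c) < 0) return false;
        }
        return true;
    }

    // Merges each run of numeral tokens, plus one directly following quantifier, into one token.
    static List<String> merge(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            if (!isNumeral(tokens.get(i))) {
                out.add(tokens.get(i));
                i++;
                continue;
            }
            StringBuilder merged = new StringBuilder();
            while (i < tokens.size() && isNumeral(tokens.get(i))) {
                merged.append(tokens.get(i));
                i++;
            }
            if (i < tokens.size() && QUANTIFIERS.contains(tokens.get(i))) {
                merged.append(tokens.get(i));
                i++;
            }
            out.add(merged.toString());
        }
        return out;
    }
}
```

Tokens that are not numerals pass through unchanged. A quantifier attaches only when it directly follows a numeral run.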
protected Config config
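Every `enable*` method returns `Segment`, which suggests that it records a setting in `config` and returns `this` so that calls can be chained. The following is a minimal sketch of that pattern. The `SketchConfig` field names and the `ConfigurableSegment` class are hypothetical. Only the rule for `enableMultithreading(boolean)` (true means 4 threads, false means single-threaded) comes from this page. Rejecting thread counts below 1 is an added assumption.

```java
// Hypothetical settings holder; field names are assumptions, not HanLP's Config.
final class SketchConfig {
    boolean nameRecognize;
    boolean offset;
    boolean numberQuantifierRecognize;
    int threadNumber = 1;
}

class ConfigurableSegment {
    protected final SketchConfig config = new SketchConfig();

    // Each switch mutates the shared config and returns this, so calls can be chained.
    ConfigurableSegment enableNameRecognize(boolean enable) {
        config.nameRecognize = enable;
        return this;
    }

    ConfigurableSegment enableOffset(boolean enable) {
        config.offset = enable;
        return this;
    }

    ConfigurableSegment enableNumberQuantifierRecognize(boolean enable) {
        config.numberQuantifierRecognize = enable;
        return this;
    }

    // Mirrors the documented contract: true means 4 threads, false means single-threaded.
    ConfigurableSegment enableMultithreading(boolean enable) {
        return enableMultithreading(enable ? 4 : 1);
    }

    ConfigurableSegment enableMultithreading(int threadNumber) {
        // Assumption of this sketch: non-positive thread counts are rejected.
        if (threadNumber < 1) {
            throw new IllegalArgumentException("threadNumber must be >= 1");
        }
        config.threadNumber = threadNumber;
        return this;
    }
}
```

For example, `new ConfigurableSegment().enableNameRecognize(true).enableMultithreading(true)` leaves `config.threadNumber` at 4, and a later `enableMultithreading(false)` resets it to 1.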
`protected static List<AtomNode> atomSegment(char[] charArray, int start, int end)`
Parameters: `charArray`; `start` - start position (inclusive); `end` - end position (exclusive)

`protected static List<AtomNode> simpleAtomSegment(char[] charArray, int start, int end)`
Parameters: `charArray`, `start`, `end`

`protected static List<AtomNode> quickAtomSegment(char[] charArray, int start, int end)`
Parameters: `charArray`, `start`, `end`

`protected static List<Vertex> combineByCustomDictionary(List<Vertex> vertexList)`
Parameters: `vertexList` - the coarse segmentation result

`protected static List<Vertex> combineByCustomDictionary(List<Vertex> vertexList, WordNet wordNetAll)`
Parameters: `vertexList` - the coarse segmentation result; `wordNetAll` - the full word graph into which custom-dictionary words are collected

`protected void mergeNumberQuantifier(List<Vertex> termList, WordNet wordNetAll, Config config)`
Parameters: `termList`

`public List<List<Term>> seg2sentence(String text)`
Parameters: `text` - the sentence to be segmented

`protected abstract List<Term> segSentence(char[] sentence)`
Parameters: `sentence` - the sentence to be segmented

`public Segment enableIndexMode(boolean enable)`

`public Segment enablePartOfSpeechTagging(boolean enable)`
Parameters: `enable`

`public Segment enableNameRecognize(boolean enable)`
Parameters: `enable`

`public Segment enablePlaceRecognize(boolean enable)`
Parameters: `enable`

`public Segment enableOrganizationRecognize(boolean enable)`
Parameters: `enable`

`public Segment enableCustomDictionary(boolean enable)`
Parameters: `enable`

`public Segment enableTranslatedNameRecognize(boolean enable)`
Parameters: `enable`

`public Segment enableJapaneseNameRecognize(boolean enable)`
Parameters: `enable`

`public Segment enableOffset(boolean enable)`
Parameters: `enable`

`public Segment enableNumberQuantifierRecognize(boolean enable)`
Parameters: `enable`

`public Segment enableAllNamedEntityRecognize(boolean enable)`
Parameters: `enable`

`public Segment enableMultithreading(boolean enable)`
Parameters: `enable` - `true` enables 4 threads; `false` runs single-threaded

`public Segment enableMultithreading(int threadNumber)`
Parameters: `threadNumber` - the number of threads

Copyright © 2014–2017 码农场. All rights reserved.