Preface
In the past, I did most of my Chinese word segmentation with Python's jieba ("stutter") segmentation tool, which is used by calling its API online. For the principles behind that tool, I recommend this blog post:
http://blog.csdn.net/daniel_ustc/article/details/48195287.
As my project evolved, I needed Stanford University's natural language processing package to build dependency trees, but the Stanford package stubbornly refused to do Chinese word segmentation for me (it kept throwing errors). In desperation, I had to fall back on a third-party word segmentation tool. Since the Stanford source code is in Java, I looked for a segmentation tool in the same language and found HanLP.
Installation and use of HanLP
One of the great benefits of HanLP is that it is an open-source toolkit that works offline. It not only offers its code as a free download, but also makes its painstakingly collected dictionaries public, which is a generous thing to do. For installation, I mainly followed this blog post:
http://m.blog.csdn.net/article/details?id=50938796
That post mainly covers using HanLP on Windows, whereas Ubuntu is Linux, so there are some differences. Below I describe installation and use on Ubuntu.
Install Eclipse
Run sudo apt-get install eclipse-platform in a terminal for a one-step installation, then find Eclipse in the applications menu.
Download HanLP
Visit the official HanLP website: http://hanlp.linrunsoft.com/services.html
Download hanlp.jar (the program package), data.zip (the dictionaries), and hanlp.properties (the configuration file). The remaining items are documentation and do not need to be downloaded.
The download link for data.zip is a little hidden: click the blue data-for-1.2.11.zip text and a Baidu Cloud link will appear.
Import the jar package
Import hanlp.jar into Eclipse. For the specific steps, see:
http://jingyan.baidu.com/article/ca41422fc76c4a1eae99ed9f.html
Import the configuration file
Copy hanlp.properties to the bin directory of the project, then edit the dictionary path inside it.
Set root to the directory where data is saved (remember to unzip data.zip first).
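A wrong root path is an easy mistake to make at this step, so a quick check before running any segmentation code can help. The following is only a minimal sketch under two assumptions: that hanlp.properties is on the classpath (the bin directory in Eclipse), and that the unzipped dictionaries sit in a data folder directly under root. The class name HanLPConfigCheck and its helper methods are made up for illustration and are not part of HanLP.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper, not part of HanLP: sanity-checks the root entry.
class HanLPConfigCheck {

    // Reads the "root" entry from properties-formatted text; returns null if absent.
    public static String readRoot(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader);
        String root = props.getProperty("root");
        return root == null ? null : root.trim();
    }

    // Assumption: the unzipped dictionaries live in <root>/data.
    public static boolean hasDataDir(String root) {
        return root != null && Files.isDirectory(Paths.get(root, "data"));
    }

    public static void main(String[] args) throws IOException {
        // Look for hanlp.properties on the classpath (bin in an Eclipse project).
        InputStream in = HanLPConfigCheck.class.getClassLoader()
                .getResourceAsStream("hanlp.properties");
        if (in == null) {
            System.out.println("hanlp.properties not found on the classpath");
            return;
        }
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            String root = readRoot(reader);
            System.out.println("root = " + root);
            System.out.println(hasDataDir(root)
                    ? "data directory found"
                    : "no data directory under root; check the path and unzip data.zip");
        }
    }
}
```

Running the class from the same Eclipse project prints the configured root and whether a data directory was found under it.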
Code demonstration
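import java.util.List;

import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.seg.Segment;
import com.hankcs.hanlp.seg.common.Term;

public class DemoHanLP {
    public static void main(String[] args) {
        String sentence = "Hello everyone, my name is Quincy.";
        // Create a segmenter with HanLP's default configuration
        Segment segment = HanLP.newSegment();
        // Split the sentence into a list of terms
        List<Term> termList = segment.seg(sentence);
        for (Term term : termList) {
            // Print each term followed by a space
            System.out.print(term + " ");
        }
    }
}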
Output:
The article comes from Quincy1994's blog