Using HanLP word segmentation in Spark

1. Put HanLP's data directory (dictionaries and models) on HDFS, then set the root path in the project's hanlp.properties configuration file, for example:

```
root=hdfs://localhost:9000/tmp/
```

2. Implement the com.hankcs.hanlp.corpus.io.IIOAdapter interface so HanLP reads its resources from HDFS instead of the local file system:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import com.hankcs.hanlp.corpus.io.IIOAdapter;

// Nested inside the driver class; routes HanLP's resource I/O through HDFS.
public static class HadoopFileIoAdapter implements IIOAdapter {

    @Override
    public InputStream open(String path) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(path), conf);
        return fs.open(new Path(path));
    }

    @Override
    public OutputStream create(String path) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(path), conf);
        return fs.create(new Path(path));
    }
}
```

3. Register the adapter and create the segmenter:

```java
import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.seg.Segment;
import com.hankcs.hanlp.seg.CRF.CRFSegment;

private static Segment segment;

static {
    // Install the HDFS adapter before any HanLP resource is loaded.
    HanLP.Config.IOAdapter = new HadoopFileIoAdapter();
    segment = new CRFSegment();
}
```

You can then use segment to tokenize text inside Spark operations, as in the sketch below.
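For illustration, here is a minimal sketch of calling the segmenter from a Spark job. The SegmentDemo class and the input/output paths are hypothetical; segment is assumed to be the static field created in step 3, declared in this same class.

```java
import java.util.List;
import java.util.stream.Collectors;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import com.hankcs.hanlp.seg.common.Term;

// Hypothetical driver class; assumes the static `segment` field and
// HadoopFileIoAdapter from the steps above are defined in this class.
public class SegmentDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("hanlp-segment");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Hypothetical input: one sentence per line on HDFS.
        JavaRDD<String> sentences =
                sc.textFile("hdfs://localhost:9000/tmp/sentences.txt");

        // The static initializer runs once per executor JVM, so the CRF
        // model is loaded from HDFS once per executor, not once per record.
        JavaRDD<String> tokenized = sentences.map(line -> {
            List<Term> terms = segment.seg(line);
            return terms.stream()
                        .map(term -> term.word)
                        .collect(Collectors.joining(" "));
        });

        tokenized.saveAsTextFile("hdfs://localhost:9000/tmp/segmented");
        sc.stop();
    }
}
```

Keeping the segmenter in a static field means the closure does not capture (or serialize) the model; each executor builds its own instance lazily when the class is first loaded there.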
Original post: https://blog.csdn.net/l294265421/article/details/72932042