rivertext

2025-12-10

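rivertext is an open-source library for modeling and training the different incremental word embedding architectures proposed in the state of the art.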

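It aims to bring many existing incremental word embedding algorithms into a unified framework, providing a standardized interface and making it easier to develop new methods.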
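
A minimal sketch of what such a unified interface can look like, assuming an abstract base class whose subclasses all share the same training methods. The names `IncrementalWordVectors`, `vocab` and `VocabularyTracker` are hypothetical and are not taken from rivertext's source. Only the method names `learn_one` and `learn_many` come from this README.

```python
from abc import ABC, abstractmethod
from typing import Iterable


class IncrementalWordVectors(ABC):
    """Hypothetical unified interface for incremental word-vector models.

    Illustrative only; this is not rivertext's actual base class.
    """

    @abstractmethod
    def learn_one(self, text: str) -> None:
        """Update the model with a single document, e.g. one tweet."""

    def learn_many(self, texts: Iterable[str]) -> None:
        """Update the model with a mini-batch of documents.

        Default: one update per document. Concrete models may override
        this with a genuinely batched update.
        """
        for text in texts:
            self.learn_one(text)

    @abstractmethod
    def vocab(self) -> set[str]:
        """Return the words the model currently knows."""


class VocabularyTracker(IncrementalWordVectors):
    """Trivial hypothetical model that only records which words it has seen."""

    def __init__(self) -> None:
        self._vocab: set[str] = set()

    def learn_one(self, text: str) -> None:
        # Whitespace tokenization is an assumption made for brevity.
        self._vocab.update(text.lower().split())

    def vocab(self) -> set[str]:
        return set(self._vocab)


model = VocabularyTracker()
model.learn_one("new hashtag")
model.learn_many(["brand names", "new words"])
print(sorted(model.vocab()))  # ['brand', 'hashtag', 'names', 'new', 'words']
```

In this sketch, `learn_many` defaults to a loop over `learn_one`, so every model supports both paradigms. A model that can batch its updates is free to override it.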

rivertext provides two training paradigms:

  • learn_one, which trains on one instance at a time;

  • learn_many, which trains on a mini-batch of instances at a time.

This allows text representation models to be trained more efficiently on text data streams.
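
The difference between the two paradigms can be sketched with a toy streaming word-context co-occurrence counter. This is an illustrative assumption, not rivertext's Word Context Matrix model. `ToyWordContextMatrix`, `window_size` and `vector` are hypothetical names, and tokenization is plain whitespace splitting. Here `learn_many` aggregates a whole mini-batch before touching the model's counts, which is where batched training can save work.

```python
from collections import Counter, defaultdict


class ToyWordContextMatrix:
    """Hypothetical streaming co-occurrence counter (not rivertext's API)."""

    def __init__(self, window_size: int = 2) -> None:
        self.window_size = window_size
        # counts[word][context] = times `context` appeared within the window of `word`
        self.counts: defaultdict[str, Counter] = defaultdict(Counter)

    def _pairs(self, text: str):
        """Yield (word, context) pairs inside a symmetric sliding window."""
        tokens = text.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - self.window_size)
            hi = min(len(tokens), i + self.window_size + 1)
            for j in range(lo, hi):
                if j != i:
                    yield word, tokens[j]

    def learn_one(self, text: str) -> None:
        """Update the shared counts after every single pair of one document."""
        for word, context in self._pairs(text):
            self.counts[word][context] += 1

    def learn_many(self, texts: list[str]) -> None:
        """Aggregate the whole mini-batch first, then merge once per word."""
        batch: defaultdict[str, Counter] = defaultdict(Counter)
        for text in texts:
            for word, context in self._pairs(text):
                batch[word][context] += 1
        for word, contexts in batch.items():
            self.counts[word].update(contexts)

    def vector(self, word: str, contexts: list[str]) -> list[int]:
        """Return the word's co-occurrence counts over the given contexts."""
        row = self.counts.get(word, Counter())
        return [row[c] for c in contexts]


stream = ["the cat sat", "the dog sat"]

one = ToyWordContextMatrix(window_size=1)
for doc in stream:
    one.learn_one(doc)

many = ToyWordContextMatrix(window_size=1)
many.learn_many(stream)

print(one.vector("sat", ["cat", "dog", "the"]))  # [1, 1, 0]
print(one.counts == many.counts)  # True
```

On this toy model both paradigms reach the same counts. They differ only in how often the shared state is written to.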

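rivertext also provides an interface similar to that of the river package, so developers can use the library to train text representation models quickly and easily.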
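
In river, models are typically updated one example at a time with `learn_one`. The sketch below shows how a line-based text stream might be fed to such a model, either directly or grouped into mini-batches for `learn_many`. It assumes one document (e.g. one tweet) per line. `text_stream`, `mini_batches` and the `CallRecorder` stand-in model are hypothetical helpers and are not part of rivertext or river.

```python
import io
from itertools import islice
from typing import Iterable, Iterator, TextIO


def text_stream(handle: TextIO) -> Iterator[str]:
    """Lazily yield non-empty lines, one document per line."""
    for line in handle:
        line = line.strip()
        if line:
            yield line


def mini_batches(stream: Iterable[str], batch_size: int) -> Iterator[list[str]]:
    """Group a possibly unbounded stream into lists of at most batch_size items."""
    it = iter(stream)
    while batch := list(islice(it, batch_size)):
        yield batch


class CallRecorder:
    """Hypothetical stand-in model exposing river-style learn methods."""

    def __init__(self) -> None:
        self.seen: list[str] = []
        self.batch_sizes: list[int] = []

    def learn_one(self, text: str) -> None:
        self.seen.append(text)

    def learn_many(self, texts: list[str]) -> None:
        self.batch_sizes.append(len(texts))
        self.seen.extend(texts)


# io.StringIO stands in for an open text file; the blank line is skipped.
source = io.StringIO("first tweet\nsecond tweet\n\nthird tweet\nfourth tweet\nfifth tweet\n")

model = CallRecorder()
for batch in mini_batches(text_stream(source), batch_size=2):
    model.learn_many(batch)
print(model.batch_sizes)  # [2, 2, 1]
```

Because both helpers are generators, the stream is never loaded into memory as a whole. Only the current mini-batch is held at any time.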

The official documentation can be found at this link.

Installation

rivertext is intended to work with Python 3.10 and above. It can be installed with pip:

 pip install rivertext

Requirements

The following packages are installed along with rivertext if they are not already present:

  1. nltk
  2. numpy
  3. scikit-learn
  4. scipy
  5. torch
  6. tqdm
  7. word-embeddings-benchmarks
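
The sketch below checks which of these dependencies can already be imported in the current environment. It uses `importlib.util.find_spec` from the standard library. The mapping from package names to import names is an assumption, in particular `web` as the import name of word-embeddings-benchmarks.

```python
import importlib.util

# Package name -> import name. The entry for word-embeddings-benchmarks
# ("web") is an assumption, not confirmed by this README.
REQUIRED = {
    "nltk": "nltk",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "scipy": "scipy",
    "torch": "torch",
    "tqdm": "tqdm",
    "word-embeddings-benchmarks": "web",
}


def missing_dependencies(required: dict[str, str] = REQUIRED) -> list[str]:
    """Return the package names whose top-level module cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_dependencies()
    print("missing:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages such as torch.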

Contributing

Development requirements

Testing

All unit tests are in the rivertext/tests folder, and pytest is used as the framework to run them.

To run the tests, execute:

 pytest tests

To check the test coverage, run:

 pytest tests --cov-report xml:cov.xml --cov rivertext

Then:

 coverage report -m

Building the documentation

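The documentation is built with mkdocs and mkdocs-material. Its sources are in the docs folder at the root of the project. First, install the required packages: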

 pip install mkdocs
 pip install "mkdocstrings[python]"
 pip install mkdocs-material

Then, to build the documentation and serve it locally for preview, run:

 mkdocs build
 mkdocs serve

Changelog

Citation

If you use this package in an academic publication, please cite the following paper:

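G. Iturra-Bocaz and F. Bravo-Marquez. rivertext: A Python Library for Training and Evaluating Incremental Word Embeddings from Text Data Streams. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023), Taipei, Taiwan.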

 @inproceedings{10.1145/3539618.3591908,
author = {Iturra-Bocaz, Gabriel and Bravo-Marquez, Felipe},
title = {rivertext: A Python Library for Training and Evaluating Incremental Word Embeddings from Text Data Streams},
year = {2023},
isbn = {9781450394086},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3539618.3591908},
doi = {10.1145/3539618.3591908},
abstract = {Word embeddings have become essential components in various information retrieval and natural language processing tasks, such as ranking, document classification, and question answering. However, despite their widespread use, traditional word embedding models present a limitation in their static nature, which hampers their ability to adapt to the constantly evolving language patterns that emerge in sources such as social media and the web (e.g., new hashtags or brand names). To overcome this problem, incremental word embedding algorithms are introduced, capable of dynamically updating word representations in response to new language patterns and processing continuous data streams. This paper presents rivertext, a Python library for training and evaluating incremental word embeddings from text data streams. Our tool is a resource for the information retrieval and natural language processing communities that work with word embeddings in streaming scenarios, such as analyzing social media. The library implements different incremental word embedding techniques, such as Skip-gram, Continuous Bag of Words, and Word Context Matrix, in a standardized framework. In addition, it uses PyTorch as its backend for neural network training. We have implemented a module that adapts existing intrinsic static word embedding evaluation tasks for word similarity and word categorization to a streaming setting. Finally, we compare the implemented methods with different hyperparameter settings and discuss the results. Our open-source library is available at https://github.com/dccuchile/rivertext.},
booktitle = {Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval},
pages = {3027–3036},
numpages = {10},
keywords = {data streams, word embeddings, incremental learning},
location = {Taipei, Taiwan},
series = {SIGIR '23}
}

Team

  • Gabriel Iturra-Bocaz
  • Felipe Bravo-Marquez

Contact

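Please write to gabriel.e.iturrabocaz at uis.no with any questions about the software. You are also welcome to open a pull request or file an issue in the rivertext repository on GitHub.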

Download the source code

Clone the project from the command line:

 git clone https://github.com/dccuchile/rivertext.git
