NLP paper implementations related to classification with PyTorch
The papers are implemented using Korean corpora.
pyenv virtualenv 3.7.7 nlp
pyenv activate nlp
pip install -r requirements.txt
python build_dataset.py
python build_vocab.py
python train.py # default training parameters
python evaluate.py # default evaluation parameters
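`build_vocab.py` produces the `vocab.pkl` files that appear in the directory trees below. Its tokenizer and special tokens are not documented in this README, so the following is only a minimal stdlib sketch of the idea, assuming whitespace tokenization, a hypothetical `min_freq` cutoff, and hypothetical `<pad>`/`<unk>` tokens.

```python
import pickle
from collections import Counter


def build_vocab(sentences, min_freq=1, specials=("<pad>", "<unk>")):
    """Map tokens to integer ids (hypothetical scheme, not the repository's code).

    Assumes whitespace tokenization; the real build_vocab.py may use a
    Korean morphological analyzer or a subword tokenizer instead.
    """
    counter = Counter(token for sentence in sentences for token in sentence.split())
    token_to_idx = {token: idx for idx, token in enumerate(specials)}
    # Most frequent tokens first, ties broken alphabetically, so ids are deterministic.
    for token, freq in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and token not in token_to_idx:
            token_to_idx[token] = len(token_to_idx)
    return token_to_idx


def save_vocab(token_to_idx, path):
    """Pickle the mapping, analogous to nsmc/vocab.pkl or qpair/vocab.pkl."""
    with open(path, "wb") as f:
        pickle.dump(token_to_idx, f)
```

For example, `build_vocab(["a b a"], min_freq=2)` keeps only `a`, because `b` occurs once.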
Single sentence classification (sentiment classification task)
- Using the Naver sentiment movie corpus v1.0 (a.k.a. nsmc)
- Configuration
  - conf/model/{type}.json (e.g. type = ["sencnn", "charcnn",...])
  - conf/dataset/nsmc.json
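Model hyperparameters live in `conf/model/{type}.json` and dataset settings in `conf/dataset/nsmc.json`. The keys inside those files are not shown here, so the sketch below only illustrates how the two files could be located from a model type and merged; the `config_paths`/`load_config` helpers and the merge rule are assumptions.

```python
import json
from pathlib import Path


def config_paths(model_type, dataset="nsmc", conf_dir="conf"):
    """Return the (model, dataset) config paths following the conf/ layout."""
    conf = Path(conf_dir)
    return conf / "model" / f"{model_type}.json", conf / "dataset" / f"{dataset}.json"


def load_config(model_type, dataset="nsmc", conf_dir="conf"):
    """Hypothetical helper: read both JSON files and merge them into one dict.

    Dataset keys are applied last, so they win on a name clash; this is an
    arbitrary choice for the sketch, and the repository may keep them separate.
    """
    model_path, dataset_path = config_paths(model_type, dataset, conf_dir)
    merged = json.loads(model_path.read_text(encoding="utf-8"))
    merged.update(json.loads(dataset_path.read_text(encoding="utf-8")))
    return merged
```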
- Structure
# example: Convolutional_Neural_Networks_for_Sentence_Classification
├── build_dataset.py
├── build_vocab.py
├── conf
│ ├── dataset
│ │ └── nsmc.json
│ └── model
│ └── sencnn.json
├── evaluate.py
├── experiments
│ └── sencnn
│ └── epochs_5_batch_size_256_learning_rate_0.001
├── model
│ ├── data.py
│ ├── __init__.py
│ ├── metric.py
│ ├── net.py
│ ├── ops.py
│ ├── split.py
│ └── utils.py
├── nsmc
│ ├── ratings_test.txt
│ ├── ratings_train.txt
│ ├── test.txt
│ ├── train.txt
│ ├── validation.txt
│ └── vocab.pkl
├── train.py
└── utils.py
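The table below reports 120,000 training, 30,000 validation, and 50,000 test examples. NSMC ships 150,000 reviews in `ratings_train.txt` and 50,000 in `ratings_test.txt`, so these counts are consistent with `build_dataset.py` holding out 20% of `ratings_train.txt` as `validation.txt`. The sketch below shows such a split; the function names, the seed, and the shuffling strategy are assumptions, not the repository's code.

```python
import random


def split_train_validation(rows, validation_ratio=0.2, seed=42):
    """Shuffle rows and hold out a fraction as validation data.

    validation_ratio=0.2 matches the 120,000 / 30,000 counts reported for
    NSMC; the seed and shuffling are assumptions of this sketch.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_validation = int(len(rows) * validation_ratio)
    return rows[n_validation:], rows[:n_validation]


def read_nsmc(path):
    """Read an NSMC file: tab-separated id, document, label, with a header row."""
    with open(path, encoding="utf-8") as f:
        next(f)  # skip the "id\tdocument\tlabel" header
        return [line.rstrip("\n").split("\t") for line in f]
```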
| Model \ Accuracy | Train (120,000) | Validation (30,000) | Test (50,000) | Date |
|---|---|---|---|---|
| SenCNN | 91.95% | 86.54% | 85.84% | 20/05/30 |
| CharCNN | 86.29% | 81.69% | 81.38% | 20/05/30 |
| ConvRec | 86.23% | 82.93% | 82.43% | 20/05/30 |
| VDCNN | 86.59% | 84.29% | 84.10% | 20/05/30 |
| SAN | 90.71% | 86.70% | 86.37% | 20/05/30 |
| ETRIBert | 91.12% | 89.24% | 88.98% | 20/05/30 |
| SKTBert | 92.20% | 89.08% | 88.96% | 20/05/30 |
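All scores in the table above are accuracies: the fraction of examples whose predicted label equals the gold label. `model/metric.py` presumably computes this from model outputs, but its code is not shown here, so the following stdlib sketch assumes per-example class scores and argmax prediction.

```python
def accuracy(logits, labels):
    """Fraction of examples whose highest-scoring class equals the label.

    logits: list of per-example score lists (one score per class).
    labels: list of integer gold labels.
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy of an empty batch")
    correct = 0
    for scores, label in zip(logits, labels):
        # argmax over classes; max() returns the first maximum, so ties go to the lowest index
        prediction = max(range(len(scores)), key=scores.__getitem__)
        correct += prediction == label
    return correct / len(labels)
```

When evaluating over many batches of unequal size, summing correct predictions across batches and dividing once by the total count avoids the bias of averaging per-batch accuracies.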
- Convolutional Neural Networks for Sentence Classification (as SenCNN)
  - https://arxiv.org/abs/1408.5882
- Character-level Convolutional Networks for Text Classification (as CharCNN)
  - https://arxiv.org/abs/1509.01626
- Efficient Character-level Document Classification by Combining Convolution and Recurrent Layers (as ConvRec)
  - https://arxiv.org/abs/1602.00367
- Very Deep Convolutional Networks for Text Classification (as VDCNN)
  - https://arxiv.org/abs/1606.01781
- A Structured Self-attentive Sentence Embedding (as SAN)
  - https://arxiv.org/abs/1703.03130
- BERT_single_sentence_classification (as ETRIBert, SKTBert)
  - https://arxiv.org/abs/1810.04805
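The core of SenCNN (Kim, 2014) is a convolution of each filter over windows of consecutive word vectors, followed by a nonlinearity and max-over-time pooling, giving one feature per filter. The dependency-free sketch below shows only that step for a single sentence; the repository's `model/net.py` is PyTorch and is not reproduced here. In the paper, features from filter windows 3, 4, and 5 (100 maps each) are concatenated and passed through dropout to a softmax layer.

```python
def conv1d_max_over_time(embeddings, filters, biases):
    """SenCNN-style feature extraction for one sentence (stride 1, no padding).

    embeddings: the sentence as word vectors, shape [seq_len][emb_dim].
    filters: list of filters, each of shape [window][emb_dim].
    biases: one bias per filter.
    Returns one max-pooled feature per filter.
    """
    features = []
    for kernel, bias in zip(filters, biases):
        window = len(kernel)
        if window > len(embeddings):
            raise ValueError("sentence is shorter than the filter window")
        activations = []
        for start in range(len(embeddings) - window + 1):
            # dot product between the filter and a window of consecutive word vectors
            s = bias
            for w_row, x_row in zip(kernel, embeddings[start:start + window]):
                s += sum(w * x for w, x in zip(w_row, x_row))
            activations.append(max(0.0, s))  # ReLU
        features.append(max(activations))  # max-over-time pooling
    return features
```

For a three-word sentence with one-dimensional embeddings `[[1.0], [2.0], [3.0]]`, a width-2 filter of ones sees window sums 3 and 5 and keeps 5.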
Pairwise text classification (paraphrase task)
- Creating the dataset from https://github.com/songys/question_pair
- Configuration
  - conf/model/{type}.json (e.g. type = ["siam", "san",...])
  - conf/dataset/qpair.json
- Structure
# example: Siamese_recurrent_architectures_for_learning_sentence_similarity
├── build_dataset.py
├── build_vocab.py
├── conf
│ ├── dataset
│ │ └── qpair.json
│ └── model
│ └── siam.json
├── evaluate.py
├── experiments
│ └── siam
│ └── epochs_5_batch_size_64_learning_rate_0.001
├── model
│ ├── data.py
│ ├── __init__.py
│ ├── metric.py
│ ├── net.py
│ ├── ops.py
│ ├── split.py
│ └── utils.py
├── qpair
│ ├── kor_pair_test.csv
│ ├── kor_pair_train.csv
│ ├── test.txt
│ ├── train.txt
│ ├── validation.txt
│ └── vocab.pkl
├── train.py
└── utils.py
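Experiment outputs are stored per model under a directory named after the training hyperparameters, e.g. `experiments/sencnn/epochs_5_batch_size_256_learning_rate_0.001` and `experiments/siam/epochs_5_batch_size_64_learning_rate_0.001`. The hypothetical helper below reproduces that pattern as inferred from these two example paths; its name and signature are assumptions.

```python
from pathlib import Path


def experiment_dir(model_type, epochs, batch_size, learning_rate, root="experiments"):
    """Build experiments/{type}/epochs_{e}_batch_size_{b}_learning_rate_{lr} (inferred pattern)."""
    name = f"epochs_{epochs}_batch_size_{batch_size}_learning_rate_{learning_rate}"
    return Path(root) / model_type / name
```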
| Model \ Accuracy | Train (6,136) | Validation (682) | Test (758) | Date |
|---|---|---|---|---|
| Siam | 93.00% | 83.13% | 83.64% | 20/05/30 |
| SAN | 89.47% | 82.11% | 81.53% | 20/05/30 |
| Stochastic | 89.26% | 82.69% | 80.07% | 20/05/30 |
| ETRIBert | 95.07% | 94.42% | 94.06% | 20/05/30 |
| SKTBert | 95.43% | 92.52% | 93.93% | 20/05/30 |
- A Structured Self-attentive Sentence Embedding (as SAN)
  - https://arxiv.org/abs/1703.03130
- Siamese Recurrent Architectures for Learning Sentence Similarity (as Siam)
  - https://www.aaai.org/ocs/index.php/aaai/aaai16/paper/paper/viewpaper/12195
- Stochastic Answer Networks for Natural Language Inference (as Stochastic)
  - https://arxiv.org/abs/1804.07888
- BERT_pairwise_text_classification (as ETRIBert, SKTBert)
  - https://arxiv.org/abs/1810.04805
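The Siamese recurrent architecture (Mueller and Thyagarajan, AAAI 2016) encodes both sentences with the same LSTM and scores the pair as `exp(-||h_a - h_b||_1)` over the final hidden states, which lies in (0, 1]. The repository's `siam` model may use a different encoder or output layer; the sketch below shows only that similarity on already-computed sentence vectors, plus a hypothetical 0.5 threshold for turning it into a binary paraphrase label.

```python
import math


def manhattan_similarity(h_a, h_b):
    """exp(-||h_a - h_b||_1): 1.0 for identical vectors, approaching 0 as they diverge."""
    if len(h_a) != len(h_b):
        raise ValueError("sentence vectors must have the same dimension")
    return math.exp(-sum(abs(a - b) for a, b in zip(h_a, h_b)))


def predict_paraphrase(h_a, h_b, threshold=0.5):
    """Hypothetical decision rule: label the pair 1 (paraphrase) above the threshold."""
    return int(manhattan_similarity(h_a, h_b) > threshold)
```

Because the encoder weights are shared, swapping the two questions yields the same score, which suits a symmetric paraphrase task.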