Kashgari is a simple and powerful NLP transfer learning framework. It lets you build a state-of-the-art model in 5 minutes for named entity recognition (NER), part-of-speech tagging (PoS), and text classification tasks.
- Human-friendly. Kashgari's code is straightforward, well documented and tested, which makes it very easy to understand and modify.
- Powerful and simple. Kashgari allows you to apply state-of-the-art natural language processing (NLP) models to your text for tasks such as named entity recognition (NER), part-of-speech tagging (PoS) and classification.
- Built-in transfer learning. Kashgari has built-in support for pre-trained BERT and Word2vec embedding models, which makes it very simple to use transfer learning when training your model.
- Fully scalable. Kashgari provides a simple, fast, and scalable environment for experimentation: train your models and try new approaches using different embeddings and model structures.
- Production ready. Kashgari can export models in the SavedModel format for TensorFlow Serving, so you can deploy them directly on the cloud.
- Academic users: run experiments to test hypotheses more easily, without coding from scratch.
- NLP beginners: learn how to build an NLP project with production-level code quality.
- NLP developers: build a production-level classification or labeling model within minutes.
| Task | Language | Dataset | Score | Details |
| --- | --- | --- | --- | --- |
| Named Entity Recognition | Chinese | People's Daily Ner Corpus | 94.46 (F1) | Text Labeling Performance Report |
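The score above is reported as F1. For NER, F1 is conventionally computed over whole entities rather than individual tokens, so a predicted entity only counts when both its type and its boundaries match. The sketch below illustrates that convention in plain Python, assuming BIO-style tags; the function names are hypothetical, and whether the linked report uses exactly this definition is an assumption.

```python
from typing import List, Optional, Set, Tuple

Entity = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def extract_entities(tags: List[str]) -> Set[Entity]:
    """Collect (type, start, end) spans from one BIO-tagged sequence.

    Strict reading: an "I-" tag that does not continue an open entity
    of the same type is ignored.
    """
    entities: Set[Entity] = set()
    start: Optional[int] = None
    current: Optional[str] = None
    # A trailing "O" sentinel flushes an entity that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        inside = tag.startswith("I-") and current == tag[2:]
        if not inside:
            if current is not None:
                entities.add((current, start, i))
                start, current = None, None
            if tag.startswith("B-"):
                start, current = i, tag[2:]
    return entities


def entity_f1(y_true: List[List[str]], y_pred: List[List[str]]) -> float:
    """Micro-averaged entity-level F1 over a list of tagged sequences."""
    true_total = pred_total = correct = 0
    for gold_tags, pred_tags in zip(y_true, y_pred):
        gold = extract_entities(gold_tags)
        pred = extract_entities(pred_tags)
        true_total += len(gold)
        pred_total += len(pred)
        correct += len(gold & pred)  # exact type and boundary match
    if correct == 0:
        return 0.0
    precision = correct / pred_total
    recall = correct / true_total
    return 2 * precision * recall / (precision + recall)


gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
score = entity_f1(gold, pred)  # one of two gold entities found, no false positives
```

In this toy case precision is 1.0 and recall is 0.5, so the missed location entity pulls F1 down even though three of the four tokens are tagged correctly.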
Here is a set of quick tutorials to get you started with the library:
There are also articles and posts that illustrate how to use Kashgari:
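- 15 分钟搭建中文文本分类模型 (Build a Chinese text classification model in 15 minutes)
- 基于 BERT 的中文命名实体识别 (NER) (BERT-based Chinese named entity recognition)
- BERT/ERNIE 文本分类和部署 (BERT/ERNIE text classification and deployment)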
- Multi-Class Text Classification with Kashgari in 15 minutes
Requirements and Installation
The project is based on Python 3.6+, because it is 2019 and type hinting is cool.
Let's train an NER labeling model with the BiLSTM_Model class.
    from kashgari.corpus import ChineseDailyNerCorpus
    from kashgari.tasks.labeling import BiLSTM_Model

    # Load the train, test and validation splits of the NER corpus
    train_x, train_y = ChineseDailyNerCorpus.load_data('train')
    test_x, test_y = ChineseDailyNerCorpus.load_data('test')
    valid_x, valid_y = ChineseDailyNerCorpus.load_data('valid')

    # Train with the default embedding, validating on the validation split
    model = BiLSTM_Model()
    model.fit(train_x, train_y, valid_x, valid_y, epochs=50)

    # Sample output printed during training:
    """
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    input (InputLayer)           (None, 97)                0
    _________________________________________________________________
    layer_embedding (Embedding)  (None, 97, 100)           320600
    _________________________________________________________________
    layer_blstm (Bidirectional)  (None, 97, 256)           235520
    _________________________________________________________________
    layer_dropout (Dropout)      (None, 97, 256)           0
    _________________________________________________________________
    layer_time_distributed (Time (None, 97, 8)             2056
    _________________________________________________________________
    activation_7 (Activation)    (None, 97, 8)             0
    =================================================================
    Total params: 558,176
    Trainable params: 558,176
    Non-trainable params: 0
    _________________________________________________________________
    Train on 20864 samples, validate on 2318 samples
    Epoch 1/50
    20864/20864 [==============================] - 9s 417us/sample - loss: 0.2508 - acc: 0.9333 - val_loss: 0.1240 - val_acc: 0.9607
    """
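The parameter counts in the model summary above can be checked by hand, which is a useful sanity check when you change the embedding size or hidden units. The sketch below reproduces them in plain Python. The layer sizes are inferred from the printed shapes and counts, not taken from Kashgari's source: a vocabulary of 3,206 tokens, 128 LSTM units per direction (half of the 256-wide output), and two bias vectors per gate, as in cuDNN-backed LSTM layers. All of these are assumptions, and the helper names are hypothetical.

```python
def embedding_params(vocab_size: int, embedding_dim: int) -> int:
    """One embedding vector per vocabulary entry."""
    return vocab_size * embedding_dim


def lstm_params(input_dim: int, units: int, bias_vectors: int = 1) -> int:
    """Four gates, each with input weights, recurrent weights and bias."""
    return 4 * (units * (input_dim + units) + bias_vectors * units)


def dense_params(input_dim: int, output_dim: int) -> int:
    """Weight matrix plus one bias per output."""
    return input_dim * output_dim + output_dim


# Assumed sizes, inferred from the summary (see the note above).
embedding = embedding_params(vocab_size=3206, embedding_dim=100)
blstm = 2 * lstm_params(input_dim=100, units=128, bias_vectors=2)  # two directions
output = dense_params(input_dim=256, output_dim=8)  # 8 labels per time step
total = embedding + blstm + output

print(embedding, blstm, output, total)
```

Under these assumptions the three counts match the summary's 320600, 235520 and 2056, and they add up to the reported total of 558,176. With a single bias vector per gate the bidirectional layer would come to 234,496 instead, which is why the two-bias variant is assumed here.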
Run with GPT-2 Embedding
    from kashgari.embeddings import GPT2Embedding
    from kashgari.corpus import ChineseDailyNerCorpus
    from kashgari.tasks.labeling import BiGRU_Model

    train_x, train_y = ChineseDailyNerCorpus.load_data('train')
    valid_x, valid_y = ChineseDailyNerCorpus.load_data('valid')

    # Replace the placeholder with the folder that holds your GPT-2 model files
    gpt2_embedding = GPT2Embedding('<path-to-gpt-model-folder>', sequence_length=30)
    model = BiGRU_Model(gpt2_embedding)
    model.fit(train_x, train_y, valid_x, valid_y, epochs=50)
Run with BERT Embedding
    from kashgari.embeddings import BERTEmbedding
    from kashgari.tasks.labeling import BiGRU_Model
    from kashgari.corpus import ChineseDailyNerCorpus

    # Replace the placeholder with the folder that holds your BERT model files
    bert_embedding = BERTEmbedding('<bert-model-folder>', sequence_length=30)
    model = BiGRU_Model(bert_embedding)

    # load_data() is called without a split name here
    train_x, train_y = ChineseDailyNerCorpus.load_data()
    model.fit(train_x, train_y)
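Both embedding examples pass sequence_length=30. A fixed sequence length generally means every input is clipped or padded to that many positions before it reaches the model. The sketch below shows that general idea in plain Python. It is not Kashgari's actual preprocessing: the function name and the "[PAD]" token are hypothetical, and it ignores the special tokens that BERT-style models add.

```python
from typing import List


def fit_to_length(tokens: List[str], sequence_length: int,
                  pad_token: str = "[PAD]") -> List[str]:
    """Clip a token list to sequence_length, then right-pad if it is shorter."""
    clipped = tokens[:sequence_length]
    return clipped + [pad_token] * (sequence_length - len(clipped))


short = fit_to_length(["我", "在", "北", "京"], sequence_length=6)
long = fit_to_length(list("abcdefgh"), sequence_length=6)
```

Here `short` gains two padding tokens and `long` loses its last two characters. For labeling tasks, anything clipped this way receives no prediction, so a sequence length that covers most of your corpus is usually the safer choice.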
Support this project by becoming a sponsor. Your issues and feature requests will be prioritized.
Thanks go to these wonderful people. There are many ways to get involved: start with the contributor guidelines, then check the open issues for specific tasks.
Feel free to join the Slack group if you want to be more involved in Kashgari's development.
This library is inspired by and references the following frameworks and papers.
- flair - A very simple framework for state-of-the-art Natural Language Processing (NLP)
- anago - Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging
This project follows the all-contributors specification. Contributions of any kind welcome!