NLTK is a Python platform for processing natural language. It provides many text processing libraries, mostly for English. I previously built an Indonesian tagger model using the Stanford POS Tagger, and this tutorial uses that Indonesian model.
To install NLTK, run the following command at the command line. I assume that you are using Windows and have followed my first tutorial (in Indonesian) on keeping two versions of Python on your laptop:
python3 -m pip install -U nltk
In this example, I use a previously trained tagger that I named myTagger.model; it is a model customized for Indonesian. Place the model under the nltk folder so that its path is nltk\myTagger.model. Then download stanford-postagger.jar from http://nlp.stanford.edu/software/tagger.shtml; the code in this tutorial expects it at E:\stanford-postagger.jar.
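Before constructing the tagger, it can help to confirm that the files are where the code expects them. The following is a minimal sketch, not part of NLTK: the helper name missing_prerequisites is hypothetical, the two paths are the ones used in this tutorial, and checking PATH for java is a simplifying assumption (the Stanford tagger is a Java program, and NLTK may also locate Java through other settings such as JAVAHOME).

```python
import os
import shutil


def missing_prerequisites(model_path, jar_path):
    """Return a list of problems found; the list is empty if nothing is missing.

    Hypothetical helper for illustration only, not an NLTK function.
    """
    problems = []
    for label, path in (("model", model_path), ("jar", jar_path)):
        if not os.path.isfile(path):
            problems.append("{} file not found: {}".format(label, path))
    # Assumption: looking for "java" on PATH is a good-enough first check.
    if shutil.which("java") is None:
        problems.append("java executable not found on PATH")
    return problems


if __name__ == "__main__":
    # Paths as used in this tutorial; adjust them to your own layout.
    for problem in missing_prerequisites("nltk\\myTagger.model",
                                         "E:\\stanford-postagger.jar"):
        print(problem)
```

If nothing is printed, both files were found and a java executable is on PATH.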
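To run this tagger, enter the following code in the Python interpreter:
from nltk.tag.stanford import StanfordPOSTagger
# First argument: path to the trained model (adjust it to where you placed the file).
# Second argument: path to the Stanford POS Tagger jar.
myTagger = StanfordPOSTagger("nltk\\myTagger.model", "E:\\stanford-postagger.jar")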
myTagger.tag('Pada suatu hari, dia pergi ke kota Jakarta.'.split())
This will be the output:
[('', 'Pada/IN'), ('', 'suatu/CD'), ('', 'hari,/Z'), ('', 'dia/PRP'), ('', 'pergi/VB'), ('', 'ke/IN'), ('', 'kota/NN'), ('', 'Jakarta/NNP'), ('', '?/Z')]
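In the output above, each tuple holds an empty string followed by a single word/TAG string, rather than a separate word and tag. A minimal sketch for turning that shape into (word, tag) pairs is shown here; split_tagged is a hypothetical helper, not an NLTK function, and it assumes the separator is always "/" as in the output shown.

```python
def split_tagged(raw_pairs, sep="/"):
    """Convert ('', 'word/TAG') tuples into (word, tag) tuples.

    Hypothetical helper for illustration; assumes `sep` separates word and tag.
    """
    result = []
    for _unused, token in raw_pairs:
        # rpartition splits on the LAST separator, so a word that itself
        # contains "/" stays intact.
        word, found, tag = token.rpartition(sep)
        if not found:
            # No separator at all: keep the token as the word, with no tag.
            word, tag = token, None
        result.append((word, tag))
    return result


if __name__ == "__main__":
    raw = [('', 'Pada/IN'), ('', 'suatu/CD'), ('', 'hari,/Z')]
    print(split_tagged(raw))
    # [('Pada', 'IN'), ('suatu', 'CD'), ('hari,', 'Z')]
```

The same function can be applied directly to the list returned by myTagger.tag(...).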
An explanation of the tags can be found on the POS Tagger website of the Information Retrieval Lab, Faculty of Computer Science, Universitas Indonesia.
Python version: 3.5
Windows: 8.0
NLTK: 3.2
Stanford tools: 3.5.1