Here, we show step by step how to extract the constituency tree string for each sentence.
We recommend installing the dependencies in a Python virtual environment (Python 3.x). We use Anaconda for package management. See the simple tutorial for usage.
$ conda create -n grammar-pattern python=3 anaconda
$ conda activate grammar-pattern
$ conda install pip
The original constituency tree parser returns a lot of extra data alongside the tree (the output is very large), so our modified version returns only the constituency tree string. The modification lives on the grammar-pattern branch, so remember to switch to the grammar-pattern branch before running the parser.
$ git clone https://github.com/howardyclo/allennlp.git
$ cd allennlp
$ git checkout grammar-pattern
$ pip install -r requirements.txt
AllenNLP's model expects its input in JSON Lines format (one JSON object per line). For example, if you need to parse a text file like:
This is the first sentence .
This is the second sentence .
This is the third sentence .
You need to convert it so that each sentence becomes one JSON object on its own line:
{"sentence": "This is the first sentence ."}
{"sentence": "This is the second sentence ."}
{"sentence": "This is the third sentence ."}
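The conversion above can be scripted. The following is a minimal sketch, assuming the input is a UTF-8 text file with one tokenized sentence per line; the helper names `to_jsonl_lines` and `convert_file` and the file names in the usage comment are hypothetical and not part of AllenNLP.

```python
import json


def to_jsonl_lines(sentences):
    """Yield one JSON object string per non-empty sentence, keyed by "sentence"."""
    for sentence in sentences:
        sentence = sentence.strip()
        if sentence:  # skip blank lines
            yield json.dumps({"sentence": sentence}, ensure_ascii=False)


def convert_file(input_path, output_path):
    """Convert a one-sentence-per-line text file into a JSON Lines file."""
    with open(input_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        for line in to_jsonl_lines(src):
            dst.write(line + "\n")


# Hypothetical usage:
# convert_file("sentences.txt", "sentences.input.jsonl")
```

Using `json.dumps` rather than manual string formatting keeps quotes and backslashes inside a sentence correctly escaped.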
Now, let's test a single sentence as input!
$ echo '{"sentence": "He likes to discuss the issues ."}' > sample.input.jsonl
On CPU:
$ python -m allennlp.run predict \
https://s3-us-west-2.amazonaws.com/allennlp/models/elmo-constituency-parser-2018.03.14.tar.gz \
sample.input.jsonl \
--predictor=constituency-parser \
--output-file sample.output.txt
On GPU:
$ python -m allennlp.run predict \
https://s3-us-west-2.amazonaws.com/allennlp/models/elmo-constituency-parser-2018.03.14.tar.gz \
sample.input.jsonl \
--predictor=constituency-parser \
--output-file sample.output.txt \
--batch-size 32 \
--cuda-device 0
Output in the terminal (add --silent to suppress printing to the terminal):
input: {'sentence': 'He likes to discuss the issues .'}
prediction: "(S (NP (PRP He)) (VP (VBZ likes) (S (VP (TO to) (VP (VB discuss) (NP (DT the) (NNS issues)))))) (. .))"
Contents of the output file (sample.output.txt):
"(S (NP (PRP He)) (VP (VBZ likes) (S (VP (TO to) (VP (VB discuss) (NP (DT the) (NNS issues)))))) (. .))"
See more about the arguments.
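To work with the result programmatically, the tree strings in the output file can be read back into a nested structure. The following is a rough sketch, assuming each line of the output file is a JSON-encoded string as in the sample above, and that words never contain literal parentheses; the helpers `parse_tree`, `leaves`, and `load_trees` are hypothetical and not part of AllenNLP.

```python
import json


def parse_tree(text):
    """Parse a bracketed tree string into nested (label, children) tuples.

    Children are either nested tuples or plain word strings.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])  # a word under a POS tag
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return read_node()


def leaves(node):
    """Yield (word, pos_tag) pairs in sentence order."""
    label, children = node
    for child in children:
        if isinstance(child, str):
            yield (child, label)
        else:
            yield from leaves(child)


def load_trees(path):
    """Read one JSON-encoded tree string per line and parse each one."""
    with open(path, encoding="utf-8") as f:
        return [parse_tree(json.loads(line)) for line in f if line.strip()]
```

For example, `list(leaves(tree))` on the sample sentence gives the words paired with their part-of-speech tags, starting with `("He", "PRP")`.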