How to write a clean pointer-generator model with PyTorch

Apr 3, 2021

[This story will probably only make sense to readers with some background in torch, torchtext, seq2seq models and the copy / pointer mechanism.]

A pointer-generator is a powerful network for machine translation and abstractive summarization. Since the original code was not written in PyTorch, we would like to show you how to get it done with PyTorch. This is certainly not the only PyTorch implementation of pointer-generator networks, but we aim for it to be one of the cleanest. We adapt one of the most popular PyTorch tutorials for seq2seq text processing and modify it to work with the pointer / copy mechanism.

Torchtext is rarely used in code that implements a copy mechanism. This is because standard Torchtext relies on a pre-built vocabulary and therefore has difficulty handling out-of-vocabulary words. To implement the pointer / copy mechanism without abandoning Torchtext, we wrote a lightweight extension for Torchtext, though it is far from official. A link to the repository is here:

The classes in this extension are meant to be used in exactly the same way as the standard Torchtext classes. Simply appending ‘OOV’ to the name of each Torchtext class is enough.
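To make the out-of-vocabulary problem concrete, here is a minimal sketch in plain Python of the usual pointer-generator workaround: each unknown source word gets a temporary id just past the end of the fixed vocabulary. It does not use the extension's API. The names `build_vocab` and `numericalize_with_oov` are hypothetical and exist only for this illustration.

```python
UNK = "<unk>"

def build_vocab(tokens, specials=(UNK, "<pad>")):
    """Map each distinct token to an id; special tokens come first."""
    itos = list(specials)
    for tok in tokens:
        if tok not in itos:
            itos.append(tok)
    return {tok: i for i, tok in enumerate(itos)}

def numericalize_with_oov(src_tokens, stoi):
    """Return (plain_ids, extended_ids, oov_list) for one source sequence.

    plain_ids:    OOV words collapse to the <unk> id, as with a fixed vocabulary.
    extended_ids: each distinct OOV word gets a temporary id len(stoi) + k.
    oov_list:     the OOV words, in order of first appearance.
    """
    oov_list = []
    plain_ids, extended_ids = [], []
    for tok in src_tokens:
        if tok in stoi:
            plain_ids.append(stoi[tok])
            extended_ids.append(stoi[tok])
        else:
            if tok not in oov_list:
                oov_list.append(tok)
            plain_ids.append(stoi[UNK])
            extended_ids.append(len(stoi) + oov_list.index(tok))
    return plain_ids, extended_ids, oov_list

# Toy data: "zebra" never appears in the text used to build the vocabulary.
stoi = build_vocab("the cat sat on the mat".split())
plain, extended, oovs = numericalize_with_oov(
    "the zebra sat on the zebra".split(), stoi
)
```

With the plain ids, both occurrences of "zebra" become indistinguishable from any other unknown word. The extended ids keep them recoverable, which is what a copy mechanism needs in order to output the word itself.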

Any suggestions are welcome.

A few things to note:

1. For the copy / pointer mechanism to work, there must be a batch named ‘src’, i.e., a source sequence that contains recognizable out-of-vocabulary words.

2. The vocabulary size of each ‘Field’ instance should preferably be the same as that of the others.

3. You may check the code to see how it works. Don’t panic: only 1/9 of the code is newly written, and the rest is just copied from the official Torchtext sources.
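To show why the source sequence in note 1 matters, the copy step of a pointer-generator can be sketched in PyTorch as below. This is a generic sketch of the standard mixing step, not code taken from the extension. The function name, tensor names and shapes are all assumptions made for this illustration, and the source ids are assumed to be already mapped so that out-of-vocabulary words sit at indices `vocab_size` and above.

```python
import torch

def final_distribution(vocab_dist, attn_dist, p_gen, src_ext_ids, max_oov):
    """Mix generation and copy distributions over the extended vocabulary.

    vocab_dist:  (batch, vocab_size) softmax over the fixed vocabulary
    attn_dist:   (batch, src_len) attention weights over source positions
    p_gen:       (batch, 1) probability of generating rather than copying
    src_ext_ids: (batch, src_len) int64 source ids in the extended vocabulary
    max_oov:     largest number of OOV words in any source of the batch
    """
    batch = vocab_dist.size(0)
    # Pad the vocabulary distribution with zeros for the temporary OOV ids.
    extra = vocab_dist.new_zeros(batch, max_oov)
    gen = torch.cat([p_gen * vocab_dist, extra], dim=1)
    # Add the copy probability of each source position onto its word id.
    return gen.scatter_add(1, src_ext_ids, (1.0 - p_gen) * attn_dist)

# Toy inputs: vocabulary of 7 words, source length 6, at most 1 OOV word.
torch.manual_seed(0)
vocab_dist = torch.softmax(torch.randn(2, 7), dim=1)
attn_dist = torch.softmax(torch.randn(2, 6), dim=1)
p_gen = torch.tensor([[0.7], [0.4]])
src_ext_ids = torch.tensor([[2, 7, 4, 5, 2, 7],   # id 7 is an OOV word
                            [2, 3, 4, 5, 2, 6]])  # no OOV words
final = final_distribution(vocab_dist, attn_dist, p_gen, src_ext_ids, max_oov=1)
```

Each row of `final` is still a probability distribution, now over the vocabulary plus the OOV slots. An OOV word can only receive probability through the attention weights on the source positions where it appears, which is why the source must carry recognizable OOV ids.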