What does paraphrasing mean? Please tell me the meaning of paraphrasing. Write down the definition of paraphrasing… So, what is paraphrasing? I have just demonstrated it.
According to the definition in this repo, paraphrase generation is the task of generating an output sentence which is semantically identical to the input sentence but contains variations in lexicon or syntax.
Wait a minute. What does it mean to be ‘semantically similar’? Is there a common quantitative measure, or a behavioral test, like the Turing test? The short answer is no. There are indeed many metrics to evaluate the ‘similarity’ between two input sentences, like BLEU, ROUGE, METEOR or BERTScore. These metrics are so far the best options. But, for example, ‘a jaguar kills a crocodile’ and ‘a crocodile kills a jaguar’ would get pretty high scores on ROUGE, as all the words match.
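To make the failure concrete, here is a toy unigram-overlap F1 score in the spirit of ROUGE-1 (this is a simplified sketch of my own, with plain whitespace tokenization, not a real ROUGE implementation). The reversed jaguar/crocodile sentence gets a perfect score:

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Toy unigram-overlap F1 (ROUGE-1-like): no stemming, no stopword handling."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Every word matches, so the score is perfect despite the reversed meaning:
print(rouge1_f1("a jaguar kills a crocodile", "a crocodile kills a jaguar"))  # 1.0
```

Word-overlap metrics simply cannot see who kills whom, which is exactly the weakness described above.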
Our discussion here is neither to deny the definition above, nor to criticise the metrics. Rather, we would like to analyse what paraphrasing really is, intuitively. I will try to provide as much evidence as I can, and I welcome any challenge.
First of all, ‘similar’, ‘identical’ and ‘the same’ are labels we attach to sentences that paraphrase each other. But without a quantitative measure or a behavioral test, these labels are vague. On the one hand, quantitative metrics are hard to define, as we have seen above. Moreover, when quantitative metrics are defined, they usually have to be checked for alignment with human intuitions. Given this, quantitative metrics are not intrinsically objective and clear after all, as they are still backed by human intuition. On the other hand, I haven’t seen much about how we humans decide whether a sentence is a paraphrase of another. We might just say, ‘oh, these two sentences look the same!’ But that’s not very scientific. When we want to decide whether a robot is clever enough, we can’t just say, ‘oh, it is close to my IQ!’ Instead, we need some behavioral test. In the AI scenario, that’s the Turing test.
I want to share an idea, not formally, that there could be a locally valid behavioral test for paraphrasing, that is, for what it means to be ‘semantically identical’ while ‘varying in syntax’.
This test is the following: if a question on an exam paper is asked in different manners, yet the desired answer from students does not change, we may call these different questions paraphrases of each other. There is a BIG assumption needed to disambiguate the test: when we replace the unchanged entities in all paraphrased questions with another set of entities (that is, replace A1, A2, …, An with B1, B2, …, Bn, respectively), the test still holds.
To explain that, look at the two examples below. The first is a paraphrase, while the second is not.
Here comes the first example. ‘Please draw a line under the biggest number among (14, 88, 6, 90.9, -4)’ and ‘Underline the number that is bigger than any other number in (14, 88, 6, 90.9, -4)’ are paraphrases of each other. The paraphrase holds even if I change (14, 88, 6, 90.9, -4) to some other set of numbers; for the set above, the correct answer in both cases is 90.9, underlined.
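The first example can be sketched as a toy simulation, assuming ideal ‘students’ who interpret each phrasing correctly (the function names and the extra number sets are mine, for illustration):

```python
def answer_q1(numbers):
    # 'Please draw a line under the biggest number among <numbers>'
    return max(numbers)

def answer_q2(numbers):
    # 'Underline the number that is bigger than any other number in <numbers>'
    return max(numbers)  # an ideal student does exactly the same computation

# The answers agree not only for the original numbers,
# but for every substituted set of entities:
for entities in [(14, 88, 6, 90.9, -4), (1, 2, 3), (-7.5, -2, -100)]:
    assert answer_q1(entities) == answer_q2(entities)

print(answer_q1((14, 88, 6, 90.9, -4)))  # 90.9
```

Substituting the entities never breaks the agreement, so the two questions pass the test.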
However, the second example is not a paraphrase, though the answers are sometimes the same. ‘The state with the largest area in the United States (2021)’ and ‘the coldest state in the United States (2021)’ are both ‘Alaska’. However, ‘the coldest provincial division in China (PRC, 2021)’ is not the same as ‘the provincial division with the largest area in China (PRC, 2021)’: the answer to the former is Heilongjiang, and the answer to the latter is Xinjiang. This tells us that ‘the [_] with the largest area’ is not a paraphrase of ‘the coldest [_]’.
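The second example can be sketched the same way, hard-coding the facts quoted above as a tiny lookup table (the table and function names are mine, for illustration only):

```python
# Facts from the example above (2021).
largest_area = {"United States": "Alaska", "China (PRC)": "Xinjiang"}
coldest = {"United States": "Alaska", "China (PRC)": "Heilongjiang"}

def q_largest(country):
    # 'the [_] with the largest area in <country>'
    return largest_area[country]

def q_coldest(country):
    # 'the coldest [_] in <country>'
    return coldest[country]

# The answers agree on one entity set...
assert q_largest("United States") == q_coldest("United States")  # both 'Alaska'
# ...but substituting the entities breaks the agreement,
# so the two question templates are NOT paraphrases.
assert q_largest("China (PRC)") != q_coldest("China (PRC)")
```

One agreeing entity set is a coincidence; the test requires agreement under every substitution.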
Well, take a break; we have struggled quite a bit to build this behavioral test. It locally resolves the definitional ambiguity of ‘semantically identical’. But it does not yet give a clear definition of ‘varying in syntax’. We might as well say that if two sentences do not look exactly the same, then they ‘vary in syntax’.
We now have a fairly logical definition of paraphrasing. But it is not perfect. First, it is only verifiable in a limited space. For instance, consider ‘I saw a girl wearing a blue T-shirt’ and ‘I saw a girl who is wearing a blue T-shirt’. Here, I just vary the second phrasing a bit, to avoid any unnecessary ambiguity. The bigger problem is that there is no behavioral test for this pair, even though the two sentences are almost certainly ‘semantically identical’.
Another problem with this definition is that it is too rigid, and any soft paraphrase could be rejected by logic. For instance: ‘I saw a girl wearing a blue T-shirt’, ‘I saw a girl with a blue T-shirt’, ‘I saw a girl in a blue T-shirt’, and ‘I saw a girl in blue’. One may say they are ‘similar’, whereas others may emphasize that they are, after all, not ‘identical’. So, who’s right?
Eventually, we fall back to the beginning: let’s find a quantitative metric. Don’t be sad, though, even if we haven’t found a proper one yet.
Today, there are plausible frameworks for paraphrasing. Personally, I like the one proposed in Adversarial Example Generation with Syntactically Controlled Paraphrase Networks, co-authored by John Wieting. In this paper, paraphrases are generated as adversarial examples for some discriminator model. The discriminator could be considered a ‘student’ in the examples above, except that there are no downstream tasks for these ‘students’: they only need to figure out whether the two input sentences are ‘semantically identical’.
We will continue with how paraphrasing is treated by neural models in our next discussion.