A research team from Google Brain and New York University says the Natural Language Understanding (NLU) evaluation system is “broken” and proposes four criteria for improving NLU benchmarks.

The paper, "What Will it Take to Fix Benchmarking in Natural Language Understanding?", is available on arXiv.

