LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks
LongBench v2 is a new benchmark that tests how well AI models can understand and reason about very long texts, such as books, articles, and code. It contains over 500 multiple-choice questions, and they are hard enough that even human experts struggle to answer them correctly within a limited time. The questions cover many kinds of tasks, such as identifying the culprit in a detective story, translating an unfamiliar language, and understanding how a code repository works. The benchmark is difficult because answering requires reasoning deeply over the information rather than just locating a relevant passage. The researchers behind LongBench v2 hope it will drive progress toward AI that can understand and reason about complex, long material. https://arxiv.org/pdf/2412.15204