The researchers shared sample historical questions with TechCrunch that LLMs got wrong. For example, GPT-4 Turbo was asked whether scale armor was present during a specific time period in ancient Egypt. The LLM said yes, but the technology only appeared in Egypt 1,500 years later.
Why are LLMs bad at answering technical historical questions when they can be so good at answering very complicated questions about things like coding? Del Rio-Chanona told TechCrunch that it’s likely because LLMs tend to extrapolate from prominent historical data, and struggle to retrieve more obscure historical knowledge.
For example, the researchers asked GPT-4 if ancient Egypt had a professional standing army during a specific historical period. While the correct answer is no, the LLM answered incorrectly that it did. This is likely because there is lots of public information about other ancient empires, like Persia, having standing armies.
Top LLMs performed poorly on a high-level history test, a new paper has found. — Charles Rollet (TechCrunch)
The state is seeing a sharp water divide this year, with lots of rain in the north while the south has stayed dry. A hydrologist explains what’s happening. — The Conversation
More than 40% of individual corals monitored around One Tree Island reef bleached by heat stress and damaged by flesh-eating disease. — Graham Readfearn (The Guardian)
My Ideal Man Has All 7
1. ju-ju eyeball
2. toe jam football
3. good health insurance
4. monkey finger
5. hair down. to. his knee
6. doesn't do nazi salutes
7. walrus gumboot
Hungary could lose billions more in EU funding in the future if it fails to reform, but won't. — EUROPE SAYS (EUROPESAYS.COM)