Recent studies have provided fascinating insights into the cognitive processing abilities of large language models (LLMs). I’d like to present two studies that shed light on how these models perform across a variety of tasks.
Cognitive Reasoning in LLMs
A study published in Nature examined the reasoning abilities of LLMs by presenting both human participants and various pretrained LLMs with new variants of classical cognitive experiments. The results were enlightening: most LLMs exhibited reasoning errors similar to those associated with heuristic-based human reasoning.
To understand the implications, we need some details.
In cognitive psychology, Daniel Kahneman’s dual-system theory distinguishes between two modes of thinking:
- System 1: Fast, automatic, intuitive, and often subconscious. It handles everyday decisions quickly with little effort but can be prone to biases and errors.
- System 2: Slow, deliberate, analytical, and conscious. It requires more effort and is used for complex decision-making and problem-solving, reducing the likelihood of biases.
A prominent tool for evaluating reasoning is the Cognitive Reflection Test (CRT), which assesses the ability to override an intuitive wrong answer in favor of a reflective, correct one. A classic CRT question is the “bat and the ball” problem: “A bat and a ball together cost $1.10. The bat costs $1.00 more than the ball. How much does the ball cost?” The intuitive answer is $0.10, but the correct answer is $0.05.
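To make the arithmetic explicit, here is a minimal sketch (the variable names are mine, not part of the CRT) contrasting the intuitive shortcut with the deliberate calculation:

```python
# System 1 (intuitive) answer: just strip the round number off the total.
intuitive_ball = 1.10 - 1.00            # 0.10 -> wrong: the bat would then
                                        # cost only $0.90 more than the ball.

# System 2 (deliberate) answer: set up the equation and solve it.
# ball + (ball + 1.00) = 1.10  =>  2 * ball = 0.10  =>  ball = 0.05
deliberate_ball = (1.10 - 1.00) / 2     # 0.05 -> correct

print(f"intuitive: ${intuitive_ball:.2f}, correct: ${deliberate_ball:.2f}")
```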
The study’s findings indicated that while older LLMs often made reasoning errors akin to those of humans (reflecting System 1 thinking), newer models such as GPT-4 outperformed human participants, suggesting an ability to engage in System 2 processing.
LLMs and Coding Proficiency
Another study, highlighted by IEEE Spectrum, evaluated ChatGPT’s performance in generating functional code. The results revealed a wide range of success rates, from as low as 0.66% to as high as 89%.
ChatGPT excelled at solving problems in various programming languages, particularly LeetCode algorithm problems that existed before 2021, while its performance declined for problems introduced after 2021. This suggests that ChatGPT’s success depends heavily on the presence of similar problems in its training dataset.
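As a purely illustrative sketch of that kind of before-and-after comparison (this is not the study’s methodology; the problem records, dates, and pass/fail values below are invented), the effect can be measured by splitting results around an assumed training-data cutoff:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Problem:
    title: str
    published: date   # when the problem appeared on the platform
    passed: bool      # did the generated code pass all test cases?

# Hypothetical results of running an LLM on a small problem set.
results = [
    Problem("two-sum", date(2016, 5, 1), True),
    Problem("median-of-arrays", date(2018, 3, 9), True),
    Problem("fresh-problem-a", date(2022, 7, 2), False),
    Problem("fresh-problem-b", date(2023, 1, 15), False),
]

cutoff = date(2021, 9, 1)  # assumed training-data cutoff
before = [p.passed for p in results if p.published < cutoff]
after = [p.passed for p in results if p.published >= cutoff]

print(f"pass rate before cutoff: {sum(before) / len(before):.0%}")
print(f"pass rate after cutoff:  {sum(after) / len(after):.0%}")
```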
Implications and Insights
These findings suggest that while LLMs can mimic System 2 thinking, they predominantly rely on System 1 processes. This also raises another issue: the more ChatGPT is subjected to tests, the better it becomes at passing them, since those tests eventually find their way into its training data.
The implications are significant:
- Training Potential: LLMs can be trained to solve a wide range of problems given enough examples.
- Novel Problem-Solving: Do not rely on LLMs alone to address genuinely new problems.
- Identifying New Problems: When using LLMs, it is essential to determine whether a problem is truly novel, because what is new to a human may be routine for an LLM. Translation of technical texts is a prime example: even with completely new content, LLMs can typically produce an “acceptable” translation. The coding results show that the opposite also holds: a problem that looks routine to a human can still be effectively new to the model if nothing similar appeared in its training data.
Why It’s Crucial for Synapse Postmaster
Synapse Postmaster is a product designed to bridge the gap between paper-based and digital processes by enabling systems to understand unstructured information. Its primary role is to analyze various types of documents, including emails, scanned letters, and more, and to integrate the extracted data into back-office systems. At the core of Postmaster lies an inference engine that acts as a strategic coordinator, integrating advanced AI technologies such as large language models (LLMs), machine learning models, and optical character recognition (OCR).
These technologies allow it to process different types of data—text, images, audio—and ensure that the information is accurately understood and processed for downstream business operations.
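To illustrate what such a coordinator can look like in principle (a simplified sketch, not Postmaster’s actual implementation; the class names, the `extract` interface, and the stubbed outputs are all hypothetical):

```python
from typing import Protocol

class Extractor(Protocol):
    """Any component that can pull structured fields out of a document."""
    def extract(self, document: bytes) -> dict: ...

class OcrExtractor:
    def extract(self, document: bytes) -> dict:
        # Run OCR on a scanned letter and return the raw text (stubbed here).
        return {"text": "..."}

class LlmExtractor:
    def extract(self, document: bytes) -> dict:
        # Ask an LLM to pull out fields such as sender and intent (stubbed here).
        return {"sender": "...", "intent": "invoice"}

class InferenceEngine:
    """Coordinator that routes a document through the relevant extractors."""
    def __init__(self, extractors: list[Extractor]):
        self.extractors = extractors

    def process(self, document: bytes) -> dict:
        merged: dict = {}
        for extractor in self.extractors:
            merged.update(extractor.extract(document))
        return merged  # handed off to the back-office system

engine = InferenceEngine([OcrExtractor(), LlmExtractor()])
record = engine.process(b"%PDF-...")
```

The point of the pattern is that each specialized component handles the data it is best at, while the coordinator merges their results into a single record for downstream systems.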
Understanding how AI works is crucial for selecting the right tools to perform specific tasks. AI systems, such as LLMs, are built on different architectures, each excelling in certain types of problem-solving. Knowing their strengths and limitations allows for more strategic and efficient use of AI technologies.
Understanding that LLMs primarily depend on System 1 thinking—fast, intuitive processing—highlights their limitations in more analytical tasks. If a task demands deeper, reflective reasoning, or needs to follow strict rules, ensure compliance, or maintain traceability, the LLM must be guided by a rules-based engine. This combination ensures that tasks requiring more deliberate, logical, or regulated processing are handled correctly.
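As a minimal sketch of that combination (the field names and rules below are invented for illustration and are not Postmaster’s actual rule set), the LLM proposes structured data and an explicit, auditable rule set decides whether it can be trusted:

```python
def validate_invoice(fields: dict) -> list[str]:
    """Apply explicit, auditable rules to data extracted by an LLM."""
    errors = []
    if not fields.get("iban"):
        errors.append("missing IBAN")
    if fields.get("amount", 0) <= 0:
        errors.append("amount must be positive")
    if fields.get("currency") not in {"EUR", "USD"}:
        errors.append("unsupported currency")
    return errors

# The LLM proposes (System 1); the rules engine checks and traces (System 2).
extracted = {"iban": "FR7630006000011234567890189", "amount": 250.0, "currency": "EUR"}
issues = validate_invoice(extracted)
if issues:
    print("route to a human reviewer:", issues)
else:
    print("safe to post to the back-office system")
```

Because the rules are explicit code rather than model weights, every accept-or-reject decision can be logged and traced, which is exactly what the LLM alone cannot guarantee.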
In summary, understanding how AI works not only helps us select the most appropriate tool for a task but also ensures we deploy these tools in ways that complement their strengths and mitigate their limitations.
Conclusion
LLM capabilities are often impressive, with some even claimed to pass the Turing test. However, it’s crucial to remember that their intelligence is fundamentally different from human intelligence. Thanks to their extensive “memory,” they excel at System 1 thinking. With such a vast memory, they can even apply System 1 thinking in situations where a human would typically need System 2 thinking.
Understanding their cognitive biases is crucial for effectively leveraging their potential. It is also a fascinating subject of study: a non-human intelligence we can communicate with!