ChatGPT - not so clever after all - on logical tasks


I really like the work of Adrian Groza, who explores puzzle solving using first-order logic.

In this recently published preprint, Adrian explores the reasoning capabilities of ChatGPT. He starts with a large set of logical puzzles, reformulated so that they can be presented to ChatGPT. The large language model then tries to give reasoned answers. No spoilers, but it doesn't go well.

Link to the paper on ResearchGate: Measuring reasoning capabilities of ChatGPT (PDF, researchgate.net)

The paper goes the extra mile beyond simply testing ChatGPT's problem-solving capabilities. It looks in detail at the dialogues and classifies the types of logical fallacies that appear in ChatGPT's reasoning. There appears to be a lack of common sense, and better answers are presented as worse ones. In some cases, extra constraints are unnecessarily added to a problem, which prevents ChatGPT from finding the correct answer that is actually available.
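To make the "extra constraint" failure concrete, here is a minimal Python sketch built on a hypothetical toy puzzle (not one from the paper). A brute-force search over every age assignment finds exactly one answer. Adding an assumption that the puzzle never states (here the made-up rule "Cat is older than Bob") leaves no answer at all. The names, ages and rules are purely illustrative.

```python
from itertools import permutations

# Hypothetical toy puzzle, for illustration only (not taken from the paper):
# Ann, Bob and Cat are aged 20, 25 and 30, all different.
#   1. Ann is older than Bob.
#   2. Cat is not the oldest.
#   3. Bob is not the youngest.
AGES = (20, 25, 30)


def puzzle_constraints(ann, bob, cat):
    """The three rules actually stated by the puzzle."""
    return ann > bob and cat != max(AGES) and bob != min(AGES)


def spurious_constraint(ann, bob, cat):
    """An extra assumption the puzzle never states: 'Cat is older than Bob'."""
    return cat > bob


def solve_puzzle(extra=None):
    """Try every assignment of ages and keep those that satisfy all constraints."""
    solutions = []
    for ann, bob, cat in permutations(AGES):
        if not puzzle_constraints(ann, bob, cat):
            continue
        if extra is not None and not extra(ann, bob, cat):
            continue
        solutions.append({"Ann": ann, "Bob": bob, "Cat": cat})
    return solutions


print(solve_puzzle())                     # [{'Ann': 30, 'Bob': 25, 'Cat': 20}]
print(solve_puzzle(spurious_constraint))  # []
```

The stated rules have a unique solution. The unwarranted extra rule rules it out, so the "solver" wrongly concludes that there is no answer.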

One conclusion of the paper correctly identifies that ChatGPT does have strengths in language processing, but falls well short when required to do detailed, factually correct problem-solving work.

The dangerous (or great) thing about "trusting an Oracle" that offers no substantive background and doesn't show its workings is that it can learn from feedback...

From the paper (preprint, October 2023)

Run today, 3 December 2023:


Good to see that ChatGPT can make a fair assessment of its own abilities on logical problems and puzzles. 

After concluding that 693 is equal to 729, and that a father aged 29 can have children aged 24 and 21, ChatGPT admitted that "For the most accurate and reliable results on logical puzzles, it's often better to use dedicated problem-solving tools or consult with experts in the relevant field."
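Both of those claims can be checked mechanically in a few lines. Here is a minimal Python sketch. The arithmetic check is direct. For the family claim, the only assumption is a hypothetical minimum age for becoming a father, `MIN_PARENT_AGE`, set to 15 purely for illustration. Any reasonable threshold gives the same verdict.

```python
# Sanity checks for the two claims quoted above.
# MIN_PARENT_AGE is an illustrative assumption, not a figure from the paper.
MIN_PARENT_AGE = 15


def father_age_at_birth(father_age, child_age):
    """How old the father was when a child of the given age was born."""
    return father_age - child_age


def plausible_family(father_age, child_ages, min_parent_age=MIN_PARENT_AGE):
    """True only if the father was at least min_parent_age at every child's birth."""
    return all(father_age_at_birth(father_age, c) >= min_parent_age for c in child_ages)


print(693 == 729)                                      # False
print(729 - 693)                                       # 36
print([father_age_at_birth(29, c) for c in (24, 21)])  # [5, 8]
print(plausible_family(29, (24, 21)))                  # False
```

A father aged 29 would have been 5 and 8 years old when those children were born, which is exactly the kind of common-sense check the paper finds missing.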

 


*** Update Jan 2024 -- See also: ChatGPT bombs test on diagnosing kids’ medical cases with 83% error rate


**** From the source itself

Q: Can ChatGPT solve all Sudoku?

ChatGPT

ChatGPT can help solve many Sudoku puzzles by applying a logical, step-by-step approach, but it does have some limitations:

  1. Logical Strategies and Explanation: ChatGPT can walk you through various solving techniques (like Naked Singles, Hidden Pairs, X-Wing, etc.) and apply them in a logical order to solve the puzzle. For many standard Sudoku puzzles, this method is effective and often enough to reach a solution.

  2. Complexity and Advanced Techniques: For highly complex puzzles, such as those requiring advanced techniques (e.g., Swordfish, Jellyfish, or even more intricate techniques used in extreme Sudoku variants), ChatGPT may need external support, like an SMT solver or specific backtracking algorithms, which require computational logic and precision.

  3. Consistency and Verification: ChatGPT may not perfectly track all the placements or logic steps in real-time without a computational system in place. It’s designed for language and reasoning but lacks the persistent, systematic memory needed to ensure that every logical deduction is correct and free of contradictions.

  4. No Access to Real-Time Calculations: In this chat-only environment, ChatGPT doesn't directly execute code to solve Sudoku with brute-force algorithms (such as backtracking or SMT solvers) or check constraints dynamically. It can outline these methods in code (like Python or Z3), which you can run in a suitable programming environment.

Conclusion

In short, while ChatGPT can explain and help solve many standard Sudoku puzzles through logical reasoning and pattern recognition, it may not be able to solve all Sudoku puzzles independently without computational assistance, especially those requiring extensive backtracking or advanced constraints. However, with external tools, such as an SMT solver or a Sudoku-solving algorithm, it could solve even the most challenging puzzles reliably.
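ChatGPT's answer suggests writing the search out in code (Python or Z3). Here is a minimal sketch, assuming plain Python with no solver library, of a classic depth-first backtracking solver. It is not an SMT encoding and not code produced by ChatGPT. The function names and the example grid are illustrative choices.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # 9x9 grid; 0 marks an empty cell


def candidates(grid: Grid, r: int, c: int) -> List[int]:
    """Digits not already used in row r, column c, or the 3x3 box containing (r, c)."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [d for d in range(1, 10) if d not in used]


def find_empty(grid: Grid) -> Optional[Tuple[int, int]]:
    """Return the first empty cell in row-major order, or None if the grid is full."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                return r, c
    return None


def solve_sudoku(grid: Grid) -> bool:
    """Fill the grid in place by depth-first backtracking; True if a solution was found."""
    cell = find_empty(grid)
    if cell is None:
        return True  # no empty cells left
    r, c = cell
    for d in candidates(grid, r, c):
        grid[r][c] = d
        if solve_sudoku(grid):
            return True
        grid[r][c] = 0  # dead end: undo and try the next digit
    return False  # no digit fits this cell, so the caller must backtrack


def is_valid_solution(grid: Grid) -> bool:
    """Independent check: every row, column and 3x3 box holds 1..9 exactly once."""
    full = set(range(1, 10))
    rows_ok = all(set(row) == full for row in grid)
    cols_ok = all({grid[r][c] for r in range(9)} == full for c in range(9))
    boxes_ok = all(
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)} == full
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    )
    return rows_ok and cols_ok and boxes_ok


# Example puzzle, chosen for illustration (0 = blank).
PUZZLE: Grid = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]

if __name__ == "__main__":
    grid = [row[:] for row in PUZZLE]  # work on a copy so PUZZLE stays unchanged
    if solve_sudoku(grid) and is_valid_solution(grid):
        for row in grid:
            print(" ".join(map(str, row)))
    else:
        print("No solution found")
```

The search tries every remaining candidate before it gives up, so it is complete. It finds a solution whenever one exists and returns False otherwise, which is the systematic bookkeeping the answer above admits ChatGPT lacks. The price is worst-case exponential running time. An SMT encoding (for example with Z3) hands the search to a general-purpose solver instead.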





