I really like the work of Adrian Groza, who explores puzzle solving using first-order logic.
In this recently published paper, Adrian explores the reasoning capabilities of ChatGPT, starting with a large set of logical puzzles reformulated so they can be presented to ChatGPT. The large language model then tries to give reasoned answers. No spoilers, but it doesn't go well.
Link to paper on ResearchGate here:
(PDF) Measuring reasoning capabilities of ChatGPT (researchgate.net)
The paper goes the extra mile beyond just testing ChatGPT's problem-solving capabilities. It looks in detail at the dialogue and classifies the nature of the logical fallacies in ChatGPT's reasoning. There appears to be a lack of common sense: better answers are presented as worse ones, and in some cases extra constraints are unnecessarily added to problems, making it impossible to find the available correct answer.
One conclusion of the paper correctly identifies that ChatGPT has strengths in language processing, but falls very short when required to do detailed, factually correct problem-solving work.
The dangerous/great thing about "trusting an Oracle" that answers without substantive background or showing its workings is that it can learn from feedback....
From the paper: Preprint · October 2023
Run today, 3 December 2023. Good to see that ChatGPT can make a fair assessment of its own abilities on logical problems and puzzles.
After concluding that 693 is equal to 729, and that a father aged 29 can have children aged 24 and 21, ChatGPT offered the admission that "For the most accurate and reliable results on logical puzzles, it's often better to use dedicated problem-solving tools or consult with experts in the relevant field."
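As a concrete illustration of what a "dedicated problem-solving tool" buys you, here is a minimal sketch using the Z3 SMT solver's Python bindings. Z3 is my choice for illustration only; the paper's own first-order logic tooling may differ. It checks the father/children age claim above as a constraint problem:

```python
# Minimal sketch, assuming the Z3 SMT solver (pip install z3-solver).
from z3 import Int, Solver

father, child1, child2 = Int("father"), Int("child1"), Int("child2")

s = Solver()
# ChatGPT's claim: a 29-year-old father with children aged 24 and 21.
s.add(father == 29, child1 == 24, child2 == 21)
# Common-sense constraint (my assumption, not from the paper): a father
# is at least 12 years older than each child, a conservative lower bound.
s.add(father >= child1 + 12, father >= child2 + 12)

print(s.check())      # prints "unsat": the claimed ages are inconsistent
print(693 == 729)     # and the arithmetic claim is, of course, False
```

The solver reports unsat instantly: under any common-sense bound on the father/child age gap, the claimed ages cannot hold together. That is exactly the kind of mechanical consistency check the chat dialogue skipped.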
*** Update Jan 2024 -- See also ChatGPT bombs test on diagnosing kids’ medical cases with 83% error rate
**** From the Source itself
Q: Can ChatGPT solve all Sudoku puzzles?