Monday, April 17, 2023

Checking ChatPDF Accuracy

Scientist tests ChatPDF to see how well it "understands" the content:

For this test I'm going to take the five most recent papers on this blog and run them through the specialised ChatPDF tool. Let's see if this does better than the ChatGPT app, and how it compares to doing the hard work of actually reading the papers...

Conclusions

I'm afraid this one hasn't got beyond the usual "impressive tech demo" stage. It is categorically not ready for actual use, and anyone paying the subscription fee at this stage is a complete fool.

To give credit where credit is due, it does often produce remarkably good summaries that are more accessible than the abstracts. It can extract complex variables, even ones which aren't stated directly in the text. It seems to do better when you ask it very specific questions, but it's also capable of handling complicated technical descriptions and distilling them down to their most relevant points, even in response to more general questions.

The problem is that accuracy and usefulness do not scale linearly with each other. If it produces accurate statements 70, 80, even 90% of the time, it's useful 0% of the time. Why? Because at that failure rate every claim it makes has to be checked, and one would be better off just reading the paper: you have no idea whether it's making something up or missing a vital point.

Worse, it's dangerously coherent. If you're not already an expert in the field, it produces statements which sound fully convincing but are in fact just plain wrong. I'm glad it references the parts of the text it's drawing its information from, but it frequently just invents entire quotes, and that's unacceptable.
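
To put a rough number on why per-claim accuracy doesn't translate into usefulness, here's a back-of-the-envelope sketch. The per-claim accuracies, the ten claims per summary, and the independence of errors are all illustrative assumptions, not measurements:

# Back-of-the-envelope: if each individual claim in a summary is right
# with some fixed probability, how often is the summary right everywhere?
# The accuracies and the 10-claims-per-summary figure are assumptions
# for illustration, treating claims as independent.

def chance_fully_correct(per_claim_accuracy: float, n_claims: int) -> float:
    """Probability that all n_claims claims are correct, assuming independence."""
    return per_claim_accuracy ** n_claims

for acc in (0.7, 0.8, 0.9):
    p = chance_fully_correct(acc, n_claims=10)
    print(f"{acc:.0%} accurate per claim -> {p:.1%} chance a 10-claim summary is error-free")

Even at 90% per-claim accuracy, roughly two summaries in three contain an error somewhere, so every claim still has to be verified against the paper, which is exactly the work the tool was supposed to save.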