Why is GPT 4 struggling to read PDFs? You need an ultimate "Chat With PDF" APP

Jan 31, 2024

Since November 2023, GPT-4 has let users upload documents and ask targeted questions about them. The context window has been expanded to 128k tokens, enough to take in content equivalent to 300 pages of a book in a single prompt.

Following this announcement, speculation was rife that GPT-4 had become a game-changer that would outperform existing GPT-based document-reading tools such as ChatDOC and ChatPDF, and that it was poised to become the ultimate Chat with PDF app on the market.

Several months on, is this actually the case?

Unfortunately, users on major forums frequently report errors and other issues when using GPT-4 to read PDFs. For instance, when a GPT-4 wrapper deals with PDFs exceeding 10 pages, it reports multiple errors.

Now, let's look at the technical reasons behind these errors, and at why you might need the ultimate "Chat With PDF" app: ChatDOC.

1. OCR

A robust Optical Character Recognition (OCR) engine is essential, particularly one that excels at parsing tables and images. Currently, few free or commercial OCR tools perform this task effectively. Business and research PDFs often contain intricate tables and images, making a high-quality OCR solution crucial.

ChatDOC excels in meeting this need as it enables the recognition of scanned content, including intricate tables. It effectively handles various table formats, such as tables with infinite cells, densely formatted layouts, and those with complex merged cells. This capability proves invaluable for reading and interpreting diverse content, such as financial reports and experimental findings.

2. RAG

A straightforward Retrieval-Augmented Generation (RAG) pipeline could segment and embed documents exceeding 10 pages, retrieve the relevant segments, and pass them to a large language model (LLM). However, most chatbots currently lack this feature.
We ran an empirical RAG experiment on hundreds of questions drawn from real-world professional documents. The results show that ChatDOC, a RAG system equipped with a panoptic and pinpoint PDF parser, retrieves more accurate and complete segments and therefore gives better answers: it is superior to the baseline on nearly 47% of questions, ties on 38%, and falls short on only 15%. This suggests that enhanced PDF structure recognition may revolutionize RAG.
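
To make the segment, embed, retrieve, and pass-to-LLM steps concrete, here is a minimal sketch in Python. It is not ChatDOC's or ChatGPT's implementation. The function names are hypothetical, the bag-of-words "embedding" is a toy stand-in for a real embedding model, and the final prompt is only built, not sent to any LLM.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, max_words: int = 40) -> list[str]:
    """Segment: cut the text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Embed: toy bag-of-words vector (assumption: a real system would call an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Retrieve: return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble the text that would be passed to the LLM."""
    context = "\n\n".join(passages)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    chunks = [
        "Revenue in 2023 grew to 12 million dollars.",
        "The experiment used forty mice over six weeks.",
        "Our office moved to a new building in March.",
    ]
    question = "What was revenue in 2023?"
    print(build_prompt(question, retrieve(question, chunks, top_k=1)))
```

Because only the retrieved chunks reach the model, the quality of segmentation matters: if a table is split badly at parsing time, no retrieval step can recover it.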

3. Highlighting Doc Sections

The optimal solution should highlight the sections of the document from which the response is drawn. ChatGPT does not support this feature, yet it is indispensable when reading rigorous academic papers or financial reports, where every response must be well supported. Every ChatDOC response is backed by citations, and even subtle footnotes can be traced back to the original content, so the credibility of each response can be verified.
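
One way to picture source highlighting is to store, for each quoted passage, the page and character span it came from. The sketch below is an illustration under that assumption, not a description of how ChatDOC works. The `Citation` class and both function names are hypothetical, and pages are treated as plain strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    page: int    # 1-based page number
    start: int   # character offset where the quote begins on that page
    end: int     # character offset just past the end of the quote
    quote: str


def locate_quote(pages: list[str], quote: str) -> Optional[Citation]:
    """Return the first page and character span that contains the quote, if any."""
    for page_number, page_text in enumerate(pages, start=1):
        start = page_text.find(quote)
        if start != -1:
            return Citation(page_number, start, start + len(quote), quote)
    return None


def highlight(page_text: str, citation: Citation, marker: str = "**") -> str:
    """Wrap the cited span in markers so a reader can see the supporting text."""
    return (
        page_text[:citation.start]
        + marker + page_text[citation.start:citation.end] + marker
        + page_text[citation.end:]
    )


if __name__ == "__main__":
    pages = ["Intro text.", "Net profit rose 8% in Q3."]
    citation = locate_quote(pages, "rose 8%")
    if citation is not None:
        print(citation.page, highlight(pages[citation.page - 1], citation))
```

A response that cannot be located in the source this way is a signal that it may not be supported by the document.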

4. File Limits

Reading multiple documents at once for analysis and summarization is also a common scenario for knowledge workers. Unfortunately, ChatGPT does not handle this well: its document upload limit is 10 files. Given the difficulty ChatGPT has with PDFs exceeding 10 pages, noted at the beginning of this article, the implications for reading multiple documents can be anticipated. This is a recurring issue discussed across various forums.
In contrast, ChatDOC accepts an unlimited number of uploaded files, allowing more information to be processed more efficiently. In our tests, the best results were consistently achieved with up to 30 files.
