--

Hey Abdelrahman, I may be too late joining the conversation but here’s what you can try and it actually really worked in my case, since i have to deal with longer pdfs like 400-1000 pages.

We did convert each and every page of a pdf as an image and passed it to vision models like GPT-4o, and it was really good in detecting images, tables and text. You have to play around with the prompt.

--

--

Shobhit Agarwal
Shobhit Agarwal

Written by Shobhit Agarwal

🚀 Data Scientist | AI & ML | R&D 🤖 Generative AI | LLMs | Computer Vision ⚡ Deep Learning | Python 🔗 Let’s Connect: topmate.io/shobhit_agarwal

No responses yet