Feb 6, 2025
Hey Abdelrahman, I may be too late joining the conversation but here’s what you can try and it actually really worked in my case, since i have to deal with longer pdfs like 400-1000 pages.
We did convert each and every page of a pdf as an image and passed it to vision models like GPT-4o, and it was really good in detecting images, tables and text. You have to play around with the prompt.