ColPali and Byaldi for reading PDFs with images, Reflection-70B

Multi-modal documents have always been a problem, but from what I read, AI developers have already made huge progress and are now beating traditional PDF parsers. At least three solutions are now available as public models. Every day I’m surprised by how fast everything moves; I can keep up with the news so I’m not left behind, but finding the time to try everything is impossible.

RAG is increasingly going multi-modal, but document retrieval is tough, and layout gets in your way. But it shouldn't!

Introducing 🪤RAGatouille's Vision-equipped, ColPali-powered sibling: 🐭Byaldi

With just a few lines of code, search through documents, with no pre-processing. pic.twitter.com/PmC5ALajss
— Benjamin Clavié (@bclavie) September 5, 2024
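
For context on what "ColPali-powered" means: as far as I understand it, ColPali embeds each page image as a set of patch vectors and scores a query against a page with ColBERT-style late interaction (often called MaxSim). Below is a minimal sketch of that scoring idea in plain Python. It is not Byaldi's or ColPali's actual code; the function names (`maxsim_score`, `rank_pages`) and the tiny 2-dimensional vectors are made up for illustration, and real models use much larger learned embeddings.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: Sequence[Vector], page_patches: Sequence[Vector]) -> float:
    """Late-interaction score: for every query token, take its best-matching
    page patch (maximum dot product), then sum those maxima."""
    return sum(max(dot(q, p) for p in page_patches) for q in query_tokens)


def rank_pages(
    query_tokens: Sequence[Vector], pages: Sequence[Sequence[Vector]]
) -> List[Tuple[int, float]]:
    """Return (page_index, score) pairs sorted from best to worst match."""
    scores = [(i, maxsim_score(query_tokens, patches)) for i, patches in enumerate(pages)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): two query-token vectors and two pages,
# each page represented by two patch vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = [
    [[0.9, 0.1], [0.2, 0.1]],  # page 0: only matches the first query token well
    [[0.8, 0.0], [0.1, 0.9]],  # page 1: has a good patch for each query token
]

ranking = rank_pages(query, pages)
print(ranking)  # page 1 is listed first because both query tokens find a strong patch
```

The point of scoring this way is that a page only needs one good patch per query token, so the layout never has to be parsed into text first.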

There is also a model that tries to correct its own mistakes, which I think is a new approach to that problem. It’s probably worth tracking progress in this area too, because it could change a lot in the solutions I use now.

I'm excited to announce Reflection 70B, the world’s top open-source model.

Trained using Reflection-Tuning, a technique developed to enable LLMs to fix their own mistakes.

405B coming next week – we expect it to be the best model in the world.

Built w/ @GlaiveAI.

Read on ⬇️: pic.twitter.com/kZPW1plJuo
— Matt Shumer (@mattshumer_) September 5, 2024
