Multi-modal documents were always a problem, but what I can read now AI developers already made a huge progress, beating traditional PDF parsers. Now we have at least 3 solutions available as publicly available models. Day by day I’m surprised how fast everything goes; I can only read the news to not be left out of mainstream, but to have time for trying everything is impossible.
There is also a model which tries to corrects its own mistakes, I think it’s a new approach to that problem. Probably worth to trace the progress in this area also, because it may change a lot in solutions I use now.