Tags and Categories

Categories: dataset, genesis, infrastructure, milestone, mood, paperworking, reading-notes, story time

Tags: accuracy, CATMuS, cotutelle, courses, CREMMA, datasets, eScriptorium, evaluation, experiment, guidelines, health insurance, house cleaning, HTR, init, kraken, Large Language Models, manifesto, metrics, OCR, software documentation, static website, synthetic data, tuition, visa, wikicremma