Tags and Categories

Categories: dataset, genesis, infrastructure, milestone, mood, paperworking, reading-notes, story time, writing

Tags: accuracy, CATMuS, cotutelle, courses, CREMMA, datasets, eScriptorium, evaluation, experiment, guidelines, health insurance, house cleaning, HTR, init, kraken, Large Language Models, manifesto, markdown, metrics, OCR, quarto, software documentation, static website, synthetic data, tuition, visa, wikicremma