Health Literacy Algorithm

How it works

The health literacy algorithm is a machine learning tool that classifies and analyses BRCA-related content. Here’s how the planned pipeline works:

  • Trained on the multilingual health literacy dictionary (Task 3.3), which pulls together culturally relevant terms from surveys, interviews, and health platforms across the three partner countries.
  • Uses NLP techniques, specifically TF-IDF (term frequency-inverse document frequency) weighting, to turn text into numerical features and classify content across four languages.
  • Sorts content into themes such as prevention, emotional support, treatment options, and information needs.
  • Spots patterns, for example when many people ask confused questions about genetic testing, or when there’s widespread misunderstanding about what prophylactic mastectomy involves.
  • Once connected through the API, it can analyse patient interactions on hospital websites and health apps in real time.
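
To make the TF-IDF and theme-sorting steps above more concrete, here is a minimal sketch in plain Python. It is an illustration only, not the project's implementation: the seed terms in THEME_TERMS, the function names, and the choice of cosine similarity for matching are all assumptions, and the real dictionary from Task 3.3 has not been published yet.

```python
import math
import re
from collections import Counter

# Hypothetical seed terms per theme (assumption: the real multilingual
# dictionary from Task 3.3 is not yet available).
THEME_TERMS = {
    "prevention": "screening risk prevention mammogram lifestyle",
    "emotional support": "anxiety fear support counselling family",
    "treatment options": "surgery mastectomy chemotherapy treatment therapy",
    "information needs": "genetic testing result meaning explain",
}


def tokenize(text):
    """Lowercase the text and keep runs of letters (works for non-English scripts too)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using smoothed TF-IDF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text):
    """Return the best-matching theme, or None if no seed term overlaps."""
    themes = list(THEME_TERMS)
    vectors = tfidf_vectors([THEME_TERMS[t] for t in themes] + [text])
    query = vectors[-1]
    scores = {theme: cosine(query, vec) for theme, vec in zip(themes, vectors)}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(classify("What does a prophylactic mastectomy involve, is it surgery?"))
```

In this sketch each theme is treated as a tiny document of seed terms, and an incoming question is assigned to the theme whose TF-IDF vector it most resembles. A production version would more likely train a classifier on labelled examples in each language.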

The algorithm will be released as open-source software on this website, with full documentation and source code, expected by late 2027.

Assessment scales

The algorithm will use specific scales to classify the health literacy level of different content and interactions.

Details on the assessment methodology will be published here once the algorithm development is complete, expected by late 2027.

Examples of outputs

Sample reports and visualisations showing what the algorithm produces will be published here once it has been tested, expected by late 2027.