Model transparency and methodology
Relevance is computed as the TF-IDF cosine similarity between your selected topics and each contract description. Citizen impact is a composite of four public-interest signals.
When you select topics like "healthcare" or "defense," we expand each into a vocabulary of 8-12 seed keywords. For example, healthcare expands to: health, medical, hospital, patient, clinical, disease, nursing, medicaid, medicare, veterans.
These keywords are combined into a query and vectorized using TF-IDF (term frequency-inverse document frequency) with up to 15,000 features and bigrams. We then compute cosine similarity between your query vector and every contract description in the corpus.
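The expansion-and-similarity pipeline described above can be sketched as follows. The topic vocabulary, the toy contract descriptions, and the variable names are illustrative; only the vectorizer settings (bigrams, 15,000-feature cap) come from the text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Seed-keyword expansion for the "healthcare" topic (keywords from the text above).
TOPIC_KEYWORDS = {
    "healthcare": ["health", "medical", "hospital", "patient", "clinical",
                   "disease", "nursing", "medicaid", "medicare", "veterans"],
}

# Toy stand-ins; the real corpus holds every contract description.
contracts = [
    "Hospital staffing and patient care services for veterans clinics",
    "Runway resurfacing at a regional airport",
]

# Unigrams + bigrams with a capped vocabulary, mirroring the stated settings.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=15000)
doc_matrix = vectorizer.fit_transform(contracts)

# The query is simply the joined keyword vocabulary for the selected topic.
query = " ".join(TOPIC_KEYWORDS["healthcare"])
query_vec = vectorizer.transform([query])

# One relevance score per contract description.
relevance = cosine_similarity(query_vec, doc_matrix).ravel()
```

The healthcare-related description scores higher than the unrelated one, since it shares several vocabulary terms with the query.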
The similarity score is blended with citizen impact using the alpha weight above. At alpha = 0.7, topic relevance dominates, but high-impact contracts get a meaningful boost.
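A convex combination is the natural reading of "blended with the alpha weight"; the exact formula is an assumption, but at alpha = 0.7 it reproduces the behavior described:

```python
def blended_score(relevance: float, impact: float, alpha: float = 0.7) -> float:
    """Assumed blend: a convex combination of topic relevance and citizen impact."""
    return alpha * relevance + (1 - alpha) * impact

# A high-impact contract gets a meaningful boost even when relevance is modest:
# 0.7 * 0.4 + 0.3 * 0.9 = 0.55
print(blended_score(0.4, 0.9))
```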
The interpretable baseline is a linear ranker trained on the same pairwise data as the neural model; its weights show which structural features predict DOGE scrutiny.
Topic match is the dominant signal (+0.21), confirming that personalization drives the ranking. Structural features like contract value and embedding similarity contribute minimally, suggesting the DOGE scrutiny relationship is primarily topic-driven.
We compared the classical TF-IDF model against a deep learning neural ranker (MLP trained with pairwise margin loss) across five user personas.
| Metric | Classical (TF-IDF) | Deep Learning (MLP) |
|---|---|---|
| Ranking method | TF-IDF + citizen impact | Sentence Transformer + MLP |
| Mean DOGE scrutiny | 0.703 | 0.399 |
| Mean contract value | $54.8M | $86.9M |
| Unique topics per persona | 3.2 | 1.4 |
| Jaccard similarity (top-20, between the two models) | 0.052 | — |
Jaccard of 0.052 means the two models share almost no recommendations. They are complementary: the classical model surfaces high-scrutiny items, the DL model surfaces high-value items.
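For reference, the Jaccard similarity between two top-20 lists is the size of their intersection over the size of their union; the contract IDs below are hypothetical stand-ins:

```python
def jaccard(a, b) -> float:
    """Jaccard similarity of two collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical top lists sharing one contract out of five distinct IDs -> 0.2.
classical_top = {"C1", "C2", "C3"}
dl_top = {"C3", "C9", "C10"}
print(jaccard(classical_top, dl_top))
```

A value near zero, like the observed 0.052, means the two rankers surface almost entirely disjoint sets of contracts.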
MLP vs Linear Baseline (pairwise accuracy)
| Tier | Description | MLP | Linear |
|---|---|---|---|
| Tier 1 | Easy pairs | 0.998 | 1.000 |
| Tier 2 | Within-topic | 0.585 | 0.712 |
| Tier 3 | Off-topic | 0.584 | 0.695 |
| Overall | 0.722 | 0.811 |
The linear baseline outperforms the MLP on every tier, which suggests the feature-scrutiny relationship is approximately linear; the MLP's additional capacity leads to overfitting rather than better discrimination.
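The training objective and the evaluation metric from the tables above can be sketched in a few lines. This is an assumed formulation of a standard pairwise margin (hinge) loss and pairwise accuracy, not the project's actual code; the margin value and arrays are illustrative.

```python
import numpy as np

def pairwise_margin_loss(score_hi: np.ndarray, score_lo: np.ndarray,
                         margin: float = 1.0) -> float:
    """Hinge-style margin loss on ranked pairs: zero once the item with
    higher DOGE scrutiny outscores its partner by at least `margin`."""
    return float(np.maximum(0.0, margin - (score_hi - score_lo)).mean())

def pairwise_accuracy(score_hi: np.ndarray, score_lo: np.ndarray) -> float:
    """Fraction of pairs ranked in the correct order (the metric in the table)."""
    return float(np.mean(score_hi > score_lo))

# Two pairs: one correctly ordered with a full margin, one inverted.
hi = np.array([2.0, 0.5])
lo = np.array([1.0, 1.5])
print(pairwise_margin_loss(hi, lo))  # (0.0 + 2.0) / 2 = 1.0
print(pairwise_accuracy(hi, lo))     # 0.5
```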
Sources
Temporal range: January through October 2025. Geographic coverage spans all 50 states; 39% of items include place-of-performance state data.
The conversational assistant ("Owl") uses a two-model approach. Data from the dashboard is passed as context to ground responses in real numbers.
Both models receive the current dashboard context (selected topic, contract stats, filters) as a structured prefix. This grounds the AI in real data and reduces hallucination. The AI cannot access data outside what the dashboard has already loaded.
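One way to build such a structured prefix is to serialize the dashboard state and prepend it to the prompt. The field names, marker string, and values below are hypothetical; the source only specifies that the selected topic, contract stats, and filters are passed as a structured prefix.

```python
import json

def build_context_prefix(topic: str, stats: dict, filters: dict) -> str:
    """Serialize the loaded dashboard state into a structured prompt prefix
    (hypothetical field names and marker)."""
    context = {
        "selected_topic": topic,
        "contract_stats": stats,   # only data the dashboard already loaded
        "filters": filters,
    }
    return "DASHBOARD_CONTEXT:\n" + json.dumps(context, indent=2)

prefix = build_context_prefix(
    "healthcare",
    {"count": 128, "total_value_usd": 54_800_000},
    {"state": "VA"},
)
print(prefix)
```

Because the prefix is built only from data already loaded in the dashboard, the assistant cannot be steered toward figures the user has not seen.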