Model transparency and methodology
Relevance is computed as the TF-IDF cosine similarity between your selected topics and each contract description. Citizen impact is a composite of four public-interest signals.
When you select topics like "healthcare" or "defense," we expand each into a vocabulary of 8-12 seed keywords. For example, healthcare expands to: health, medical, hospital, patient, clinical, disease, nursing, medicaid, medicare, veterans.
These keywords are combined into a query and vectorized using TF-IDF (term frequency-inverse document frequency) with up to 15,000 features and bigrams. We then compute cosine similarity between your query vector and every contract description in the corpus.
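The expansion-and-similarity pipeline described above can be sketched as follows. The topic vocabulary, the toy contract descriptions, and the variable names are illustrative; only the vectorizer settings (bigrams, 15,000-feature cap) come from the text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Seed-keyword expansion for the "healthcare" topic (keywords from the text above).
TOPIC_KEYWORDS = {
    "healthcare": ["health", "medical", "hospital", "patient", "clinical",
                   "disease", "nursing", "medicaid", "medicare", "veterans"],
}

# Toy stand-ins; the real corpus holds every contract description.
contracts = [
    "Hospital staffing and patient care services for veterans clinics",
    "Runway resurfacing at a regional airport",
]

# Unigrams + bigrams with a capped vocabulary, mirroring the stated settings.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=15000)
doc_matrix = vectorizer.fit_transform(contracts)

# The query is simply the joined keyword vocabulary for the selected topic.
query = " ".join(TOPIC_KEYWORDS["healthcare"])
query_vec = vectorizer.transform([query])

# One relevance score per contract description.
relevance = cosine_similarity(query_vec, doc_matrix).ravel()
```

The healthcare-related description scores higher than the unrelated one, since it shares several vocabulary terms with the query.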
The similarity score is blended with citizen impact using the alpha weight above. At alpha = 0.7, topic relevance dominates, but high-impact contracts get a meaningful boost.
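A convex combination is the natural reading of "blended with the alpha weight"; the exact formula is an assumption, but at alpha = 0.7 it reproduces the behavior described:

```python
def blended_score(relevance: float, impact: float, alpha: float = 0.7) -> float:
    """Assumed blend: a convex combination of topic relevance and citizen impact."""
    return alpha * relevance + (1 - alpha) * impact

# A high-impact contract gets a meaningful boost even when relevance is modest:
# 0.7 * 0.4 + 0.3 * 0.9 = 0.55
print(blended_score(0.4, 0.9))
```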
The interpretable baseline is a linear ranker trained on the same pairwise data as the neural model; its weights show which structural features predict DOGE scrutiny.
Topic match is the dominant signal (+0.21), confirming that personalization drives the ranking. Structural features like contract value and embedding similarity contribute minimally, suggesting the DOGE scrutiny relationship is primarily topic-driven.
We compared the classical TF-IDF model against a deep learning neural ranker (MLP trained with pairwise margin loss) across five user personas.
| Metric | Classical (TF-IDF) | Deep Learning (MLP) |
|---|---|---|
| Ranking method | TF-IDF + citizen impact | Sentence Transformer + MLP |
| Mean DOGE scrutiny | 0.703 | 0.399 |
| Mean contract value | $54.8M | $86.9M |
| Unique topics per persona | 3.2 | 1.4 |
| Jaccard similarity (top-20, between the two models) | 0.052 | — |
Jaccard of 0.052 means the two models share almost no recommendations. They are complementary: the classical model surfaces high-scrutiny items, the DL model surfaces high-value items.
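For reference, the Jaccard similarity between two top-20 lists is the size of their intersection over the size of their union; the contract IDs below are hypothetical stand-ins:

```python
def jaccard(a, b) -> float:
    """Jaccard similarity of two collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical top lists sharing one contract out of five distinct IDs -> 0.2.
classical_top = {"C1", "C2", "C3"}
dl_top = {"C3", "C9", "C10"}
print(jaccard(classical_top, dl_top))
```

A value near zero, like the observed 0.052, means the two rankers surface almost entirely disjoint sets of contracts.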
MLP vs Linear Baseline (pairwise accuracy)
| Tier | Description | MLP | Linear |
|---|---|---|---|
| Tier 1 | Easy pairs | 0.998 | 1.000 |
| Tier 2 | Within-topic | 0.585 | 0.712 |
| Tier 3 | Off-topic | 0.584 | 0.695 |
| Overall | 0.722 | 0.811 |
The linear baseline outperforms the MLP on every tier, which suggests the feature-scrutiny relationship is approximately linear; the MLP's additional capacity leads to overfitting rather than better discrimination.
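The training objective and the evaluation metric from the tables above can be sketched in a few lines. This is an assumed formulation of a standard pairwise margin (hinge) loss and pairwise accuracy, not the project's actual code; the margin value and arrays are illustrative.

```python
import numpy as np

def pairwise_margin_loss(score_hi: np.ndarray, score_lo: np.ndarray,
                         margin: float = 1.0) -> float:
    """Hinge-style margin loss on ranked pairs: zero once the item with
    higher DOGE scrutiny outscores its partner by at least `margin`."""
    return float(np.maximum(0.0, margin - (score_hi - score_lo)).mean())

def pairwise_accuracy(score_hi: np.ndarray, score_lo: np.ndarray) -> float:
    """Fraction of pairs ranked in the correct order (the metric in the table)."""
    return float(np.mean(score_hi > score_lo))

# Two pairs: one correctly ordered with a full margin, one inverted.
hi = np.array([2.0, 0.5])
lo = np.array([1.0, 1.5])
print(pairwise_margin_loss(hi, lo))  # (0.0 + 2.0) / 2 = 1.0
print(pairwise_accuracy(hi, lo))     # 0.5
```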
Sources
Temporal range: January through October 2025. Geographic coverage spans all 50 states; 39% of items include place-of-performance state data.
The conversational assistant ("Owl") uses a two-model approach. Data from the dashboard is passed as context to ground responses in real numbers.
Both models receive the current dashboard context (selected topic, contract stats, filters) as a structured prefix. This grounds the AI in real data and reduces hallucination. The AI cannot access data outside what the dashboard has already loaded.
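One way to build such a structured prefix is to serialize the dashboard state and prepend it to the prompt. The field names, marker string, and values below are hypothetical; the source only specifies that the selected topic, contract stats, and filters are passed as a structured prefix.

```python
import json

def build_context_prefix(topic: str, stats: dict, filters: dict) -> str:
    """Serialize the loaded dashboard state into a structured prompt prefix
    (hypothetical field names and marker)."""
    context = {
        "selected_topic": topic,
        "contract_stats": stats,   # only data the dashboard already loaded
        "filters": filters,
    }
    return "DASHBOARD_CONTEXT:\n" + json.dumps(context, indent=2)

prefix = build_context_prefix(
    "healthcare",
    {"count": 128, "total_value_usd": 54_800_000},
    {"state": "VA"},
)
print(prefix)
```

Because the prefix is built only from data already loaded in the dashboard, the assistant cannot be steered toward figures the user has not seen.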