# Data


PollSignal uses public election data, candidate data, postcode lookup services, polling or forecast evidence where permitted, and manual editorial review. This page explains what data we use, how we process it, what we publish, and what we do not publish.

## Public data exports

When the site is live, we intend to publish recommendation data as CSV and JSON files.

Planned exports:

| File | Purpose |
|---|---|
| `/data/current.csv` | Current published recommendations in CSV format. |
| `/data/current.json` | Current published recommendations in JSON format. |
| `/data/archive/` | Historic recommendation snapshots. |
| `/data/changelog/` | Notes on material changes to recommendations or methodology. |

The public export will include recommendation data, not private user data.

Example fields:

```txt
constituency_code
constituency_name
slug
recommendation_type
recommended_party
confidence
split_vote_risk
status
public_explanation
last_reviewed_at
published_at
manual_override
source_summary
```
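
As a rough illustration of how a consumer might read the CSV export, the sketch below parses export text and checks that the header contains the example fields listed above. The function name `read_recommendations` is hypothetical, and it assumes one row per constituency with every value read as a string; the final export may differ.

```python
import csv
import io

# Field names come from the example list above. Everything else here
# (function name, one-row-per-constituency layout) is an assumption.
EXPECTED_FIELDS = [
    "constituency_code", "constituency_name", "slug", "recommendation_type",
    "recommended_party", "confidence", "split_vote_risk", "status",
    "public_explanation", "last_reviewed_at", "published_at",
    "manual_override", "source_summary",
]


def read_recommendations(csv_text: str) -> list:
    """Parse export CSV text into a list of dicts, one per row.

    Raises ValueError if any expected field is missing from the header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_FIELDS if name not in header]
    if missing:
        raise ValueError(f"export is missing fields: {missing}")
    return list(reader)
```

Checking the header up front means a renamed or dropped column fails loudly instead of silently producing empty values.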

## Main data sources

The exact source mix may change as better data becomes available. For the first Westminster-focused version, the expected sources are:

| Data type | Expected source | Format | Use |
|---|---|---|---|
| Postcode to constituency lookup | [Postcodes.io](https://postcodes.io/docs/overview/) | JSON API | Identify the user’s Westminster constituency from a postcode. |
| Constituency names and codes | [ONS / data.gov.uk Westminster Parliamentary Constituencies names and codes](https://www.data.gov.uk/dataset/90293fcf-9268-4a1a-be3d-d4a20d314604/westminster-parliamentary-constituencies-july-2024-names-and-codes-in-the-uk-v2) | CSV / GeoJSON | Maintain the canonical list of constituencies and constituency codes. |
| Constituency boundaries | ONS Open Geography / data.gov.uk | GeoJSON / Shapefile | Boundary checks, maps, and future geospatial features. |
| General election results | [UK Parliament election results](https://electionresults.parliament.uk/general-elections) and House of Commons Library data | CSV | Previous result tables, winner, majority, turnout, and party vote shares. |
| Candidate data | [Democracy Club data downloads](https://democracyclub.org.uk/data_apis/data/) | CSV | Candidate names, parties, election IDs, and candidate profile data. |
| Current MPs and constituency representation | UK Parliament data | API / CSV | Sitting or previous MP, party, and constituency representation. |
| Polling and forecasts | Public or licensed polling and forecast sources | CSV / XLSX / JSON | Evidence for seat-level or national trends, where reuse is permitted. |
| Corrections | User-submitted reports | Form submissions | Review possible errors or outdated information. |
| Editorial recommendation | PollSignal editorial review | Admin record / CSV | Final tactical recommendation, confidence, explanation, and status. |

## How data is processed

We treat data processing as a controlled publishing workflow rather than a live feed that automatically changes recommendations.

```txt
External source
  -> raw file or API response stored
  -> staging import
  -> validation and normalisation
  -> diff against current data
  -> review queue where needed
  -> editorial approval
  -> published recommendation snapshot
```
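
The "diff against current data" step could look something like the sketch below. It assumes both datasets have already been validated and normalised into dictionaries keyed by constituency code; the function name `diff_records` and the shape of the result are illustrative, not a description of the actual import code.

```python
def diff_records(current: dict, incoming: dict) -> dict:
    """Compare incoming records with current ones, keyed by constituency code.

    Returns the keys that were added or removed, plus a list of
    (key, changed_field_names) pairs for records present in both.
    """
    added = sorted(key for key in incoming if key not in current)
    removed = sorted(key for key in current if key not in incoming)
    changed = []
    for key in sorted(incoming.keys() & current.keys()):
        old, new = current[key], incoming[key]
        # A field counts as changed if its value differs or it exists on one side only.
        fields = sorted(f for f in old.keys() | new.keys() if old.get(f) != new.get(f))
        if fields:
            changed.append((key, fields))
    return {"added": added, "removed": removed, "changed": changed}
```

A diff in this form gives reviewers a compact list of what an import would change before anything is published.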

## What can update automatically

Some factual data can be updated after validation:

- Constituency names and codes.
- Party name aliases.
- Historical election result tables.
- Candidate records imported from official or trusted datasets.
- Public source links.
- Last-import status.

Even where an import is automatic, significant changes can still be sent to a review queue.
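
One way to express this routing is sketched below. The import kind labels, the `REVIEW_THRESHOLD` value, and the idea of measuring "significant" by the number of changed rows are all assumptions made for illustration; the real criteria are an editorial decision.

```python
# Hypothetical labels for the data types listed above.
AUTO_UPDATE_KINDS = {
    "constituency_names",
    "party_aliases",
    "historical_results",
    "candidates",
    "source_links",
    "import_status",
}

# Hypothetical cut-off: imports changing more rows than this go to review.
REVIEW_THRESHOLD = 25


def route_import(kind: str, changed_rows: int, validated: bool) -> str:
    """Decide what happens to an import: reject, queue for review, or apply."""
    if not validated:
        return "rejected"
    if kind not in AUTO_UPDATE_KINDS:
        # Anything not on the allow-list is never applied automatically.
        return "review_queue"
    if changed_rows > REVIEW_THRESHOLD:
        # Large changes are treated as significant even for factual data.
        return "review_queue"
    return "auto_apply"
```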

## What is manually reviewed

The following are manually reviewed before publication:

- Tactical voting recommendation.
- Confidence rating.
- Split-vote risk rating.
- Candidate ideological or credibility labels.
- Manual overrides.
- Public explanation text.
- “Too close” or “no realistic path” decisions.
- Polling or forecast evidence that materially changes a seat.

Automated data can suggest that a seat needs review. It should not automatically publish a political recommendation.

## Postcode lookup

The postcode lookup is used to identify the user’s Westminster constituency.

Postcodes.io is expected to be the initial lookup service. It is a free, open-source UK postcode API and includes Westminster constituency information, including 2024 constituency fields. It does not require authentication.

A known limitation is boundary straddling: some postcode areas cross administrative or constituency boundaries. In those cases, a postcode-level lookup may assign the whole postcode to the area where its centroid falls. If a user believes the assigned constituency is wrong, they should use the correction form.
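
A minimal sketch of the lookup follows, using only the Python standard library. The endpoint and the response field names (`parliamentary_constituency_2024`, with a fallback to `parliamentary_constituency`, and the matching entries under `codes`) reflect our understanding of the Postcodes.io response and should be checked against its current documentation. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional, Tuple

API_BASE = "https://api.postcodes.io/postcodes/"


def extract_constituency(payload: dict) -> Tuple[Optional[str], Optional[str]]:
    """Pull the constituency name and code out of a decoded lookup response.

    Field names are assumptions about the Postcodes.io response shape.
    Returns (None, None) if the expected fields are absent.
    """
    result = payload.get("result") or {}
    codes = result.get("codes") or {}
    name = (result.get("parliamentary_constituency_2024")
            or result.get("parliamentary_constituency"))
    code = (codes.get("parliamentary_constituency_2024")
            or codes.get("parliamentary_constituency"))
    return name, code


def lookup_constituency(postcode: str) -> Tuple[Optional[str], Optional[str]]:
    """Fetch a postcode from the API and return (name, code).

    urllib raises HTTPError for an unknown postcode (HTTP 404).
    """
    url = API_BASE + urllib.parse.quote(postcode.strip())
    with urllib.request.urlopen(url, timeout=10) as response:
        return extract_constituency(json.load(response))
```

Keeping the parsing separate from the network call makes it possible to test the field handling without contacting the service, and the postcode itself is not stored anywhere by this code.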

## Candidate data

Candidate data is expected to come from Democracy Club where available. Democracy Club provides candidate and election data as CSV downloads, with candidate names and parties drawn from official nomination papers and other profile information added by Democracy Club or volunteers.

Candidate data may change quickly during an election period. New or changed candidate records are reviewed before they affect public recommendations.

## Polling and forecast data

Polling and forecasts can help identify trends, but they are handled carefully.

We may use:

- National voting-intention polls.
- Constituency-level polling, where available.
- MRP or seat projection data, where reuse is permitted.
- Paid or licensed forecast data.
- Internal analyst spreadsheets.

Polling and forecast data is treated as evidence, not as an automatic decision. We store the publication date, source, fieldwork dates where available, methodology notes, and any licensing restrictions.
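
The stored metadata could be modelled along these lines. This is a sketch only: the class name `PollEvidence`, the attribute names, and the `redistribution_permitted` flag (defaulting to not permitted) are assumptions rather than the actual schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PollEvidence:
    """Metadata kept alongside a poll or forecast used as evidence."""

    source: str
    published_on: date
    fieldwork_start: Optional[date] = None  # not always available
    fieldwork_end: Optional[date] = None
    methodology_notes: str = ""
    licence_restrictions: str = ""
    # Assumed default: treat data as not republishable unless confirmed.
    redistribution_permitted: bool = False

    def __post_init__(self) -> None:
        # Reject fieldwork ranges that end before they start.
        if (self.fieldwork_start and self.fieldwork_end
                and self.fieldwork_start > self.fieldwork_end):
            raise ValueError("fieldwork_start is after fieldwork_end")
```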

## Recommendation statuses

A constituency may have one of several internal or public statuses:

| Status | Meaning |
|---|---|
| **Recommendation confirmed** | The current recommendation has been reviewed and published. |
| **Needs review** | New data or a correction means the constituency should be checked. |
| **Awaiting candidate data** | Candidate information is incomplete or uncertain. |
| **Awaiting polling data** | More evidence is needed before a firm recommendation can be made. |
| **No recommendation possible** | A tactical recommendation cannot currently be made responsibly. |
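
In code, the statuses in the table could be represented as an enumeration so that an unknown value is rejected rather than stored. The sketch below uses the table labels as enum values; how the `status` field is actually encoded in exports is not yet decided, so treat the names as illustrative.

```python
from enum import Enum


class RecommendationStatus(Enum):
    """Statuses from the table above; values are the display labels."""

    CONFIRMED = "Recommendation confirmed"
    NEEDS_REVIEW = "Needs review"
    AWAITING_CANDIDATE_DATA = "Awaiting candidate data"
    AWAITING_POLLING_DATA = "Awaiting polling data"
    NO_RECOMMENDATION = "No recommendation possible"


def parse_status(label: str) -> RecommendationStatus:
    """Convert a label to a status; raises ValueError for unknown labels."""
    return RecommendationStatus(label.strip())
```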

## What we do not publish

Public data exports should not include:

- Individual email addresses.
- Raw postcodes submitted by users.
- Subscriber records.
- Internal editorial notes.
- Private candidate research notes.
- Admin user identities.
- Unreviewed forecast evidence.
- Personal data submitted through correction forms.
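
One way to enforce this is an allow-list: the export copies only named public fields from an internal record, so anything else is dropped by default. The sketch below assumes the public fields are the example export fields listed earlier on this page; the function name and the internal field names in the usage note are hypothetical.

```python
# Assumed allow-list, taken from the example export fields on this page.
PUBLIC_FIELDS = (
    "constituency_code", "constituency_name", "slug", "recommendation_type",
    "recommended_party", "confidence", "split_vote_risk", "status",
    "public_explanation", "last_reviewed_at", "published_at",
    "manual_override", "source_summary",
)


def to_public_record(internal: dict) -> dict:
    """Return a copy of the record containing only allow-listed fields.

    Missing public fields are included with a value of None so every
    exported record has the same keys.
    """
    return {name: internal.get(name) for name in PUBLIC_FIELDS}
```

For example, an internal record carrying hypothetical `editor_notes` or `submitted_postcode` keys would come out of `to_public_record` without them. An allow-list is safer here than a block-list because a newly added internal field stays private unless someone deliberately exposes it.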

## Data privacy

For lookup, the site needs to hold the postcode only for as long as it takes to identify the constituency. We do not need to publish or expose individual postcode lookups.

For email alerts, users may provide an email address and postcode or constituency so that updates can be tagged by constituency or region. Users should be able to unsubscribe at any time.

Political campaigning and political email communications may be subject to data protection and direct marketing rules. Our privacy policy should explain what data is collected, why it is collected, how long it is kept, and how users can withdraw consent.

## Licensing and attribution

We aim to use data sources that can be reused lawfully and attributed clearly. Some public datasets are available under the Open Government Licence or Open Parliament Licence. Polling and forecast data may have additional restrictions and may not be republished unless the licence permits it.

Where a source does not permit public redistribution, it may be used only as internal evidence or excluded.

## Corrections

Users can report suspected errors in candidate details, constituency data, source links, or recommendations. Editors review each report, and accepted corrections may result in updates to the constituency page and public data snapshot.

## Data limitations

Election data is never perfect. Known limitations include:

- Not every constituency has reliable current polling.
- Candidate information can change quickly after an election is called.
- Historical results may be affected by boundary changes.
- Forecasts may disagree with each other.
- Postcode lookup can be imperfect near boundaries.
- Local candidate strength is difficult to quantify.
- Tactical-voting behaviour depends on voter coordination, not just published data.

Where the data is weak or contested, we will show lower confidence or avoid making a firm recommendation.

