What is Data Concierge?

Data Concierge is an AI-powered chat assistant that lets you explore government and public open-data sources using plain English. Instead of downloading CSV files and writing code yourself, you just ask a question and the system:

Finds the data: searches registered CKAN portals and connected data sources automatically.
Analyses it: runs real Python code against the live dataset to compute counts, trends and summaries.
Explains the result: returns a clear, cited, Markdown-formatted answer with key takeaways.
Gives you the code: every answer comes with a downloadable Jupyter notebook you can run and modify.

Signing Up & Logging In

A login is required to run new queries. Verified community notebooks can be browsed and downloaded without any account.

Password Login

1. Click the Login button in the top-right corner of any page.
2. Enter the username and password provided by your administrator.
3. Click Login. A green "Logged in" indicator appears in the navbar.

Social Login (Auth0)

If your organisation uses Auth0-based social login, the Sign in with Social Login button appears in the login modal. GitHub is currently the supported social login provider.

1. Click Sign in with Social Login and complete authentication with your chosen provider.
2. If your email is not yet on the approved list, you will land on an "Awaiting Verification" page. An admin is notified automatically, and access is typically granted within one business day.
3. Once approved, sign in again and you'll proceed straight to the chat.

Your conversations are saved to your account and sync across browsers once you're logged in.

Guest Access

Without logging in, you can still browse and download verified community notebooks.

To submit your own queries and generate new analysis, you must be logged in.

Asking Questions

Type your question in natural language into the search bar on the landing page or the message box at the bottom of any open chat. Press Enter or click the submit button to send it.

The system works best when questions are:

If the assistant returns a Verified Answer, a community-reviewed notebook has already answered this question, so the result is instant and fully reproducible.

Example Questions

Here are the kinds of questions Data Concierge handles well:

311 Service Requests

Public Safety

Building & Property

Community & Demographics

Data Sources

Data Concierge currently queries the following data sources:

Source | Type | Coverage
WPRDC (Western PA Regional Data Center) | CKAN | Pittsburgh & Allegheny County: 311 requests, crime, permits, property assessments, community centres, traffic, and more (126+ datasets)
datHere CKAN Portal | CKAN | General open data portal with diverse datasets
U.S. Census Bureau MCP | MCP | National & state-level population, income, housing, demographics (ACS)

Admins can register additional CKAN portals from the Admin Panel — see the CKAN Sites section below.

Verified Answers

When an admin approves a notebook submission it enters the Verified Library. Future questions that closely match a verified entry (≥ 50 % semantic similarity) receive the pre-reviewed answer instantly — no new computation needed.

A Verified Answer badge means the answer was produced by a human-reviewed notebook and is considered authoritative. A Generated response is freshly computed by the AI and may contain errors — always check the underlying notebook.

If a verified notebook scores between 40 % and 49 % similarity, the system shows a Similar Verified Notebook banner beneath the generated answer so you can quickly download the related community notebook.
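The thresholds above can be read as a simple three-way routing rule. The sketch below is illustrative only: the function names, the return labels, and the use of cosine similarity over embedding vectors are assumptions, not a description of how Data Concierge is implemented. Scores that fall between 49 % and 50 % are treated here as belonging to the banner band, which is also an assumption.

```python
from math import sqrt

VERIFIED_THRESHOLD = 0.50  # from the text: >= 50 % serves the verified answer
SIMILAR_THRESHOLD = 0.40   # from the text: 40-49 % shows the banner


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def route_answer(similarity):
    """Map a similarity score in [0, 1] to a hypothetical response type."""
    if similarity >= VERIFIED_THRESHOLD:
        return "verified"               # serve the pre-reviewed answer
    if similarity >= SIMILAR_THRESHOLD:
        return "generated_with_banner"  # fresh answer plus Similar Verified Notebook banner
    return "generated"                  # fresh answer only
```

For example, `route_answer(0.45)` falls in the banner band, while `route_answer(0.62)` would serve the verified answer.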

Managing Conversations

What is a Notebook?

Every answer generated by the AI is accompanied by a Jupyter Notebook (.ipynb) — a self-contained document that records exactly how the data was fetched, cleaned, and analysed. Notebooks consist of two cell types:

Code cell: executable Python that imports libraries, calls the CKAN API, computes statistics, and creates charts.

Markdown cell: human-readable explanation that describes what the code does and interprets results.

Notebooks make analysis fully reproducible: anyone can download the file, run it in Jupyter or Google Colab, and arrive at the same numbers.
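To make the two cell types concrete, the sketch below builds a minimal notebook as a plain Python dictionary in the version 4 .ipynb JSON layout and counts its cells by type. The cell contents are placeholders invented for illustration; real Data Concierge notebooks will contain more cells and metadata.

```python
import json
from collections import Counter

# A minimal notebook in the nbformat 4 JSON layout (contents are placeholders).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Example analysis\n", "This cell explains the code below."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["total = 2 + 3\n", "print(total)"],
        },
    ],
}


def count_cell_types(nb):
    """Return a dict mapping each cell type to how often it appears."""
    return dict(Counter(cell["cell_type"] for cell in nb["cells"]))


# Serialising with json.dumps yields the text that would be saved as a .ipynb file.
ipynb_text = json.dumps(notebook, indent=1)
```

The same `count_cell_types` helper works on a downloaded file after loading it with `json.load`.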

Viewing & Downloading

1. After the assistant responds, click the View Notebook button that appears under the message.
2. A full-screen viewer opens showing every code and markdown cell. Click any cell header to collapse/expand it. Use Collapse All / Expand All to manage long notebooks.
3. Click Download .ipynb to save the notebook to your computer.

Submitting for Review

If you receive a high-quality generated answer you think others would benefit from, you can submit the notebook for admin review. Once approved it enters the Verified Library and future similar questions receive an instant, trusted answer.

1. Open the notebook viewer (click View Notebook on any generated response).
2. Click Submit for Review in the modal footer.
3. An admin will review the submission, optionally add notes, and approve or reject it. You'll see it in the sidebar as Pending until a decision is made.

Before submitting, verify the answer looks correct and the code runs without errors. Admins can view the full notebook and answer preview before approving.

Running in Jupyter / Google Colab

Jupyter Lab / Notebook

1. Download the .ipynb file from the viewer.
2. Open a terminal and run pip install requests pandas if these packages are not already installed.
3. Launch Jupyter with jupyter lab and open the downloaded file.
4. Run cells one at a time with Shift+Enter, or run the whole notebook with Run → Run All Cells.

Google Colab

1. Open Google Colab in your browser.
2. Click File → Upload notebook and choose the downloaded .ipynb.
3. Click Runtime → Run all.

Admin Panel — Overview

The admin panel is only accessible to users with the admin role. Visit /admin or click Admin in the navbar when logged in as an admin.

The admin panel is organised into tabs across the top:

Tab | Purpose
Pending Review | Notebook submissions waiting for approval or rejection.
All Submissions | Full history of all submissions with status filter.
Verified Library | Browse and search all approved notebooks.
Search | Semantic search against the verified library; useful for testing similarity.
Approved Members | Manage the email allowlist for Auth0 social login access.
Admins | Grant or revoke admin roles for other users.
Query Logs | Per-user query history with source, confidence and timing.
MCP Servers | Register and connect Model Context Protocol data servers.
CKAN Sites | Register additional CKAN open-data portals.
Settings | Configure GitHub notebook publishing.

Reviewing Submissions

When a user submits a notebook for review it appears in the Pending Review tab with a yellow badge showing the count. The stats cards at the top of the page also update.

1. Click Review on any pending submission card to open the review modal.
2. The modal shows the original query, the full answer preview (Markdown rendered), and a complete notebook cell-by-cell preview with syntax-highlighted code.
3. Optionally add Admin Notes. These are stored with the submission record.
4. Click Approve to add it to the Verified Library, or Reject to discard it. Both actions are permanent.

Before approving, make sure the code in the notebook correctly retrieves and analyses real data, the answer is factually accurate, and there are no hardcoded credentials.
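A reviewer who downloads the notebook can run a rough first-pass check for hardcoded credentials before reading the code in detail. The sketch below is a heuristic written for illustration and is not part of Data Concierge: the patterns and the helper names `cell_lines` and `find_suspect_lines` are assumptions, and a clean result does not prove a notebook is free of secrets.

```python
import re

# Heuristic patterns only; they will miss some secrets and flag some harmless lines.
SECRET_PATTERNS = [
    # A quoted literal assigned to a name containing a sensitive word.
    re.compile(r"(?i)(api[_-]?key|token|password|secret)\w*\s*=\s*['\"][^'\"]+['\"]"),
    # Strings shaped like GitHub personal access tokens (prefix "ghp_").
    re.compile(r"ghp_[A-Za-z0-9]{20,}"),
]


def cell_lines(cell):
    """Return a cell's source as a list of lines (source may be str or list)."""
    source = cell.get("source", "")
    if isinstance(source, list):
        source = "".join(source)
    return source.splitlines()


def find_suspect_lines(notebook):
    """Return (cell_index, line_number, text) for code lines matching a pattern."""
    hits = []
    for cell_index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown cells are not executed, so they are skipped here
        for line_number, text in enumerate(cell_lines(cell), start=1):
            if any(pattern.search(text) for pattern in SECRET_PATTERNS):
                hits.append((cell_index, line_number, text))
    return hits
```

Load the downloaded file with `json.load` and pass the resulting dictionary to `find_suspect_lines`; any hits still need a human decision.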

Approved Members

When Auth0 social login is enabled, only emails on the Approved Members list can gain access. Users who sign in but aren't approved land on an "Awaiting Verification" page, and a pending access request is generated automatically.
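The allowlist behaviour described above amounts to a membership check with a side effect for unapproved users. The sketch below is a simplified model under stated assumptions: the function names and return labels are hypothetical, and comparing emails case-insensitively is a design choice made here, not something the guide confirms.

```python
def normalise_email(email):
    """Trim whitespace and lower-case an email before comparing (assumed policy)."""
    return email.strip().lower()


def handle_social_login(email, approved, pending):
    """Return the page a user lands on after social login.

    approved: set of normalised emails on the Approved Members list.
    pending:  set collecting emails that are awaiting verification.
    """
    email = normalise_email(email)
    if email in approved:
        return "chat"
    pending.add(email)  # models the automatically generated access request
    return "awaiting_verification"
```

Using a set for `pending` means a user who signs in repeatedly before approval creates only one request in this model.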

CKAN Sites

Any CKAN-compatible open-data portal can be added as a data source. The agent will include it when searching for datasets — no restart required.

1. Click Add Site.
2. Enter the Portal Name (display label) and the CKAN Portal URL, which is the base URL without /api/3 (e.g. https://data.example.gov).
3. Optionally set a Site ID (auto-generated if blank), Default Organization slug, Description, Keywords, and a Quality Score (0–1, higher = preferred).
4. Click Add Site. The portal is available immediately for new queries.

Add relevant Keywords (e.g. "nyc, new york, housing") so the agent automatically selects this portal when users ask about those topics.
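The sketch below illustrates why the base URL is entered without /api/3 and how keywords and quality scores could influence portal choice. The `/api/3/action/<name>` path and the `package_search` action are part of CKAN's Action API; everything else, including the `rank_sites` logic, the field names, and the sample portals, is a hypothetical model and not the agent's actual selection algorithm.

```python
from urllib.parse import urlencode


def ckan_action_url(portal_url, action, **params):
    """Build a CKAN Action API URL from a portal's base URL."""
    base = portal_url.rstrip("/")
    if base.endswith("/api/3"):  # tolerate a URL pasted with the API suffix
        base = base[: -len("/api/3")]
    url = f"{base}/api/3/action/{action}"
    if params:
        url += "?" + urlencode(params)
    return url


def rank_sites(question, sites):
    """Keep sites with a keyword found in the question, best quality score first."""
    text = question.lower()
    matched = [
        site for site in sites
        if any(keyword.lower() in text for keyword in site["keywords"])
    ]
    return sorted(matched, key=lambda site: site["quality_score"], reverse=True)


# Hypothetical registered portals, for illustration only.
sites = [
    {"name": "City portal", "url": "https://data.example.gov",
     "keywords": ["nyc", "new york", "housing"], "quality_score": 0.8},
    {"name": "General portal", "url": "https://open.example.org/",
     "keywords": ["housing"], "quality_score": 0.5},
    {"name": "Transit portal", "url": "https://transit.example.net",
     "keywords": ["bus", "subway"], "quality_score": 0.9},
]

ranked = rank_sites("How many housing permits were issued in New York?", sites)
search_url = ckan_action_url(ranked[0]["url"], "package_search", q="housing permits", rows=5)
```

Here the transit portal is skipped despite its higher score because none of its keywords appear in the question.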

MCP Servers

Model Context Protocol (MCP) servers extend Data Concierge with additional data capabilities beyond CKAN — such as the U.S. Census Bureau server that provides demographic and geographic lookup tools.

Two transport types are supported:

Once added, click Connect on a server card to discover its available tools. Connected tools appear as chips and can be invoked directly from the admin panel for testing.
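Tool discovery and invocation in MCP are carried as JSON-RPC 2.0 messages, using the `tools/list` and `tools/call` methods defined by the protocol. The sketch below only builds the request messages and does not open a connection. The tool name `lookup_population` and its arguments are hypothetical, and how the admin panel issues these requests internally is not documented here.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need an id so replies can be matched


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request message as a dict."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Ask a connected server which tools it offers.
list_tools = jsonrpc_request("tools/list")

# Invoke one tool by name; the name and arguments below are made up.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "lookup_population", "arguments": {"state": "PA"}},
)

wire_text = json.dumps(call_tool)  # the text that would be sent over the transport
```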

Query Logs

The Query Logs tab records every question asked by logged-in users, whether it was served from the verified cache or freshly generated.

Each row shows the timestamp, user, query text, source, confidence score, and response time in milliseconds.
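The row fields listed above map naturally onto a small record type, which makes simple summaries easy, such as comparing response times for cached and generated answers. The class name, field names, source labels, and sample rows below are all invented for illustration; the real log schema and any export format may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class QueryLogRow:
    timestamp: datetime
    user: str
    query: str
    source: str        # e.g. "verified" or "generated" (labels assumed)
    confidence: float  # confidence score as shown in the log
    response_ms: int   # response time in milliseconds


def mean_response_ms_by_source(rows):
    """Average response time in milliseconds for each source label."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row.source].append(row.response_ms)
    return {source: mean(times) for source, times in buckets.items()}


# Made-up sample rows, not real log data.
rows = [
    QueryLogRow(datetime(2024, 1, 5, 9, 0), "alice", "311 requests by month", "verified", 0.91, 100),
    QueryLogRow(datetime(2024, 1, 5, 9, 5), "bob", "permits issued in 2023", "generated", 0.74, 4000),
    QueryLogRow(datetime(2024, 1, 5, 9, 9), "alice", "311 requests by ward", "verified", 0.66, 300),
]

summary = mean_response_ms_by_source(rows)
```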

GitHub Notebook Publishing

Data Concierge can automatically push approved notebooks to a GitHub repository, keeping a public or private archive of all verified community analysis.

1. Go to Settings → GitHub Notebook Publishing.
2. Toggle Enable GitHub publishing on.
3. Paste a GitHub Personal Access Token (PAT) with repo write scope. Leave blank to keep the existing token.
4. Enter the Repository in owner/repo format, the Branch, and separate folder paths for draft and verified notebooks.
5. Click Save Settings, then Test Connection to verify everything is working.
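For orientation, the sketch below shows how the settings above (token, owner/repo, branch, folder) could map onto GitHub's REST endpoint for creating a file, `PUT /repos/{owner}/{repo}/contents/{path}`. It only constructs the request and sends nothing. Whether Data Concierge uses this endpoint is an assumption, and the function name, commit message, and sample values are hypothetical.

```python
import base64
import json
import re
from urllib.request import Request

REPO_PATTERN = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")  # owner/repo


def build_publish_request(token, repository, branch, folder, filename, notebook_bytes):
    """Build (but do not send) a request that would create a file in a repo."""
    if not REPO_PATTERN.match(repository):
        raise ValueError("repository must be in owner/repo format")
    path = "/".join(part for part in (folder.strip("/"), filename) if part)
    body = {
        "message": f"Add notebook {filename}",  # commit message (made up)
        "content": base64.b64encode(notebook_bytes).decode("ascii"),
        "branch": branch,
    }
    return Request(
        f"https://api.github.com/repos/{repository}/contents/{path}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


# Placeholder values; never hardcode a real token.
request = build_publish_request(
    "PLACEHOLDER_TOKEN", "example-org/notebooks", "main", "verified/", "demo.ipynb", b"{}"
)
```

Updating a file that already exists additionally requires the file's current `sha` in the request body, which this sketch omits.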

Tips & Tricks

FAQ

Why does the assistant say it can't find data?

The agent searches registered CKAN portals and connected MCP servers. If no relevant dataset is found it will say so honestly. Try rephrasing your question or ask an admin to register an additional data portal.

How recent is the data?

The assistant fetches data live from CKAN portals at query time, so results are as fresh as the portal's own update schedule. Most Pittsburgh/WPRDC datasets are updated daily to monthly.

Can I ask follow-up questions?

Yes — within the same conversation, continue typing follow-up questions. The assistant maintains context across messages in a single chat session.

Is my query logged?

When logged in, your queries are recorded in the server-side query log visible to admins. This helps improve the system and build the verified notebook library.

Why is a verified answer shown for a slightly different question?

The system uses semantic similarity to match questions — not exact text matching. If your question is at least 50 % similar to a verified query, the cached answer is served. You can still ask for a fresh analysis by rephrasing to be more specific.

Who do I contact for access issues?

Contact your organisation's Data Concierge administrator. If you used social login and are waiting for approval, your admin was notified automatically.