What is Data Concierge?
Data Concierge is an AI-powered chat assistant that lets you explore government and public open-data sources using plain English. Instead of downloading CSV files and writing code yourself, you just ask a question and the system:
- Searches registered CKAN portals and connected data sources automatically.
- Runs real Python code against the live dataset to compute counts, trends and summaries.
- Returns a clear, cited, Markdown-formatted answer with key takeaways.
Every answer comes with a downloadable Jupyter notebook you can run and modify.
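The search step can be pictured with a small sketch against the standard CKAN Action API. This is not Data Concierge's actual implementation: the function names are hypothetical, and the portal address is a placeholder that must be replaced with a real CKAN base URL before calling `search_portal`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(portal_base, query, rows=5):
    """Build a CKAN Action API package_search URL from a portal base URL."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal_base.rstrip('/')}/api/3/action/package_search?{params}"


def summarize_results(payload):
    """Reduce a package_search response to name, title and resource count."""
    if not payload.get("success"):
        return []
    return [
        {
            "name": pkg["name"],
            "title": pkg.get("title", ""),
            "resources": len(pkg.get("resources", [])),
        }
        for pkg in payload["result"]["results"]
    ]


def search_portal(portal_base, query, rows=5):
    """Query a live portal. This makes a network call and is not run here."""
    with urlopen(build_search_url(portal_base, query, rows), timeout=30) as resp:
        return summarize_results(json.load(resp))


# Placeholder portal address, shown only to illustrate the URL shape.
print(build_search_url("https://data.example.gov", "311 requests"))
```

Keeping URL construction and response parsing separate from the network call makes each piece easy to check without touching a live portal.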
Signing Up & Logging In
A login is required to run new queries. Verified community notebooks can be browsed and downloaded without any account.
Password Login
Social Login (Auth0)
If your organisation uses Auth0-based social login, the Sign in with Social Login button will appear in the login modal. GitHub is currently the only supported social login provider.
Guest Access
Without logging in you can still:
- Browse the landing page and example questions.
- View and download any Verified notebook from the community library.
- See answers that match a verified notebook (the system will serve the cached answer automatically).
To submit your own queries and generate new analysis you must be logged in.
Asking Questions
Type your question in natural language into the search bar on the landing page or the message box at the bottom of any open chat. Press Enter or click the button to submit.
The system works best when questions are:
- Specific about geography — mention the city, neighborhood, or county you care about.
- Focused on a single metric — "How many potholes were reported in Squirrel Hill last year?" works better than "Tell me everything about Pittsburgh infrastructure."
- Time-aware — include a year or date range where relevant ("in 2024", "last 6 months").
Example Questions
Here are the kinds of questions Data Concierge handles well:
311 Service Requests
- What are the most common types of 311 requests in Pittsburgh?
- Which neighborhoods generate the most 311 complaints?
- How long does it take the city to resolve pothole reports?
- Show me 311 request trends over the past three years.
Public Safety
- Which Pittsburgh neighborhoods have the most police incidents?
- What types of crime are most common in the Hill District?
- Show me year-over-year changes in reported incidents by type.
Building & Property
- How many building permits were issued in Pittsburgh last year?
- What is the distribution of property values across Pittsburgh neighborhoods?
- Which ZIP codes have the highest number of vacant properties?
Community & Demographics
- Show me community center attendance trends in Pittsburgh.
- What is the population breakdown by age group in Allegheny County?
- What is the median household income by neighborhood?
Data Sources
Data Concierge currently queries the following data sources:
| Source | Type | Coverage |
|---|---|---|
| WPRDC — Western PA Regional Data Center | CKAN | Pittsburgh & Allegheny County — 311 requests, crime, permits, property assessments, community centres, traffic, and more (126+ datasets) |
| datHere CKAN Portal | CKAN | General open data portal with diverse datasets |
| U.S. Census Bureau MCP | MCP | National & state-level population, income, housing, demographics (ACS) |
Admins can register additional CKAN portals from the Admin Panel — see the CKAN Sites section below.
Verified Answers
When an admin approves a notebook submission it enters the Verified Library. Future questions that closely match a verified entry (≥ 50 % semantic similarity) receive the pre-reviewed answer instantly — no new computation needed.
If a verified notebook scores between 40 % and 49 % similarity, the system shows a Similar Verified Notebook banner beneath the generated answer so you can quickly download the related community notebook.
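The two thresholds can be read as a simple routing rule. The sketch below is illustrative only: the guide says "semantic similarity" without naming a metric, so cosine similarity over embedding vectors is an assumption, as are the function names and the treatment of scores between 49 % and 50 % as banner cases.

```python
import math

SERVE_VERIFIED = 0.50   # from the guide: 50 % or more serves the verified answer
SUGGEST_SIMILAR = 0.40  # from the guide: 40-49 % shows the banner


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route(similarity):
    """Map a similarity score in [0, 1] to one of three hypothetical actions."""
    if similarity >= SERVE_VERIFIED:
        return "serve_verified"
    if similarity >= SUGGEST_SIMILAR:
        return "generate_with_banner"
    return "generate"


for score in (0.72, 0.45, 0.10):
    print(score, route(score))
```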
Managing Conversations
- New Chat — click the blue New Chat button in the sidebar to start a fresh conversation.
- Switching chats — click any conversation in the left sidebar to jump back to it; all messages are restored.
- Deleting a chat — click the icon on any sidebar item.
- Persistent URLs — each conversation gets a unique URL (e.g. /#a3f9b2c1). Bookmark it or share the link to return directly to that chat.
- Cross-device sync — when logged in, your chats are saved to the server and load automatically on any browser.
What is a Notebook?
Every answer generated by the AI is accompanied by a Jupyter Notebook (.ipynb), a self-contained document that records exactly how the data was fetched, cleaned, and analysed. Notebooks consist of two cell types:
- Code cells: executable Python that imports libraries, calls the CKAN API, computes statistics, and creates charts.
- Markdown cells: human-readable explanation that describes what the code does and interprets results.
Notebooks make analysis fully reproducible: anyone can download the file, run it in Jupyter or Google Colab, and arrive at the same numbers.
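Because an .ipynb file is plain JSON, its structure can be inspected without opening Jupyter. The sketch below builds a tiny made-up notebook in memory, not one produced by Data Concierge, and counts its cells by type. The helper names are hypothetical.

```python
import json
from collections import Counter

# A minimal nbformat 4 notebook, invented for illustration.
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Pothole reports\n", "What the code below does."]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["import pandas as pd\n"]},
    ],
})


def cell_text(cell):
    """Return a cell's source as one string (nbformat allows str or list)."""
    source = cell["source"]
    return source if isinstance(source, str) else "".join(source)


def summarize_notebook(raw):
    """Count cells by type in a notebook given as a JSON string."""
    return Counter(cell["cell_type"] for cell in json.loads(raw)["cells"])


def code_sources(raw):
    """Return the source text of every code cell."""
    return [cell_text(c) for c in json.loads(raw)["cells"] if c["cell_type"] == "code"]


print(dict(summarize_notebook(notebook_json)))
```

For a downloaded file, the same functions can be applied to the text returned by reading the file with `open(path, encoding="utf-8").read()`.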
Viewing & Downloading
Submitting for Review
If you receive a high-quality generated answer you think others would benefit from, you can submit the notebook for admin review. Once approved it enters the Verified Library and future similar questions receive an instant, trusted answer.
Running in Jupyter / Google Colab
Jupyter Lab / Notebook
- Download the .ipynb file from the viewer.
- Run pip install requests pandas if not already installed.
- Run jupyter lab and open the downloaded file.
Google Colab
- Upload the downloaded .ipynb.
Admin Panel — Overview
Go to /admin or click Admin in the navbar when logged in as an admin.
The admin panel is organised into tabs across the top:
| Tab | Purpose |
|---|---|
| Pending Review | Notebook submissions waiting for approval or rejection. |
| All Submissions | Full history of all submissions with status filter. |
| Verified Library | Browse and search all approved notebooks. |
| Search | Semantic search against the verified library — useful for testing similarity. |
| Approved Members | Manage the email allowlist for Auth0 social login access. |
| Admins | Grant or revoke admin roles for other users. |
| Query Logs | Per-user query history with source, confidence and timing. |
| MCP Servers | Register and connect Model Context Protocol data servers. |
| CKAN Sites | Register additional CKAN open-data portals. |
| Settings | Configure GitHub notebook publishing. |
Reviewing Submissions
When a user submits a notebook for review it appears in the Pending Review tab with a yellow badge showing the count. The stats cards at the top of the page also update.
Approved Members
When Auth0 social login is enabled, only emails on the Approved Members list can gain access. Users who sign in but aren't approved land on an "Awaiting Verification" page, and a pending access request is generated automatically.
- Add an email — enter the email address in the form and click Approve. The user can sign in immediately.
- Pending requests — shown at the top of the tab when there are users waiting; click Approve or Dismiss directly from there.
- Remove access — click the remove control next to any approved email to revoke login access.
CKAN Sites
Any CKAN-compatible open-data portal can be added as a data source. The agent will include it when searching for datasets — no restart required.
Enter the portal's base URL without the /api/3 suffix (e.g. https://data.example.gov).
MCP Servers
Model Context Protocol (MCP) servers extend Data Concierge with additional data capabilities beyond CKAN — such as the U.S. Census Bureau server that provides demographic and geographic lookup tools.
Two transport types are supported:
- stdio (Process) — launches a local process (e.g. a Docker container or Node.js script). Requires a Command, optional Arguments, Environment Variables, and a Working Directory.
- SSE (HTTP) — connects to a remote server via HTTP Server-Sent Events. Requires just a Server URL.
Once added, click Connect on a server card to discover its available tools. Connected tools appear as chips and can be invoked directly from the admin panel for testing.
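The two transport types imply different required fields, which a small validation sketch makes concrete. The class and field names are hypothetical and do not reflect Data Concierge's real schema. The sketch also assumes that only the Command is mandatory for stdio and that arguments, environment variables and working directory are optional.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class McpServerConfig:
    """Hypothetical record for one registered MCP server."""
    name: str
    transport: str                      # "stdio" or "sse"
    command: Optional[str] = None       # stdio: executable to launch
    args: list = field(default_factory=list)
    env: dict = field(default_factory=dict)
    cwd: Optional[str] = None           # stdio: working directory
    url: Optional[str] = None           # sse: remote server URL

    def validate(self):
        """Return a list of problems; an empty list means the config looks usable."""
        problems = []
        if self.transport == "stdio":
            if not self.command:
                problems.append("stdio transport requires a command")
        elif self.transport == "sse":
            if not self.url:
                problems.append("sse transport requires a server URL")
            elif not self.url.startswith(("http://", "https://")):
                problems.append("server URL must start with http:// or https://")
        else:
            problems.append(f"unknown transport: {self.transport}")
        return problems


# Placeholder values, not real server details.
local = McpServerConfig(name="census", transport="stdio", command="node", args=["server.js"])
remote = McpServerConfig(name="remote", transport="sse")
print(local.validate(), remote.validate())
```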
Query Logs
The Query Logs tab records every question asked by logged-in users, whether it was served from the verified cache or freshly generated.
- User filter — narrow to a specific user's history.
- Source filter — filter by Generated, Verified cache, or Errors.
- Refresh — click the refresh button to pull the latest entries.
Each row shows the timestamp, user, query text, source, confidence score, and response time in milliseconds.
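The user and source filters amount to simple predicates over these rows. The sketch below models a row with the six columns listed above. The field names, the source labels and the sample rows are all invented for illustration and are not taken from a real log.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class QueryLogRow:
    """Hypothetical shape of one query log row."""
    timestamp: datetime
    user: str
    query: str
    source: str                   # assumed labels: "generated", "verified_cache", "error"
    confidence: Optional[float]
    response_ms: int


def filter_logs(rows, user=None, source=None):
    """Keep rows matching the given user and/or source; None means no filter."""
    return [
        r for r in rows
        if (user is None or r.user == user) and (source is None or r.source == source)
    ]


def mean_response_ms(rows):
    """Average response time in milliseconds, or 0.0 for an empty selection."""
    rows = list(rows)
    return sum(r.response_ms for r in rows) / len(rows) if rows else 0.0


# Made-up sample rows.
SAMPLE_LOGS = [
    QueryLogRow(datetime(2024, 5, 1, 9, 0), "ana@example.org", "Most common 311 requests?", "generated", 0.82, 4200),
    QueryLogRow(datetime(2024, 5, 1, 9, 5), "ana@example.org", "Top 311 request types?", "verified_cache", 0.95, 120),
    QueryLogRow(datetime(2024, 5, 1, 9, 9), "ben@example.org", "Vacant properties by ZIP?", "error", None, 900),
]

print(len(filter_logs(SAMPLE_LOGS, source="verified_cache")))
```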
GitHub Notebook Publishing
Data Concierge can automatically push approved notebooks to a GitHub repository, keeping a public or private archive of all verified community analysis.
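One way such a push could work is through GitHub's REST "create or update file contents" endpoint. The guide does not say which mechanism Data Concierge uses, so treat this as a sketch under that assumption. The function name, commit message and all argument values are hypothetical, and the request is only built, never sent.

```python
import base64
import json
from urllib.parse import quote
from urllib.request import Request


def build_publish_request(token, repo, branch, folder, filename, notebook_bytes, existing_sha=None):
    """Build (but do not send) a PUT request that commits one notebook file.

    repo is in owner/repo format; existing_sha is needed only when the file
    already exists in the repository and is being updated.
    """
    path = quote(f"{folder.strip('/')}/{filename}")
    body = {
        "message": f"Publish notebook {filename}",
        "content": base64.b64encode(notebook_bytes).decode("ascii"),
        "branch": branch,
    }
    if existing_sha:
        body["sha"] = existing_sha
    return Request(
        url=f"https://api.github.com/repos/{repo}/contents/{path}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


# Placeholder values only; a real token must never be hard-coded.
req = build_publish_request(
    "TOKEN", "owner/repo", "main", "verified", "potholes.ipynb", b'{"cells": []}'
)
print(req.get_method(), req.full_url)
```

Sending the request would be a matter of passing `req` to `urllib.request.urlopen`, which requires a token with write access to the repository.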
- Enter a GitHub token with repo write scope. Leave blank to keep the existing token.
- Set the repository in owner/repo format, the Branch, and separate folder paths for draft and verified notebooks.
Tips & Tricks
- Rephrase if the answer is wrong — try adding more context ("in the Hill District", "from 2022 to 2024") or simplifying the question.
- Use example questions as templates — click any of the pre-loaded examples on the landing page to see the format that works best.
- Bookmark your chat URL — each conversation has a unique hash URL; bookmark it to return directly.
- Download the notebook — even if you don't run it, the notebook shows the exact CKAN API calls used, which you can replicate manually.
- Check the confidence score — shown below each assistant message. Lower scores mean the model was less certain; treat those answers with extra scepticism.
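Replicating a notebook's CKAN call by hand, as the "Download the notebook" tip suggests, usually comes down to one `datastore_search` request plus a little counting. The sketch below uses the standard CKAN Action API. The portal address, resource id, field name and sample records are placeholders, and the real values come from the notebook you downloaded.

```python
from collections import Counter
from urllib.parse import urlencode


def build_datastore_url(portal_base, resource_id, limit=100):
    """Build a CKAN datastore_search URL for one resource."""
    params = urlencode({"resource_id": resource_id, "limit": limit})
    return f"{portal_base.rstrip('/')}/api/3/action/datastore_search?{params}"


def count_by_field(payload, field_name):
    """Count records per value of one field in a datastore_search response."""
    records = payload.get("result", {}).get("records", [])
    return Counter(rec.get(field_name) for rec in records)


# Canned response in the CKAN shape, with invented records for illustration.
sample_payload = {
    "success": True,
    "result": {
        "records": [
            {"_id": 1, "REQUEST_TYPE": "Potholes"},
            {"_id": 2, "REQUEST_TYPE": "Potholes"},
            {"_id": 3, "REQUEST_TYPE": "Weeds/Debris"},
        ]
    },
}

print(build_datastore_url("https://data.example.gov", "RESOURCE_ID"))
print(count_by_field(sample_payload, "REQUEST_TYPE").most_common(1))
```

Fetching the URL with `requests.get(url).json()` or `urllib.request.urlopen` yields a payload of the same shape that can be passed to `count_by_field`.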
FAQ
Why does the assistant say it can't find data?
The agent searches registered CKAN portals and connected MCP servers. If no relevant dataset is found it will say so honestly. Try rephrasing your question or ask an admin to register an additional data portal.
How recent is the data?
The assistant fetches data live from CKAN portals at query time, so results are as fresh as the portal's own update schedule. Most Pittsburgh/WPRDC datasets are updated daily to monthly.
Can I ask follow-up questions?
Yes — within the same conversation, continue typing follow-up questions. The assistant maintains context across messages in a single chat session.
Is my query logged?
When logged in, your queries are recorded in the server-side query log visible to admins. This helps improve the system and build the verified notebook library.
Why is a verified answer shown for a slightly different question?
The system uses semantic similarity to match questions — not exact text matching. If your question is at least 50 % similar to a verified query, the cached answer is served. You can still ask for a fresh analysis by rephrasing to be more specific.
Who do I contact for access issues?
Contact your organisation's Data Concierge administrator. If you used social login and are waiting for approval, your admin was notified automatically.