TransitGPT is a Generative AI-powered chatbot that enables transit enthusiasts to access and analyze General Transit Feed Specification (GTFS) data through natural language instructions.
TransitGPT 🚌
TransitGPT is a specialized chatbot that helps transit enthusiasts retrieve transit information and analyze GTFS feeds via code. Try the chatbot here.
🏗️ Architecture Overview
This diagram illustrates the high-level architecture of the TransitGPT system, showing how different components interact. The workflow consists of 4 key steps:
Moderation
- All queries are moderated
- Irrelevant queries are blocked
Main LLM
- Generates code response for the query of interest
Code Execution
- Code generated by the main LLM is executed in a safe environment
- Includes retry mechanism for failed executions
Summary
- Results are summarized in a chat-like response format
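The four steps above can be sketched as a single function. This is a minimal illustration, not TransitGPT's actual implementation: `run_pipeline`, `ExecutionResult`, the four callables, and `max_retries` are all hypothetical names, and feeding the previous error back into code generation is an assumption about how the retry mechanism might work.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ExecutionResult:
    """Outcome of running one generated code snippet (hypothetical type)."""
    ok: bool
    output: str = ""
    error: str = ""


def run_pipeline(
    query: str,
    moderate: Callable[[str], bool],
    generate_code: Callable[[str, Optional[str]], str],
    execute: Callable[[str], ExecutionResult],
    summarize: Callable[[str, ExecutionResult], str],
    max_retries: int = 2,
) -> str:
    # Step 1: moderation. Irrelevant queries are blocked before code generation.
    if not moderate(query):
        return "Sorry, this query is not related to transit or GTFS data."

    # Steps 2 and 3: generate code, execute it, and retry on failure.
    # Assumption: the previous error message is passed back to the generator.
    last_error: Optional[str] = None
    result = ExecutionResult(ok=False, error="not executed")
    for _ in range(max_retries + 1):
        code = generate_code(query, last_error)
        result = execute(code)
        if result.ok:
            break
        last_error = result.error

    # Step 4: summarize the result (successful or not) as a chat response.
    return summarize(query, result)
```

Passing the steps in as callables keeps the sketch independent of any particular LLM client or sandbox, so each stage can be stubbed out in tests.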
✨ Features
- Interactive chat interface for querying GTFS data
- Code generation and execution for GTFS analysis
- Support for multiple LLM models. Default models are: Claude 3.5 Sonnet, Claude 3.5 Haiku, GPT-4o, GPT-4o-mini
- Visualization of results using Matplotlib, Plotly, and Folium
- Feedback system for user interactions
- Support for multiple GTFS feeds
- Support for multiple visualization types including:
  - Static/Interactive maps
  - Static/Interactive plots
  - Tables (DataFrames)
🛠️ Setup
Create a virtual environment (recommended):
    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
Install the required dependencies:
    pip install -r requirements.txt
Ensure you have the necessary GTFS data files and update gtfs_data/file_mapping.json accordingly:
Add a New GTFS Feed:
- Place the GTFS File: Add the GTFS zip file to the appropriate directory within gtfs_data/.
- Update file_mapping.json: Add a new entry for the transit agency in the following format:
    "New Transit": {
        "file_loc": "gtfs_data/New Transit Agency/gtfs.zip",
        "distance_unit": "m",
        "pickle_loc": "gtfs_data/feed_pickles/New_Transit_gtfs_loader.pkl"
    }
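Editing the mapping by hand is easy to get wrong, so the entry can also be added with a short script. The sketch below assumes that file_mapping.json is a single JSON object keyed by agency name, as the entry format above suggests; `add_feed` and `missing_keys` are hypothetical helpers, not part of the project.

```python
import json
from pathlib import Path

# Keys taken from the entry format shown in this README.
REQUIRED_KEYS = {"file_loc", "distance_unit", "pickle_loc"}


def add_feed(mapping_path, name, file_loc, distance_unit, pickle_loc):
    """Add or overwrite one agency entry in a file_mapping.json-style file."""
    path = Path(mapping_path)
    # Assumption: the file holds one JSON object keyed by agency name.
    mapping = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    mapping[name] = {
        "file_loc": file_loc,
        "distance_unit": distance_unit,
        "pickle_loc": pickle_loc,
    }
    path.write_text(json.dumps(mapping, indent=4), encoding="utf-8")
    return mapping


def missing_keys(mapping):
    """Return {agency: sorted list of missing keys} for incomplete entries."""
    return {
        name: sorted(REQUIRED_KEYS - set(entry))
        for name, entry in mapping.items()
        if not REQUIRED_KEYS <= set(entry)
    }
```

For example, `add_feed("gtfs_data/file_mapping.json", "New Transit", "gtfs_data/New Transit Agency/gtfs.zip", "m", "gtfs_data/feed_pickles/New_Transit_gtfs_loader.pkl")` would write the entry shown above, and `missing_keys` can flag entries that lack one of the three fields.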
Generate pickled GTFS feeds for faster loading:
    python utils/generate_feed_pickles.py
Set up your environment variables for API keys and other sensitive information:
- Create a .streamlit/secrets.toml file in your project directory.
- Add your API keys in the following format:
    [general]
    OPENAI_API_KEY = "your_openai_api_key"
    GROQ_API_KEY = "your_groq_api_key"
    ANTHROPIC_API_KEY = "your_anthropic_api_key"
    GMAP_API = "your_google_maps_api_key"
- Ensure that this file is not included in version control by adding it to your .gitignore.
Run the Streamlit app:
streamlit run chat_app.py
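The pickling step in the setup trades disk space for load time: a parsed feed is serialized once and reloaded on later runs instead of being re-parsed from the zip. The sketch below shows that general caching pattern only. `load_or_build` and its `build` argument are hypothetical, and the real logic in utils/generate_feed_pickles.py may differ.

```python
import pickle
from pathlib import Path


def load_or_build(pickle_path, build):
    """Return the object cached at pickle_path, building and caching it if absent.

    `build` is any zero-argument callable that produces the object,
    for example a function that parses a GTFS zip file.
    """
    path = Path(pickle_path)
    if path.exists():
        # Only unpickle files you created yourself: pickle can run arbitrary code.
        with path.open("rb") as fh:
            return pickle.load(fh)
    obj = build()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(obj, fh)
    return obj
```

With this pattern a stale pickle is never refreshed automatically, so the cached file has to be deleted or regenerated whenever the underlying GTFS zip changes.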
[Alternative] 🐳 Docker Installation
As an alternative to the standard setup, you can use Docker to run TransitGPT:
Build the Docker image:
    docker build -t transitgpt .
Run the container:
    docker run -p 8501:8501 \
      -e OPENAI_API_KEY=your_openai_api_key \
      -e GROQ_API_KEY=your_groq_api_key \
      -e ANTHROPIC_API_KEY=your_anthropic_api_key \
      -e GMAP_API=your_google_maps_api_key \
      -v $(pwd)/gtfs_data:/app/gtfs_data \
      transitgpt
Access the application: Open your browser and go to http://localhost:8501
Notes for Docker Setup
- Make sure to replace the placeholder API keys with your actual keys
- The volume mount for gtfs_data ensures your GTFS data persists between container restarts
- If you need to add new GTFS feeds, add them to your local gtfs_data directory and update file_mapping.json as described in the standard setup
📱 Usage
- Select an LLM model and GTFS feed from the sidebar
- Type your query in the chat input or select a sample question
- View the generated code, execution results, and visualizations
- Provide feedback on the responses
⚙️ Configuration
- LLM models available: Claude 3.5 Sonnet, GPT-4o, GPT-4o-mini, Llama 3.1 8B Instant
- Maximum chat history: 16 messages
- Timeout for code execution: 5 minutes
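One way to picture how the two limits above might be enforced is shown below. This is a sketch under stated assumptions, not the project's code: the constant and function names are hypothetical, and running generated code in a separate Python process is only one possible approach (a subprocess alone is not a security sandbox).

```python
import subprocess
import sys

MAX_CHAT_HISTORY = 16          # messages, from the configuration list above
CODE_TIMEOUT_SECONDS = 5 * 60  # 5 minutes, from the configuration list above


def trim_history(messages, max_messages=MAX_CHAT_HISTORY):
    """Keep only the most recent max_messages chat messages."""
    return messages[-max_messages:] if max_messages > 0 else []


def run_with_timeout(code, timeout=CODE_TIMEOUT_SECONDS):
    """Run code in a separate Python process; return (ok, stdout_or_error)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return False, f"timed out after {timeout} s"
    if proc.returncode != 0:
        return False, proc.stderr
    return True, proc.stdout
```

Trimming from the front keeps the newest messages, which is usually what a chat context needs; the timeout guards against generated code that loops forever.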
📁 Project Structure
- chat_app.py: Main Streamlit application
- components/: UI components and interface setup
- utils/: Utility functions and helper methods
- prompts/: LLM prompts and examples
- data/: Sample questions and few-shot examples
- gtfs_data/: GTFS feed files and mappings
- gtfs_agent/: GTFS data loading, processing, and LLM agent
- evaluator/: Code execution and evaluation
- tests/: Unit tests for various components
📄 Key Files
- gtfs_agent/gtfs_loader.py: GTFS data loading and processing
- gtfs_agent/agent.py: LLM Agent implementation
- evaluator/eval_code.py: Code execution and evaluation
- utils/feedback.py: Feedback collection system
- prompts/generate_prompt.py: Dynamic prompt generation
- utils/generate_feed_pickles.py: Generate pickled GTFS feeds
- utils/constants.py: Constant values used across the project
- utils/helper.py: Helper functions for various tasks
- gtfs_agent/llm_client.py: LLM API clients for different models
⚠️ Disclaimer
This chatbot is an AI-powered tool designed to assist with GTFS data analysis and code generation. Please be aware of its limitations, verify critical information, and review generated code before use in production environments.
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Contributing Few-Shot Examples
Thank you for your interest in contributing to our few-shot examples! This guide will help you add new examples to our dataset, ensuring consistency and quality across all contributions.
Understand the Structure: Each example in the data/few_shot.yaml and data/few_shot_viz.yaml files follows a specific format. If your example generates a visualization, add it to data/few_shot_viz.yaml. If it does not, add it to data/few_shot.yaml.
Use Clear and Descriptive Questions: Ensure that the question field clearly describes the task or query. It should be concise yet informative.
Provide Accurate Answers: The answer should be a valid Python code snippet that solves the question. Ensure the code is correct and follows best practices.
Include Additional Information: Where applicable, provide additional information that explains the context or any assumptions made in the answer.
Test Your Code: Before submitting, test your code to ensure it works as expected with the GTFS data.
Adding a New Example
Select the Appropriate File:
- Use few_shot.yaml for examples that do not involve visualization.
- Use few_shot_viz.yaml for examples that include visualizations like plots or maps.
Follow the Example Template:
- Each example should have a unique identifier (e.g., example_XX).
- Include the feed and question fields.
- Provide the answer as a Python code block.
- Add any additional_info if necessary.
Example Template:
    example_XX:
      feed: [Feed Name]
      question: [Your question here]
      answer: |
        ```python
        # Your Python code here
        ```
      additional_info: [Optional additional information]
Ensure Consistency:
- Use consistent naming conventions and formatting.
- Follow the existing style for comments and code structure.
Validate Your Contribution:
- Check for syntax errors and logical correctness.
- Ensure the example is unique and not a duplicate of existing examples.
Submit Your Contribution:
- Fork the repository and create a new branch for your contribution.
- Add your example to the appropriate file.
- Submit a pull request with a clear description of your changes.
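Part of the validation step can be automated before opening a pull request. The sketch below checks one example that has already been loaded into a Python dict (for instance with a YAML parser) for the fields named in the template and for syntax errors in the fenced answer code. `check_example` is a hypothetical helper, not a script shipped with the project, and treating feed, question, and answer as mandatory is an assumption based on the template.

```python
import ast
import re
import textwrap

# Fields the template lists; additional_info is described as optional.
REQUIRED_FIELDS = ("feed", "question", "answer")

FENCE = "`" * 3  # three backticks, built this way to keep this snippet readable
CODE_BLOCK = re.compile(FENCE + r"python\s*\n(.*?)" + FENCE, re.DOTALL)


def check_example(example):
    """Return a list of problems found in one few-shot example dict."""
    problems = [
        f"missing field: {field}"
        for field in REQUIRED_FIELDS
        if not example.get(field)
    ]
    answer = example.get("answer") or ""
    match = CODE_BLOCK.search(answer)
    if answer and match is None:
        problems.append("answer has no fenced python code block")
    elif match:
        try:
            # Parse only; the code is never executed here.
            ast.parse(textwrap.dedent(match.group(1)))
        except SyntaxError as exc:
            problems.append(f"syntax error in answer: {exc.msg}")
    return problems
```

An empty list means the example passed these checks. Parsing catches syntax errors only, so the code still needs to be run against the GTFS feed to confirm it is logically correct.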
Review Process
- Your contribution will be reviewed by the maintainers.
- Feedback may be provided for improvements or corrections.
- Once approved, your example will be merged into the main branch.
©️ Copyright
Copyright © 2024 Urban Traffic & Economics Lab (UTEL)
📚 Citation
If you use TransitGPT in your research, please cite our paper:
@misc{devunuri2024transitgpt,
title={TransitGPT: A Generative AI-based framework for interacting with GTFS data using Large Language Models},
author={Saipraneeth Devunuri and Lewis Lehe},
year={2024},
eprint={2412.06831},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2412.06831},
}