PandaAI 3.0 is currently in beta. This documentation reflects the latest features and functionality, which may evolve before the final release.

Installation

PandaAI requires Python 3.8 or later and earlier than 3.12 (that is, 3.8 to 3.11). We recommend using Poetry for dependency management:

# Using poetry (recommended)
poetry add "pandasai>=3.0.0b2"

# Alternative: using pip
pip install "pandasai>=3.0.0b2"
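Before installing, you can confirm that your interpreter falls within the supported range. This is a minimal stdlib sketch; the helper name is our own and not part of PandaAI.

```python
import sys

# Supported range stated in the installation notes: 3.8 <= version < 3.12
MIN_VERSION = (3, 8)
MAX_VERSION_EXCLUSIVE = (3, 12)


def is_supported_python(version_info=None) -> bool:
    """Return True if the given (or current) interpreter version is in the supported range."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); patch releases do not matter here
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    if is_supported_python():
        print(f"Python {current} is within the supported range.")
    else:
        print(f"Python {current} is outside the supported range (3.8 to 3.11).")
```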

Quick setup

To use PandaAI, you need a large language model (LLM). You can use any LLM, but this guide uses BambooLLM. You can get a free API key by signing up at app.pandabi.ai, which gives you access to the data platform as well as BambooLLM credits.

First, import PandaAI and set up your API key:

import pandasai as pai

# Get your API key from https://app.pandabi.ai
pai.api_key.set("YOUR_PANDABI_API_KEY")
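Hard-coding the key is fine for a quick test, but in shared code it is safer to read it from the environment. A minimal sketch, assuming you export the key under a variable named PANDABI_API_KEY; both that name and the read_api_key helper are illustrative choices, not something this guide prescribes.

```python
import os

# Hypothetical variable name; pick whatever fits your deployment.
API_KEY_ENV_VAR = "PANDABI_API_KEY"


def read_api_key(env_var: str = API_KEY_ENV_VAR) -> str:
    """Read the API key from the environment instead of hard-coding it in source."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your API key from app.pandabi.ai"
        )
    return key
```

You would then pass the result to the call shown above, for example pai.api_key.set(read_api_key()).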

Chat with your data

import pandasai as pai

# Load your data
df = pai.read_csv("data/companies.csv")

response = df.chat("What is the average revenue by region?")
print(response)

When you ask a question, PandaAI uses the LLM to generate the answer and returns a response. Depending on your question, it can return different kinds of responses:

  • string
  • dataframe
  • chart
  • number

Find out more about output data formats here.
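Because the LLM writes the analysis for you, it can be worth spot-checking a response against a direct computation. The example question is a group-by average; here is a stdlib sketch on a few made-up rows, assuming companies.csv has region and revenue columns like the schema defined later on this page.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows standing in for data/companies.csv
SAMPLE_CSV = """company_name,revenue,region
Acme,120.0,EMEA
Globex,80.0,EMEA
Initech,200.0,APAC
"""


def average_revenue_by_region(csv_text: str) -> dict:
    """Compute mean revenue per region, the aggregation the example question asks for."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["region"]] += float(row["revenue"])
        counts[row["region"]] += 1
    return {region: totals[region] / counts[region] for region in totals}


print(average_revenue_by_region(SAMPLE_CSV))  # {'EMEA': 100.0, 'APAC': 200.0}
```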

Creating your first data layer

1. Define a data source

Start by creating a data schema that describes your dataset:

import pandasai as pai

# Load your data
df = pai.read_csv("data/companies.csv")

# Create the data layer
companies = pai.create(
  path="my-org/companies",
  df=df,
  name="companies",
  description="Customer companies dataset"
)

This dataset will be saved in the datasets/my-org/companies folder of your project.
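The path argument doubles as the dataset's location on disk. The sketch below mirrors that layout for illustration only; it is not PandaAI's own code, and the rule that a path has exactly two non-empty parts (organization and dataset) is our assumption based on the examples in this guide.

```python
from pathlib import Path


def dataset_folder(path: str, project_root: str = ".") -> Path:
    """Map an 'organization/dataset' path to its local folder under datasets/.

    Illustrative only: the two-part validation is an assumption, not PandaAI's rule.
    """
    parts = path.split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"Expected 'organization/dataset', got {path!r}")
    return Path(project_root) / "datasets" / parts[0] / parts[1]


print(dataset_folder("my-org/companies"))  # datasets/my-org/companies (on POSIX systems)
```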

2. Define the structure of your dataset

By default, column types are inferred from the data. For more control, you can define explicit column schemas:

# Define a companies dataset with explicit schema
companies = pai.create(
  path="my-org/companies",
  df=df,
  name="companies",
  description="Customer companies dataset",
  columns=[
    {
      "name": "company_name",
      "type": "string",
      "description": "The name of the company"
    },
    {
      "name": "revenue",
      "type": "float",
      "description": "The revenue of the company"
    },
    {
      "name": "region",
      "type": "string",
      "description": "The region of the company"
    }
  ]
)
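To get a feel for what inferred types look like, or to generate a starting point for an explicit columns list, here is a simplified stdlib sketch. It is not PandaAI's inference logic: the "integer" label, the string-first fallback and the empty descriptions are our assumptions, and you would fill in descriptions by hand.

```python
import csv
import io

# Hypothetical sample rows; in practice these would come from your CSV file.
SAMPLE_CSV = """company_name,revenue,region
Acme,120.5,EMEA
Globex,80,APAC
"""


def _parses(cast, value: str) -> bool:
    """Return True if cast(value) succeeds."""
    try:
        cast(value)
        return True
    except ValueError:
        return False


def infer_type(values) -> str:
    """Simplified inference: 'integer' if all values parse as int, 'float' if all parse as float, else 'string'."""
    if not values:
        return "string"
    if all(_parses(int, v) for v in values):
        return "integer"
    if all(_parses(float, v) for v in values):
        return "float"
    return "string"


def infer_columns(csv_text: str) -> list:
    """Build a columns list in the same shape as the explicit schema above."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = []
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows]
        columns.append({"name": name, "type": infer_type(values), "description": ""})
    return columns


for column in infer_columns(SAMPLE_CSV):
    print(column)
```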

3. Load and query data

Once defined, you can easily load and query your datasets:

# Load existing datasets
stocks = pai.load("my-org/coca_cola_stock")  # assumes this dataset was created the same way
companies = pai.load("my-org/companies")

# Query using natural language
result = companies.chat("What's the average revenue by region?")

Sharing and collaboration

Share your data layers with your team:

# Push datasets to the platform
companies.push()
stocks.push()

Team members can then access and query the shared datasets through:

  • The web interface at app.pandabi.ai
  • Their own PandaAI code using pai.load("organization/dataset-name")

Of course, they will only be able to see the datasets they have access to. You can control access using the permission management features.

Next Steps