Getting started with the Library
Get started with PandaAI by installing it and using the SmartDataframe class.
Installation
To use `pandasai`, first install it:
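For example, with pip (the package is published on PyPI as `pandasai`):

```bash
pip install pandasai
```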
Before installation, we recommend you create a virtual environment using your preferred environment manager, e.g. Poetry, Pipenv, Conda, Virtualenv, or Venv.
Optional dependencies
To keep the installation size small, `pandasai` does not include all of the dependencies that it supports by default. You can install the extra dependencies by running the following command:
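With pip, the extras syntax looks like this (the quotes guard against shell expansion):

```bash
pip install "pandasai[extra-dependency-name]"
```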
You can replace `extra-dependency-name` with any of the following:
- `google-ai`: this extra dependency is required if you want to use Google PaLM as a language model.
- `google-sheet`: this extra dependency is required if you want to use Google Sheets as a data source.
- `excel`: this extra dependency is required if you want to use Excel files as a data source.
- `modin`: this extra dependency is required if you want to use Modin dataframes as a data source.
- `polars`: this extra dependency is required if you want to use Polars dataframes as a data source.
- `langchain`: this extra dependency is required if you want to support the LangChain LLMs.
- `numpy`: this extra dependency is required if you want to support numpy.
- `ggplot`: this extra dependency is required if you want to support ggplot for plotting.
- `seaborn`: this extra dependency is required if you want to support seaborn for plotting.
- `plotly`: this extra dependency is required if you want to support plotly for plotting.
- `statsmodels`: this extra dependency is required if you want to support statsmodels.
- `scikit-learn`: this extra dependency is required if you want to support scikit-learn.
- `streamlit`: this extra dependency is required if you want to support streamlit.
- `ibm-watsonx-ai`: this extra dependency is required if you want to use IBM watsonx.ai as a language model.
SmartDataframe
The `SmartDataframe` class is the main class of `pandasai`. It is used to interact with a single dataframe. Below is a simple example to get started with `pandasai`.
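A minimal sketch, assuming you use the OpenAI wrapper bundled with the library as the LLM (replace the API token and the sample data with your own):

```python
import pandas as pd
from pandasai import SmartDataframe
from pandasai.llm import OpenAI

# Sample data to query in natural language
sales_by_country = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy"],
    "sales": [5000, 3200, 2900, 4100, 2300],
})

llm = OpenAI(api_token="YOUR_OPENAI_API_KEY")
sdf = SmartDataframe(sales_by_country, config={"llm": llm})

# Ask a question about the dataframe in plain English
print(sdf.chat("Which are the top 2 countries by sales?"))
```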
If you want to learn more about the `SmartDataframe` class, check out this video:
How to generate a BambooLLM API Token
In order to use BambooLLM, you need to generate an API token. Follow these simple steps to generate a token with PandaBI:
- Go to https://pandabi.ai and sign up with your email address or connect your Google Account.
- Go to the API section on the settings page.
- Select Create new API key.
How to generate an OpenAI API Token
In order to use the OpenAI language model, users are required to generate a token. Follow these simple steps to generate a token with OpenAI:
- Go to https://openai.com/api/ and sign up with your email address or connect your Google Account.
- Go to View API Keys on the left side of your Personal Account Settings.
- Select Create new Secret key.
API access to OpenAI is a paid service. You have to set up billing. Make sure you read the Pricing information before experimenting.
Passing name and description for a dataframe
Sometimes, in order to help the LLM work better, you might want to pass a name and a description of the dataframe. You can do this as follows:
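A sketch using the `name` and `description` keyword arguments of `SmartDataframe` (the dataframe and `llm` are assumed to be the ones from the example above; the values are illustrative):

```python
from pandasai import SmartDataframe

sdf = SmartDataframe(
    sales_by_country,
    name="sales_by_country",
    description="Annual sales figures per country, in USD.",
    config={"llm": llm},
)
```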
SmartDatalake
PandaAI also supports queries with multiple dataframes. To perform such queries, you can use a `SmartDatalake` instead of a `SmartDataframe`.
Similarly to a `SmartDataframe`, you can instantiate a `SmartDatalake` as follows:
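For example, assuming `llm` is configured as in the earlier example and the dataframes are plain pandas dataframes:

```python
import pandas as pd
from pandasai import SmartDatalake

employees = pd.DataFrame({
    "employee_id": [1, 2, 3],
    "name": ["Ada", "Grace", "Linus"],
    "department": ["Engineering", "Engineering", "Sales"],
})
salaries = pd.DataFrame({
    "employee_id": [1, 2, 3],
    "salary": [95000, 120000, 70000],
})

# Pass a list of dataframes instead of a single one
lake = SmartDatalake([employees, salaries], config={"llm": llm})
print(lake.chat("Who is the highest paid employee?"))
```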
PandaAI will automatically figure out which dataframe or dataframes are relevant to the query and will use only those dataframes to answer the query.
Agent
While a `SmartDataframe` or a `SmartDatalake` can be used to answer a single query and is meant for a single session of exploratory data analysis, an agent can be used for multi-turn conversations.
To instantiate an agent, you can use the following code:
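A minimal sketch (the `memory_size` argument is optional and controls how many conversation turns the agent remembers; `sales_by_country` and `llm` are from the earlier example):

```python
from pandasai import Agent

# An agent can wrap a single dataframe or a list of dataframes
agent = Agent(sales_by_country, config={"llm": llm}, memory_size=10)
```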
Unlike a `SmartDataframe` or a `SmartDatalake`, an agent keeps track of the state of the conversation and can answer multi-turn conversations. For example:
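A follow-up question can rely on the context of the previous turn, along these lines:

```python
agent.chat("Which are the 3 countries with the highest sales?")
# The follow-up refers back to "those countries" from the previous turn
agent.chat("And what is the total sales for those countries?")
```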
Clarification questions
An agent will also be able to ask clarification questions if it does not have enough information to answer the query. For example:
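A sketch using the agent's `clarification_questions` method:

```python
questions = agent.clarification_questions("What is the total revenue?")
for question in questions:
    print(question)
```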
This will return up to 3 clarification questions that the agent can ask the user to get more information to answer the query.
Explanation
An agent will also be able to explain the answer given to the user. For example:
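A sketch using the agent's `explain` method, which describes how the last answer was produced:

```python
response = agent.chat("What is the average sales per country?")
print(agent.explain())
```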
Rephrase Question
Rephrase a question to get an accurate and comprehensive response from the model. For example:
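A sketch using the agent's `rephrase_query` method:

```python
rephrased = agent.rephrase_query("total sales?")
print(rephrased)
```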
Config
To customize PandaAI's `SmartDataframe`, you can either pass a `config` object with specific settings upon instantiation or modify the `pandasai.json` file in your project's root. The latter serves as the default configuration but can be overridden by specifying settings directly in the `config` object at creation. This approach ensures flexibility and precision in how PandaAI handles your data.
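For example, a `config` dictionary passed at instantiation might look like this (the keys are the settings listed below; the values are illustrative, and `llm` and `sales_by_country` are from the earlier examples):

```python
from pandasai import SmartDataframe

sdf = SmartDataframe(
    sales_by_country,
    config={
        "llm": llm,
        "save_logs": True,
        "verbose": False,
        "save_charts": True,
        "save_charts_path": "exports/charts/",
        "enable_cache": True,
        "max_retries": 3,
        "security": "standard",
    },
)
```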
Settings:
- `llm`: the LLM to use. You can pass an instance of an LLM or the name of an LLM. You can use one of the LLMs supported. You can find more information about LLMs here.
- `save_logs`: whether to save the logs of the LLM. Defaults to `True`. You will find the logs in the `pandasai.log` file in the root of your project.
- `verbose`: whether to print the logs in the console as PandaAI is executed. Defaults to `False`.
- `save_charts`: whether to save the charts generated by PandaAI. Defaults to `False`. You will find the charts in the root of your project or in the path specified by `save_charts_path`.
- `save_charts_path`: the path where to save the charts. Defaults to `exports/charts/`. You can use this setting to override the default path.
- `open_charts`: whether to open the chart during parsing of the response from the LLM. Defaults to `True`. You can completely disable displaying of charts by setting this option to `False`.
- `enable_cache`: whether to enable caching. Defaults to `True`. If set to `True`, PandaAI will cache the results of the LLM to improve the response time. If set to `False`, PandaAI will always call the LLM.
- `max_retries`: the maximum number of retries to use when using the error correction framework. Defaults to `3`. You can use this setting to override the default number of retries.
- `security`: allows three levels depending on specific use cases: `"none"`, `"standard"`, and `"advanced"`. `"standard"` and `"advanced"` are especially useful for detecting malicious intent from user queries and avoiding the execution of potentially harmful code. By default, `security` is set to `"standard"`. The security check might introduce stricter rules that could flag benign queries as harmful; you can deactivate it in the configuration by setting `security` to `"none"`.
Demo in Google Colab
Try out PandaAI in your browser:
Other Examples
You can find all the other examples here.