Data Transformations
Available data transformations in PandaAI
Release v3 is currently in beta. This documentation reflects the features and functionality in progress and may change before the final release.
Data Transformations in PandaAI
PandaAI provides a rich set of data transformations that can be applied to your data. These transformations can be specified in your schema file or applied programmatically.
String Transformations
Numeric Transformations
Date and Time Transformations
Data Cleaning Transformations
Categorical Transformations
Rename Column
Renames a column to a new name.
Parameters:
- column (str): The current column name
- new_name (str): The new name for the column
Example:
This will rename the column old_name to new_name.
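For comparison, the same rename can be sketched in plain pandas. This is not PandaAI's schema syntax; it only illustrates the effect of the transformation, and the sample values are made up for illustration.

```python
import pandas as pd

# Illustrative data; only the column names come from the text above.
df = pd.DataFrame({"old_name": [1, 2, 3]})

# rename() returns a new DataFrame and leaves the original untouched.
renamed = df.rename(columns={"old_name": "new_name"})

print(list(renamed.columns))
```

Because `rename` returns a copy by default, the original `df` still has its `old_name` column afterwards.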
Validation Transformations
Privacy and Security Transformations
Type Conversion Transformations
Chaining Transformations
You can chain multiple transformations in sequence. The transformations will be applied in the order they are specified:
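To see why the order matters, here is a minimal sketch in plain pandas. The helper names (`strip_whitespace`, `truncate`, `apply_in_order`) and the sample data are hypothetical and are not part of PandaAI; the sketch only shows that the same steps in a different order can give different results.

```python
import pandas as pd


def strip_whitespace(df, column):
    """Remove leading and trailing whitespace from a string column."""
    out = df.copy()
    out[column] = out[column].str.strip()
    return out


def truncate(df, column, length):
    """Keep only the first `length` characters of a string column."""
    out = df.copy()
    out[column] = out[column].str.slice(0, length)
    return out


def apply_in_order(df, steps):
    """Apply (function, kwargs) pairs one after another, in list order."""
    for func, kwargs in steps:
        df = func(df, **kwargs)
    return df


raw = pd.DataFrame({"name": ["  apple", "  kiwi"]})

strip_first = apply_in_order(raw, [
    (strip_whitespace, {"column": "name"}),
    (truncate, {"column": "name", "length": 3}),
])

truncate_first = apply_in_order(raw, [
    (truncate, {"column": "name", "length": 3}),
    (strip_whitespace, {"column": "name"}),
])

print(strip_first["name"].tolist())     # ['app', 'kiw']
print(truncate_first["name"].tolist())  # ['a', 'k']
```

Stripping before truncating keeps three meaningful characters, while truncating first spends two of them on leading spaces.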
Programmatic Usage
While schema files are convenient for static transformations, you can also apply transformations programmatically using the TransformationManager:
This approach provides a fluent interface: each method returns the manager instance, so further transformations can be chained onto it. The final .df attribute returns the transformed DataFrame.
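The fluent pattern itself can be sketched with a small stand-in class. `FluentTransformer` and its methods are hypothetical and do not reproduce the TransformationManager API; the sketch only shows how returning `self` from each method enables chaining and how a `.df` attribute exposes the result.

```python
import pandas as pd


class FluentTransformer:
    """Hypothetical stand-in used to illustrate a fluent interface."""

    def __init__(self, df):
        # Work on a copy so the caller's DataFrame is not modified.
        self.df = df.copy()

    def strip(self, column):
        self.df[column] = self.df[column].str.strip()
        return self  # returning self is what makes chaining possible

    def fill_na(self, column, value):
        self.df[column] = self.df[column].fillna(value)
        return self


raw = pd.DataFrame({"name": [" Ann ", None], "score": [1.0, None]})

result = (
    FluentTransformer(raw)
    .strip("name")
    .fill_na("name", "unknown")
    .fill_na("score", 0.0)
    .df  # the transformed DataFrame
)

print(result)
```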
Complete Example
Let’s walk through a complete example of data transformation using a sales dataset. This example demonstrates how to clean, validate, and prepare your data for analysis.
Sample Data
Consider a CSV file sales_data.csv with the following structure:
Schema File
Create a schema.yaml file to define the transformations:
Python Code
Here’s how to use the schema and transformations in your code:
Result
The transformed data will look like this:
Notice how the transformations have:
- Standardized product names
- Padded store IDs
- Removed negative quantity rows
- Added 10% tax to prices
- Validated email addresses
- Added an email validation column
This example demonstrates how to use multiple transformations together to clean and prepare your data for analysis. The transformations are applied in sequence, and each transformation builds on the results of the previous ones.
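The effects listed above can be approximated in plain pandas, as in the sketch below. It does not use PandaAI or the schema file. The column names, the sample rows, the three-digit padding width, the title-case rule for product names, and the simple email pattern are all assumptions made for illustration.

```python
import pandas as pd

# Made-up rows; the real sales_data.csv may use different columns and values.
sales = pd.DataFrame({
    "product": ["  laptop ", "MOUSE", "keyboard"],
    "store_id": ["7", "42", "105"],
    "quantity": [2, -1, 5],
    "price": [1000.0, 20.0, 50.0],
    "email": ["ann@example.com", "bob@example.com", "not-an-email"],
})

cleaned = sales.copy()

# Standardize product names: trim whitespace, then title-case.
cleaned["product"] = cleaned["product"].str.strip().str.title()

# Pad store IDs with leading zeros to an assumed width of 3.
cleaned["store_id"] = cleaned["store_id"].str.zfill(3)

# Remove rows with a negative quantity.
cleaned = cleaned[cleaned["quantity"] >= 0].reset_index(drop=True)

# Add 10% tax to prices.
cleaned["price"] = (cleaned["price"] * 1.10).round(2)

# Add an email validation column using a deliberately simple pattern.
cleaned["email_valid"] = cleaned["email"].str.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+")

print(cleaned)
```

The steps run top to bottom, so the tax is applied only to the rows that survive the quantity filter.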