Documents
What is Apache Parquet?
Apache Parquet is a columnar storage format optimized for analytics workloads. It provides efficient compression and encoding, enabling faster scans and lower storage costs compared to row-based formats. Parquet is widely used in data engineering and analytics stacks including Spark, Trino, Presto, Snowflake, BigQuery, and more.
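To make the columnar idea concrete, here is a minimal Python sketch. It is not Parquet's actual on-disk layout: it only pivots row records into per-column lists and applies a simple run-length encoding, one of the encoding families Parquet uses. The sample records and the helper names `to_columns` and `run_length_encode` are illustrative assumptions.

```python
from itertools import groupby

# Hypothetical sample records in row-oriented form.
rows = [
    {"user_id": 1, "country": "US", "active": True},
    {"user_id": 2, "country": "US", "active": True},
    {"user_id": 3, "country": "US", "active": False},
    {"user_id": 4, "country": "DE", "active": False},
]

def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    names = list(records[0].keys()) if records else []
    return {name: [rec[name] for rec in records] for name in names}

def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

columns = to_columns(rows)
encoded_country = run_length_encode(columns["country"])

print(columns["country"])
print(encoded_country)
```

A query that needs only `country` reads one list instead of every record, and repeated values within a column collapse into short runs. Both effects are what make columnar formats attractive for analytics.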
Why use Parquet Editor?
Parquet Editor lets you open, inspect, and modify Parquet files directly in your browser. It runs on an embedded database and WebAssembly inside the browser, so your files never leave your device, which makes it well suited to sensitive data.
Getting Started (Step-by-step)
Option 1: Edit an Existing Parquet File
- Open the Home page.
- Click Select Parquet File and choose a .parquet file from your computer.
- Review the Schema tab to see column names, types, and nullable properties.
- Switch to the Data tab to view, add, edit, or delete rows.
- Click Download Parquet File to save your changes.
Option 2: Create a New Parquet File from Scratch
- Open the Home page.
- Click New File to start with a blank schema.
- In the Schema tab, define your columns:
  - Click Add Column to create a new column.
  - Set the column name (e.g., "user_id", "email", "created_at").
  - Choose the data type (string, integer, double, boolean, timestamp, date, etc.).
  - Toggle nullable if the column can contain null values.
  - For complex types, define structs, lists, and maps.
- Switch to the Data tab and click Add Row to insert data.
- Fill in the values for each column in your new rows.
- Click Download Parquet File to export your newly created file.
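It can help to write the schema down before entering it in the Schema tab. The following is a minimal Python sketch of such a plan, assuming a hypothetical `Column` dataclass; it is not the editor's internal representation, and the `address` struct with its `city` and `zip` fields is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Column:
    """Hypothetical description of one schema column."""
    name: str
    data_type: str            # e.g. "string", "integer", "timestamp", "struct"
    nullable: bool = True
    children: List["Column"] = field(default_factory=list)  # nested fields of a struct

def describe(column, indent=0):
    """Return indented lines describing a column and any nested fields."""
    null_flag = "nullable" if column.nullable else "required"
    lines = [f"{' ' * indent}{column.name}: {column.data_type} ({null_flag})"]
    for child in column.children:
        lines.extend(describe(child, indent + 2))
    return lines

schema = [
    Column("user_id", "integer", nullable=False),
    Column("email", "string"),
    Column("created_at", "timestamp", nullable=False),
    Column("address", "struct", children=[
        Column("city", "string"),
        Column("zip", "string"),
    ]),
]

schema_lines = [line for col in schema for line in describe(col)]
print("\n".join(schema_lines))
```

Each printed line corresponds to one column you would add in the Schema tab, with nested struct fields indented under their parent.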
Tips & Best Practices
- Keep schemas stable across files to simplify downstream processing.
- Use the most specific type available (e.g., timestamp rather than string for date-time values) for better compression and query performance.
- Validate data ranges and nullability before exporting.
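The checks above can be sketched in plain Python. This is a minimal illustration, assuming a hypothetical schema expressed as a dictionary of `(type, nullable)` pairs; the function names `validate_row` and `schema_diff` and the `age` range are assumptions, not features of the editor.

```python
# Hypothetical schema: column name -> (Python type, nullable).
schema = {
    "user_id": (int, False),
    "email": (str, True),
    "age": (int, True),
}

def validate_row(row, schema, ranges=None):
    """Return a list of human-readable problems found in one row."""
    ranges = ranges or {}
    problems = []
    for name, (py_type, nullable) in schema.items():
        value = row.get(name)
        if value is None:
            if not nullable:
                problems.append(f"{name}: null in non-nullable column")
            continue
        # bool is a subclass of int in Python, so reject it explicitly for int columns.
        wrong_type = not isinstance(value, py_type)
        if wrong_type or (py_type is int and isinstance(value, bool)):
            problems.append(
                f"{name}: expected {py_type.__name__}, got {type(value).__name__}"
            )
            continue
        if name in ranges:
            low, high = ranges[name]
            if not (low <= value <= high):
                problems.append(f"{name}: {value} outside [{low}, {high}]")
    return problems

def schema_diff(expected, actual):
    """List columns that are missing, unexpected, or defined differently."""
    diffs = []
    for name in sorted(expected.keys() | actual.keys()):
        if name not in actual:
            diffs.append(f"missing column: {name}")
        elif name not in expected:
            diffs.append(f"unexpected column: {name}")
        elif expected[name] != actual[name]:
            diffs.append(f"changed column: {name}")
    return diffs

bad_row = {"user_id": None, "email": "a@example.com", "age": 200}
print(validate_row(bad_row, schema, ranges={"age": (0, 150)}))
```

Running `schema_diff` against the schema of each new file is one way to catch drift before it reaches downstream jobs.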
FAQ
- Are my files uploaded? No. All processing happens in your browser.
- Max file size? It depends on the memory available to your browser. Split very large datasets into smaller files before opening them.
- Is it free? Yes, the tool is free to use.
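One way to split a very large dataset before opening it is to cut the rows into fixed-size chunks and write each chunk to its own file. The sketch below shows only the chunking step in plain Python; the `split_rows` helper and the chunk size are assumptions, and writing each chunk out as Parquet is left to whichever library you already use.

```python
from itertools import islice

def split_rows(rows, chunk_size):
    """Yield successive lists of at most chunk_size rows."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk

# Hypothetical dataset of ten rows, split into chunks of four.
rows = [{"user_id": i} for i in range(10)]
chunks = list(split_rows(rows, 4))
chunk_sizes = [len(chunk) for chunk in chunks]
print(chunk_sizes)
```

Because `split_rows` is a generator, it works on any iterable and holds only one chunk in memory at a time.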
Release Notes
December 2025
- Parquet Version 2 Support: Added full support for reading and writing Parquet files using version 2 format specifications, enabling better compression and encoding options.
- Large File Handling: Added warnings and improved error messages for large files to help users understand browser memory limitations.
June 2025
- Pagination Support: Added pagination for efficiently viewing large datasets with thousands of rows.
- Advanced Filtering & Sorting: Introduced column-level filtering and multi-column sorting capabilities for better data exploration.
- Struct Type Support: Full support for creating and editing Parquet struct (nested object) types in your schemas.
- Malformed Data Handling: Improved error handling for corrupted or malformed struct data.
May 2025
- Decimal Type Support: Added support for decimal/numeric data types with precision and scale.
- Complex Data Types: Enhanced support for nested schemas including maps, lists, and arrays.
- BigInt Support: Proper handling of large integer values and BigInt data types.
- Timestamp & Date Improvements: Fixed issues with timestamp precision and date rendering in arrays.
- Export Enhancements: Improved file export functionality with better handling of edited data.
January 2025
- Initial Release: Browser-based Parquet file editor with schema editing and data manipulation.
- Privacy-First Design: All processing happens locally in the browser with no server uploads.
- Complex Datatype Support: Initial support for nested and complex Parquet data types.
