Documentation
What is Apache Parquet?
Apache Parquet is a columnar storage format optimized for analytics workloads. Its efficient compression and encoding enable faster scans and lower storage costs than row-based formats. Parquet is widely supported across data engineering and analytics stacks, including Apache Spark, Trino, Presto, Snowflake, and BigQuery.
Why use Parquet Editor?
Parquet Editor lets you open, inspect, and modify Parquet files directly in your browser. It runs an embedded database via WebAssembly entirely in the browser, so your files never leave your device, which makes it ideal for sensitive data.
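Which embedded engine powers the editor isn't spelled out here, so the sketch below is illustrative only: it assumes DuckDB-WASM as the in-browser database and shows, in TypeScript, how a user-selected Parquet file can be opened and inspected without any upload. The function name openLocalParquet and the registered file name input.parquet are made up for the example.

```ts
import * as duckdb from "@duckdb/duckdb-wasm";

// Sketch only: assumes DuckDB-WASM as the embedded engine; Parquet Editor's
// actual internals are not documented here.
async function openLocalParquet(file: File) {
  // Fetch a WASM bundle from a CDN and start the database inside a Web Worker.
  const bundle = await duckdb.selectBundle(duckdb.getJsDelivrBundles());
  const workerUrl = URL.createObjectURL(
    new Blob([`importScripts("${bundle.mainWorker!}");`], { type: "text/javascript" })
  );
  const db = new duckdb.AsyncDuckDB(new duckdb.ConsoleLogger(), new Worker(workerUrl));
  await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
  URL.revokeObjectURL(workerUrl);

  // Register the chosen file as an in-memory buffer: the bytes stay on the device.
  const bytes = new Uint8Array(await file.arrayBuffer());
  await db.registerFileBuffer("input.parquet", bytes);

  // Inspect the schema and preview a few rows, entirely inside the browser.
  const conn = await db.connect();
  const schema = await conn.query("DESCRIBE SELECT * FROM 'input.parquet'");
  const preview = await conn.query("SELECT * FROM 'input.parquet' LIMIT 10");
  console.table(schema.toArray());
  console.table(preview.toArray());
  return { db, conn };
}
```

A File object from an `<input type="file">` element can be passed straight to openLocalParquet; the returned db and conn are reused in the later sketches.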
Getting Started (Step-by-step)
- Open the Home page.
- Click Select Parquet File and choose a .parquet file.
- Review the Schema tab and make changes if needed.
- Switch to the Data tab to add, edit, or delete rows.
- Click Download Parquet File to save your changes (a sketch of this round trip appears after this list).
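As a rough picture of what the edit-and-download steps could look like under the hood, the sketch below continues the hypothetical DuckDB-WASM setup from above: it copies the file into an editable table, applies a couple of example edits, re-encodes the table as Parquet, and hands the bytes back as an ordinary browser download. The table name edits and the columns status and id are invented for illustration.

```ts
import * as duckdb from "@duckdb/duckdb-wasm";

// Continues the openLocalParquet sketch: `db` and `conn` come from its return value.
async function editAndDownload(db: duckdb.AsyncDuckDB, conn: duckdb.AsyncDuckDBConnection) {
  // Materialize the Parquet file into a table so rows can be added, edited, or deleted.
  await conn.query("CREATE TABLE edits AS SELECT * FROM 'input.parquet'");
  await conn.query("UPDATE edits SET status = 'active' WHERE status IS NULL"); // hypothetical edit
  await conn.query("DELETE FROM edits WHERE id IS NULL");                      // hypothetical edit

  // Re-encode the edited table as Parquet inside the WASM virtual filesystem...
  await conn.query("COPY edits TO 'output.parquet' (FORMAT PARQUET)");

  // ...then read the bytes back and trigger a normal file download.
  const out = await db.copyFileToBuffer("output.parquet");
  const url = URL.createObjectURL(new Blob([out], { type: "application/octet-stream" }));
  const a = document.createElement("a");
  a.href = url;
  a.download = "edited.parquet";
  a.click();
  URL.revokeObjectURL(url);
}
```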
Tips & Best Practices
- Keep schemas stable across files to simplify downstream processing.
- Use appropriate column types (e.g., timestamps rather than strings) for better compression and query performance; see the sketch after this list.
- Validate data ranges and nullability before exporting.
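As one concrete example of the typing and validation tips, the sketch below (again assuming the DuckDB-WASM connection from the earlier sketches) casts a hypothetical created_at column that was stored as text into a proper TIMESTAMP and counts NULLs before export; typed columns give downstream engines real encodings and min/max statistics instead of opaque strings.

```ts
import * as duckdb from "@duckdb/duckdb-wasm";

// `conn` and the `edits` table come from the earlier sketches; `created_at` is a
// hypothetical column currently stored as a string.
async function prepareForExport(conn: duckdb.AsyncDuckDBConnection) {
  // Replace the string column with a real TIMESTAMP column.
  await conn.query(
    "CREATE OR REPLACE TABLE edits AS " +
      "SELECT * REPLACE (CAST(created_at AS TIMESTAMP) AS created_at) FROM edits"
  );

  // Validate nullability before exporting: count rows that would break downstream jobs.
  const nulls = await conn.query("SELECT count(*) AS n FROM edits WHERE created_at IS NULL");
  console.log("rows with NULL created_at:", nulls.toArray()[0].n);
}
```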
FAQ
- Are my files uploaded? No. All processing happens in your browser.
- Is there a maximum file size? It depends on your browser's available memory. Try splitting very large datasets before loading them.
- Is it free? Yes, the tool is free to use.