Gyp the Cat dot Com

How to Convert CSV to Parquet Easily with Python on Linux Shell

Python has great tools for making data more manageable, whatever you are using it for. Parquet is a columnar file format that compresses well, so it can shrink structured data files considerably.

I have created an easy-to-use Python script geared towards command line use.

Install some dependencies:

pip install pandas fastparquet pyarrow
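pandas only needs one of the two Parquet engines; by default it tries pyarrow first and falls back to fastparquet. As a quick sanity check after the install, here is a minimal sketch that reports which engines are importable. The helper name available_parquet_engines is made up for this example and is not part of pandas.

```python
import importlib.util


def available_parquet_engines():
    """Return the names of the Parquet engines that are installed here."""
    # Hypothetical helper: these are the two engine packages installed above.
    candidates = ('pyarrow', 'fastparquet')
    return [name for name in candidates
            if importlib.util.find_spec(name) is not None]


print('Parquet engines found:', available_parquet_engines() or 'none')
```

If the list comes back empty, the to_parquet call in the script will fail, so it is worth re-running the pip command first.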

Create the following file and give it a sensible name, such as convertcsvtoparquet.py:

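import sys
import datetime
from pathlib import Path

import pandas

# Expect exactly one argument: the path to the CSV file
if len(sys.argv) != 2:
  print('Usage: python3 convertcsvtoparquet.py <file.csv>')
  sys.exit(1)

txt = sys.argv[1]

print(f'{datetime.datetime.now()} - Info - CSV to Parquet conversion - Starting File Name {txt}')

# Check the final extension only (case-insensitive)
if Path(txt).suffix.lower() != '.csv':
  print('Error - Exiting - Not a CSV file')
  sys.exit(1)

print(f'{datetime.datetime.now()} - Info - Importing CSV')

try:
  inputfile = pandas.read_csv(txt)
except Exception as err:
  # Exit non-zero so calling shell scripts can detect the failure
  print(f'{datetime.datetime.now()} - Error - Exiting - CSV import failed: {err}')
  sys.exit(1)

print(f'{datetime.datetime.now()} - Info - Writing Parquet')

# Swap only the final extension, so paths containing other dots still work
outputfile = str(Path(txt).with_suffix('.parquet'))

inputfile.to_parquet(outputfile, compression='brotli')

print(f'{datetime.datetime.now()} - Complete - {outputfile} Written')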
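Deriving the output file name is easy to get wrong when a path contains more than one dot, for example a dotted directory or a name like 2024.06.users.csv. Here is a minimal sketch of how pathlib swaps only the final extension. The helper name parquet_name and the example paths are made up for illustration.

```python
from pathlib import Path


def parquet_name(csv_path):
    """Return csv_path with only its final extension swapped for .parquet."""
    # Hypothetical helper: with_suffix replaces the last suffix and nothing else.
    return str(Path(csv_path).with_suffix('.parquet'))


# Illustrative paths only; none of these files need to exist.
for example in ('dummy.csv', 'exports/2024.06.users.csv'):
    print(example, '->', parquet_name(example))
```

The same approach works for absolute paths, since pathlib only touches the last component of the path.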

To try it out, download some dummy data, for example from https://www.heycsv.com/csv-sample-data, and run the script against it.

wget "https://www.dropbox.com/s/muvfojx14t8nwxl/1M-sample-users.csv?dl=0" --output-document=dummy.csv

python3 convertcsvtoparquet.py dummy.csv

zip -9 -j dummy.zip dummy.csv

ls -lha
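The zip and ls commands above let you compare the Parquet file against the original CSV and a zipped copy by eye. If you would rather have the numbers as percentages, here is a minimal sketch using only the standard library. The helper name size_report is made up for this example, and it assumes the first file is not empty.

```python
import os


def size_report(original, *others):
    """Map each path to (size in bytes, percent of the original's size)."""
    base = os.path.getsize(original)
    if base == 0:
        raise ValueError(f'{original} is empty, cannot compute percentages')
    report = {original: (base, 100.0)}
    for path in others:
        size = os.path.getsize(path)
        report[path] = (size, round(100.0 * size / base, 1))
    return report
```

For the files created above you could call size_report('dummy.csv', 'dummy.parquet', 'dummy.zip') and print the result.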


Written by gyp - June 29, 2024 - 1105 Views


Some rights retained Gyp the Cat Dot Com