How can I read tar.gz file using pandas read_csv with gzip compression option?

Hello everyone, I hope you are all doing well. Today we are going to learn how to read a tar.gz file using pandas read_csv with the gzip compression option in Python. I will explain the possible methods here.

Without wasting your time, let's get started.

Method 1

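df = pd.read_csv('sample.tar.gz', compression='gzip', header=0, sep=' ', quotechar='"', on_bad_lines='skip')

Note: on_bad_lines='skip' ignores the offending rows. It replaces error_bad_lines=False, which was deprecated in pandas 1.3 and removed in pandas 2.0. Keep in mind that compression='gzip' only strips the gzip layer, so the tar header bytes are still part of the stream that read_csv parses and can show up in the column names or in stray rows. Inspect the resulting DataFrame before relying on it.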
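
As an alternative that unpacks the tar layer as well, pandas 1.5 and newer accept compression='tar' for archives that contain exactly one file. The following is only a minimal sketch under that version assumption. The helper names make_sample_archive and read_single_csv_from_tar, the file names, and the sample rows are hypothetical and exist only to make the example self-contained.

```python
import os
import tarfile
import tempfile

import pandas as pd


def make_sample_archive(directory):
    """Write a small space-separated CSV and pack it into sample.tar.gz (illustrative data)."""
    csv_path = os.path.join(directory, "sample.csv")
    with open(csv_path, "w") as handle:
        handle.write('id name\n1 "Ada Lovelace"\n2 "Alan Turing"\n')
    archive_path = os.path.join(directory, "sample.tar.gz")
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(csv_path, arcname="sample.csv")
    return archive_path


def read_single_csv_from_tar(archive_path):
    """Read the only file inside a tar archive; assumes pandas >= 1.5."""
    # compression="tar" makes pandas open the archive with tarfile, which
    # detects the gzip layer on its own. The archive must hold exactly one file.
    return pd.read_csv(archive_path, compression="tar", header=0, sep=" ", quotechar='"')


with tempfile.TemporaryDirectory() as tmp_dir:
    df = read_single_csv_from_tar(make_sample_archive(tmp_dir))

print(df)
```

If the archive holds more than one file, pandas raises a ValueError with this option, and the tarfile approach is the better fit.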

Method 2

You can use the tarfile module to read a particular file from the tar.gz archive. If there is only one file in the archive, then you can do this:

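import tarfile
import pandas as pd

# "r:*" lets tarfile detect the compression (gzip, bz2, xz, or none) automatically.
with tarfile.open("sample.tar.gz", "r:*") as tar:
    # Take the first entry; this assumes the archive holds exactly one file.
    csv_path = tar.getnames()[0]
    # extractfile() returns a file-like object that read_csv can consume directly.
    df = pd.read_csv(tar.extractfile(csv_path), header=0, sep=" ")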

The read mode r:* lets tarfile detect the compression (gzip or another supported format) automatically. If the tar archive contains multiple files, you could use something like csv_path = [n for n in tar.getnames() if n.endswith('.csv')][-1] to pick the last CSV file in the archive.
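
The multi-file case can be sketched as follows. This is an illustrative example, not a definitive implementation: build_archive and read_last_csv are hypothetical helpers, and the member names and rows are made-up sample data built in memory so the snippet runs on its own.

```python
import io
import tarfile

import pandas as pd


def build_archive(files):
    """Build an in-memory tar.gz from a {name: text} dict (illustrative data only)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


def read_last_csv(fileobj, **read_csv_kwargs):
    """Return (member name, DataFrame) for the last .csv member of the archive."""
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        csv_names = [n for n in tar.getnames() if n.endswith(".csv")]
        if not csv_names:
            raise ValueError("no .csv member found in the archive")
        csv_path = csv_names[-1]
        # Parse while the archive is still open, because extractfile() reads from it.
        return csv_path, pd.read_csv(tar.extractfile(csv_path), **read_csv_kwargs)


archive = build_archive({
    "data/readme.txt": "not a csv\n",
    "data/first.csv": "a b\n1 2\n",
    "data/second.csv": "a b\n3 4\n5 6\n",
})
name, df = read_last_csv(archive, header=0, sep=" ")
print(name)
print(df)
```

Filtering on the .csv suffix also skips directory entries and other non-CSV members, for which a plain tar.getnames()[0] could pick the wrong entry.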

Summary

That is all about this issue. I hope these methods helped you. Comment below with your thoughts and questions, and let us know which method worked for you. Thank you.