
[Solved] No FileSystem for scheme: s3 with pyspark

Hello guys, how are you all? Hope you are all fine. Today I got the error No FileSystem for scheme: s3 with pyspark in Python, so here I explain all the possible solutions to you.

Without wasting your time, let's start this article and solve this error.

How No FileSystem for scheme: s3 with pyspark Error Occurs?

Today I got the error No FileSystem for scheme: s3 with pyspark in Python. It happens because no filesystem implementation is registered for the s3 scheme; the solutions below either read the file directly with boto3 or add the hadoop-aws package and read through the s3a:// scheme instead.

How To Solve No FileSystem for scheme: s3 with pyspark Error?

To solve the No FileSystem for scheme: s3 with pyspark error, either read the file directly with boto3, or launch pyspark with the hadoop-aws and aws-java-sdk packages (or add their JARs to Spark's classpath) and read through the s3a:// scheme. If you add other packages, make sure the format is 'groupId:artifactId:version' and the packages are separated by commas. Both solutions are detailed below.

Solution 1

If you are using a local machine you can use boto3:

import boto3

s3 = boto3.resource('s3')
# get a handle on the bucket that holds your file
bucket = s3.Bucket('yourBucket')
# get a handle on the object you want (i.e. your file)
obj = bucket.Object(key='yourFile.extension')
# get the object
response = obj.get()
# read the contents of the file and split it into a list of lines
# (the body is returned as bytes, so decode it first)
lines = response['Body'].read().decode('utf-8').split('\n')

(Do not forget to set up your AWS S3 credentials.)
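If credentials are not already configured on the machine, one common option is to pass them to a boto3 session explicitly. The values below are placeholders; in practice, `aws configure` or environment variables are preferable to hard-coding keys:

import boto3

# Placeholder credentials for illustration only; prefer `aws configure`
# or environment variables over hard-coding keys in source.
session = boto3.Session(
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
    region_name="us-east-1",
)
s3 = session.resource('s3')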

Another clean solution, if you are using an AWS virtual machine (EC2), is to grant S3 permissions to your EC2 instance (for example through an IAM role) and launch pyspark with this command:

pyspark --packages com.amazonaws:aws-java-sdk-pom:1.10.34,org.apache.hadoop:hadoop-aws:2.7.2

If you add other packages, make sure the format is: ‘groupId:artifactId:version’ and the packages are separated by commas.
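With the shell launched that way, reads should use the s3a:// scheme rather than s3://. A minimal sketch, assuming a placeholder bucket and file name:

# In the pyspark shell started above, `sc` is already defined.
# hadoop-aws provides the implementation behind the s3a:// scheme.
rdd = sc.textFile("s3a://yourBucket/yourFile.txt")
print(rdd.count())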

If you are using pyspark from Jupyter Notebooks this will work:

import os
import pyspark
# PYSPARK_SUBMIT_ARGS must be set before the SparkContext is created
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk-pom:1.10.34,org.apache.hadoop:hadoop-aws:2.7.2 pyspark-shell'
from pyspark.sql import SQLContext
from pyspark import SparkContext
sc = SparkContext()
sqlContext = SQLContext(sc)
filePath = "s3a://yourBucket/yourFile.parquet"  # note the s3a:// scheme
df = sqlContext.read.parquet(filePath)  # Parquet file read example

Solution 2

If you're using a Jupyter notebook, you must add two JAR files to the classpath that pyspark uses, for example:

/home/ec2-user/anaconda3/envs/ENV-XXX/lib/python3.6/site-packages/pyspark/jars

The two files are:

  • hadoop-aws-2.10.1-amzn-0.jar
  • aws-java-sdk-1.11.890.jar
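After copying these JARs into that jars directory, reads through the s3a:// scheme should work again. A minimal sketch, assuming placeholder bucket and file names (the explicit fs.s3a.impl setting is usually redundant once hadoop-aws is on the classpath, but it makes the intent clear):

from pyspark.sql import SparkSession

# Assumes the hadoop-aws and aws-java-sdk JARs above were copied into
# pyspark's jars/ directory; bucket and file names are placeholders.
spark = (
    SparkSession.builder
    .appName("s3a-read-check")
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .getOrCreate()
)

df = spark.read.parquet("s3a://yourBucket/yourFile.parquet")
df.printSchema()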

Summary

That's all about this issue. I hope one of the solutions helped you. Comment below with your thoughts and your queries, and let us know which solution worked for you. Thank you.
