
How to split folder of images into test/training/validation sets with stratified sampling?

Hello everyone. Today we are going to learn how to split a folder of images into training, validation, and test sets with stratified sampling in Python. Two methods are explained below.

Let’s get started.

Method 1

Use the Python library split-folders. Install it with pip:

pip install split-folders

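Store all the images in a folder named Data, with one subfolder per class (for example Data/class1 and Data/class2). The library splits each class subfolder separately, which keeps the class proportions the same across the three sets. Then run: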

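import splitfolders  # older releases used the module name split_folders

# Copy the images into output/train, output/val and output/test (80% / 10% / 10%)
splitfolders.ratio('Data', output="output", seed=1337, ratio=(.8, .1, .1))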

Running the snippet above creates three folders in the output directory:

  • train
  • val
  • test

The share of images that goes into each folder is controlled by the ratio argument, given in the order (train, val, test).
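To check that the split came out as intended, the images per class in each output folder can be counted. The following is a minimal sketch that uses only the standard library. The helper name count_images is hypothetical (it is not part of split-folders), and it assumes the output/<split>/<class>/ layout described above.

```python
from pathlib import Path


def count_images(output_dir):
    """Return {split: {class_name: file_count}} for an <output>/<split>/<class>/ layout."""
    counts = {}
    for split_dir in sorted(Path(output_dir).iterdir()):
        if not split_dir.is_dir():
            continue
        # Count the regular files inside every class subfolder of this split
        counts[split_dir.name] = {
            class_dir.name: sum(1 for f in class_dir.iterdir() if f.is_file())
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return counts


if __name__ == "__main__":
    # "output" is the folder name used in the call above; adjust as needed.
    if Path("output").is_dir():
        for split, per_class in count_images("output").items():
            print(split, per_class)
```

If the split is stratified, the per-class counts in train, val, and test should be in roughly the same proportion as the ratio values.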

Method 2

I ran into a similar problem myself. All my images were stored in two folders, “Project/Data2/DPN+” and “Project/Data2/DPN-”. It was a binary classification problem with the two classes “DPN+” and “DPN-”, and both class folders contained .png files. My objective was to distribute the dataset into training, validation, and testing folders, each containing two subfolders, “DPN+” and “DPN-”, to indicate the class. For the partition, I used a 70:15:15 split. I am a beginner in Python, so please let me know if I made any mistakes.

My code is as follows:

import os
import shutil

import numpy as np

# Folder that contains one subfolder per class
root_dir = 'Data2'
classes = ['DPN+', 'DPN-']

np.random.seed(1337)  # optional: fixes the shuffle so the split is reproducible

# Split every class separately so each set keeps the same class proportions
for cls in classes:
    src = os.path.join(root_dir, cls)  # folder to copy images from

    # Shuffle the file names, then cut at 70% and 85% for a 70:15:15 split
    all_file_names = sorted(os.listdir(src))
    np.random.shuffle(all_file_names)
    n = len(all_file_names)
    train_names, val_names, test_names = np.split(
        np.array(all_file_names), [int(n * 0.7), int(n * 0.85)])

    print('Class: ', cls)
    print('Total images: ', n)
    print('Training: ', len(train_names))
    print('Validation: ', len(val_names))
    print('Testing: ', len(test_names))

    # Copy each partition into Data2/<split>/<class>, creating folders as needed
    for split, names in (('train', train_names),
                         ('val', val_names),
                         ('test', test_names)):
        dst = os.path.join(root_dir, split, cls)
        os.makedirs(dst, exist_ok=True)  # safe to run more than once
        for name in names:
            shutil.copy(os.path.join(src, name), dst)
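
The same idea can be wrapped in a function that discovers the class folders itself, so that every class is shuffled and cut at the same points (this is what makes the split stratified). This is only a sketch under stated assumptions: the names split_names and stratified_copy, the seed value, and the use of a separate output folder are not part of the code above. The output folder should sit outside the source folder, otherwise train, val, and test would be picked up as classes on a second run.

```python
import random
import shutil
from pathlib import Path


def split_names(names, cuts=(0.7, 0.85), seed=1337):
    """Shuffle names and cut them into (train, val, test) lists.

    cuts holds the cumulative cut points, so (0.7, 0.85) means 70:15:15.
    """
    names = sorted(names)               # sort first so the seed gives a stable order
    random.Random(seed).shuffle(names)
    n = len(names)
    cut1, cut2 = int(n * cuts[0]), int(n * cuts[1])
    return names[:cut1], names[cut1:cut2], names[cut2:]


def stratified_copy(src_dir, out_dir, cuts=(0.7, 0.85), seed=1337):
    """Copy src_dir/<class>/* into out_dir/{train,val,test}/<class>/.

    Returns {class_name: (n_train, n_val, n_test)}.
    """
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    summary = {}
    for class_dir in sorted(p for p in src_dir.iterdir() if p.is_dir()):
        files = [f.name for f in class_dir.iterdir() if f.is_file()]
        parts = split_names(files, cuts, seed)
        for split, names in zip(("train", "val", "test"), parts):
            dst = out_dir / split / class_dir.name
            dst.mkdir(parents=True, exist_ok=True)
            for name in names:
                shutil.copy(class_dir / name, dst)
        summary[class_dir.name] = tuple(len(p) for p in parts)
    return summary


# Hypothetical usage with the folder from this method:
# print(stratified_copy("Data2", "Data2_split"))
```

Because the cut points are truncated with int(), small classes do not divide exactly: a class with 10 images ends up as 7 training, 1 validation, and 2 test images.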

Summary

That covers both methods. I hope they help. Leave your thoughts and questions in the comments below, and let us know which method worked for you. Thank you.
