close

how to use pandas filter with IQR

Hello Guys, How are you all? Hope You all Are Fine. Today We Are Going To learn about how to use pandas filter with IQR in Python. So Here I am Explain to you all the possible Methods here.

Without wasting your time, Let’s start This Article.

Table of Contents

how to use pandas filter with IQR?

  1. how to use pandas filter with IQR?

    Then we can plot the result to check the difference. We observe that the outlier in the left boxplot (the cross at 183) does not appear anymore in the filtered series.

  2. use pandas filter with IQR

    Then we can plot the result to check the difference. We observe that the outlier in the left boxplot (the cross at 183) does not appear anymore in the filtered series.

Method 1

As far as I know, the most compact notation seems to be brought by the query method.

# Some test data
np.random.seed(33454)
df = (
    # A standard distribution
    pd.DataFrame({'nb': np.random.randint(0, 100, 20)})
        # Adding some outliers
        .append(pd.DataFrame({'nb': np.random.randint(100, 200, 2)}))
        # Reseting the index
        .reset_index(drop=True)
    )

# Computing IQR
Q1 = df['nb'].quantile(0.25)
Q3 = df['nb'].quantile(0.75)
IQR = Q3 - Q1

# Filtering Values between Q1-1.5IQR and Q3+1.5IQR
filtered = df.query('(@Q1 - 1.5 * @IQR) <= nb <= (@Q3 + 1.5 * @IQR)')

Then we can plot the result to check the difference. We observe that the outlier in the left boxplot (the cross at 183) does not appear anymore in the filtered series.

# Ploting the result to check the difference
df.join(filtered, rsuffix='_filtered').boxplot()
Comparison before and after filterinf

Method 2

Another approach using Series.between():

iqr = df['col'][df['col'].between(df['col'].quantile(.25), df['col'].quantile(.75), inclusive=True)]

Drawn out:

# Select the first quantile
q1 = df['col'].quantile(.25)

# Select the third quantile
q3 = df['col'].quantile(.75)

# Create a mask inbeetween q1 & q3
mask = df['col'].between(q1, q3, inclusive=True)

# Filtering the initial dataframe with a mask
iqr = df.loc[mask, 'col']

Summery

It’s all About this issue. Hope all Methods helped you a lot. Comment below Your thoughts and your queries. Also, Comment below which Method worked for you? Thank You.

Also, Read