Hello guys, how are you all? Hope you are all fine. Today we are going to learn **how to plot a value_counts in pandas that has a huge number of different counts not distributed evenly** in **Python**. Here I explain all the possible methods.

Without wasting your time, let’s start this article.


## How to plot a value_counts in pandas that has a huge number of different counts not distributed evenly?

You could keep the normalized value counts above a certain `threshold`, then sum the values below the `threshold` and clump them together into one category which could be called, say, “other”.

## Method 1

You could keep the normalized value counts above a certain `threshold`, then sum the values below the `threshold` and clump them together into one category which could be called, say, “other”.

By choosing `threshold` high enough, you will be able to display the most important contributors to the overall probability distribution, while still showing the size of the tail in the bar labeled “other”:

```python
import matplotlib.pyplot as plt
import pandas as pd

s2 = pd.Series([1,2,3,4,5,2,3,333,2,123,434,1,2,3,1,11,11,432,3,2,4,3,3,3,54,34,24,2,
                223,2535334,3,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,30000,
                2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2])

# Normalized value counts: each category's share of the total
prob = s2.value_counts(normalize=True)

# Keep categories above the threshold; lump the rest into "other"
threshold = 0.02
mask = prob > threshold
tail_prob = prob.loc[~mask].sum()
prob = prob.loc[mask]
prob['other'] = tail_prob

prob.plot(kind='bar')
plt.xticks(rotation=25)
plt.show()
```

There is a limit to the number of category labels you can sensibly display on a bar graph. For a normal-sized graph, 3000 is way too many. Moreover, it is probably not reasonable to expect an audience to glean any meaning out of reading 3000 labels.

The graph should summarize the data. And the main point seems to be that 4 or 5% of the categories constitute the vast majority of the cases. So to drive home that point, perhaps use `pd.qcut` to categorize the cases into simple categories such as `bottom 25%`, `mid 70%`, and `top 5%`:

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

N = 18000
categories = np.arange(N)
np.random.shuffle(categories)

# Make 4% of the categories account for most of the probability mass
M = int(N * 0.04)
prob = pd.Series(np.concatenate([np.random.randint(9000, 11000, size=M),
                                 np.random.randint(0, 100, size=N - M)]),
                 index=categories)
prob /= prob.sum()

# Bin each category by its quantile: bottom 25%, middle 70%, top 5%
category_classes = pd.qcut(prob, q=[0, .25, 0.95, 1.],
                           labels=['bottom 25%', 'mid 70%', 'top 5%'])
prob_groups = prob.groupby(category_classes).sum()

prob_groups.plot(kind='bar')
plt.xticks(rotation=0)
plt.show()
```

## Method 2

Just take the log of the data (I have no pandas here, but it should be similar):

```python
import numpy as np
import matplotlib.pyplot as plt

# log1p computes log(1 + x), which avoids log(0) = -inf for the zero entries
s2 = np.log1p([1,2,3,4,5,2,3,333,2,123,434,1,2,3,1,11,11,432,3,2,4,3,3,3,54,34,24,2,
               223,2535334,3,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,30000,
               2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2])
plt.plot(s2)
plt.show()
```
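In pandas itself, the equivalent trick is to plot the raw `value_counts` and put the y-axis on a log scale, so the huge and tiny counts fit on one chart. A minimal sketch, reusing the series from Method 1 (the `Agg` backend is only there so the snippet runs headless):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this when plotting interactively
import matplotlib.pyplot as plt
import pandas as pd

s2 = pd.Series([1, 2, 3, 4, 5, 2, 3, 333, 2, 123, 434, 1, 2, 3, 1,
                11, 11, 432, 3, 2, 4, 3, 3, 3, 54, 34, 24, 2, 223,
                2535334, 3, 1, 1] + [0] * 16 + [30000] + [2] * 31)

# Bar chart of the raw counts, with the y-axis on a log scale
counts = s2.value_counts()
ax = counts.plot(kind="bar")
ax.set_yscale("log")
plt.xticks(rotation=25)
plt.show()
```

This keeps every category visible without lumping any of them into an “other” bucket, at the cost of making the bar heights harder to compare at a glance.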

**Conclusion**

That’s all about this issue. I hope one of these methods helped you. Comment below with your thoughts and queries, and let us know which method worked for you. Thank you.
