close

How to find probability distribution and parameters for real data? (Python 3)

Hello Guys, How are you all? Hope You all Are Fine. Today We Are Going To learn about How to find probability distribution and parameters for real data? (Python 3) in Python. So Here I am Explain to you all the possible Methods here.

Without wasting your time, Let’s start This Article.

Table of Contents

How to find probability distribution and parameters for real data? (Python 3)

  1. How to find probability distribution and parameters for real data? (Python 3)

    To the best of my knowledge, there is no automatic way of obtaining the distribution type and parameters of a sample (as inferring the distribution of a sample is a statistical problem by itself).

  2. find probability distribution and parameters for real data? (Python 3)

    To the best of my knowledge, there is no automatic way of obtaining the distribution type and parameters of a sample (as inferring the distribution of a sample is a statistical problem by itself).

Method 1

Use this approach

import scipy.stats as st
def get_best_distribution(data):
    dist_names = ["norm", "exponweib", "weibull_max", "weibull_min", "pareto", "genextreme"]
    dist_results = []
    params = {}
    for dist_name in dist_names:
        dist = getattr(st, dist_name)
        param = dist.fit(data)

        params[dist_name] = param
        # Applying the Kolmogorov-Smirnov test
        D, p = st.kstest(data, dist_name, args=param)
        print("p value for "+dist_name+" = "+str(p))
        dist_results.append((dist_name, p))

    # select the best fitted distribution
    best_dist, best_p = (max(dist_results, key=lambda item: item[1]))
    # store the name of the best fit and its p value

    print("Best fitting distribution: "+str(best_dist))
    print("Best p value: "+ str(best_p))
    print("Parameters for the best fit: "+ str(params[best_dist]))

    return best_dist, best_p, params[best_dist]

Method 2

To the best of my knowledge, there is no automatic way of obtaining the distribution type and parameters of a sample (as inferring the distribution of a sample is a statistical problem by itself).

In my opinion, the best you can do is:

(for each attribute)

  • Try to fit each attribute to a reasonably large list of possible distributions
  • Evaluate all your fits and pick the best one. This can be done by performing a Kolmogorov-Smirnov test between your sample and each of the distributions of the fit (you have an implementation in Scipy, again), and picking the one that minimises D, the test statistic (a.k.a. the difference between the sample and the fit).

Bonus: It would make sense – as you’ll be building a model on each of the variables as you pick a fit for each one – although the goodness of your prediction would depend on the quality of your data and the distributions you are using for fitting. You are building a model, after all.

Conclusion

It’s all About this issue. Hope all Methods helped you a lot. Comment below Your thoughts and your queries. Also, Comment below which Method worked for you? Thank You.

Also, Read