# How to calculate the counts of each distinct value in a PySpark DataFrame?

## Method 1

I think you're looking for the DataFrame idiom of `groupBy` followed by `count`.

For example, given the following DataFrame with one state per row:

```
df = sqlContext.createDataFrame([('TX',), ('NJ',), ('TX',), ('CA',), ('NJ',)], ('state',))
df.show()
+-----+
|state|
+-----+
|   TX|
|   NJ|
|   TX|
|   CA|
|   NJ|
+-----+
```

The following yields:

```
df.groupBy('state').count().show()
+-----+-----+
|state|count|
+-----+-----+
|   TX|    2|
|   NJ|    2|
|   CA|    1|
+-----+-----+
```

## Method 2

```
import pandas as pd
import pyspark.sql.functions as F

def value_counts(spark_df, colm, order=1, n=10):
    """
    Count top n values in the given column and show in the given order

    Parameters
    ----------
    spark_df : pyspark.sql.dataframe.DataFrame
        Data
    colm : string
        Name of the column to count values in
    order : int, default=1
        1: sort the column descending by value counts and keep nulls at top
        2: sort the column ascending by values
        3: sort the column descending by values
        4: do 2 and 3 (combine top n and bottom n after sorting the column by values ascending)
    n : int, default=10
        Number of top values to display

    Returns
    -------
    Value counts in pandas dataframe
    """

    if order == 1: