close

How to map numeric data into categories / bins in Pandas dataframe

Hello Guys, How are you all? Hope You all Are Fine. Today We Are Going To learn about How to map numeric data into categories / bins in Pandas dataframe in Python. So Here I am Explain to you all the possible Methods here.

Without wasting your time, Let’s start This Article.

How to map numeric data into categories / bins in Pandas dataframe?

  1. How to map numeric data into categories / bins in Pandas dataframe?

    With Pandas, you should avoid row-wise operations, as these usually involve an inefficient Python-level loop. Here are a couple of alternatives.

  2. map numeric data into categories / bins in Pandas dataframe

    With Pandas, you should avoid row-wise operations, as these usually involve an inefficient Python-level loop. Here are a couple of alternatives.

Method 1

With Pandas, you should avoid row-wise operations, as these usually involve an inefficient Python-level loop. Here are a couple of alternatives.

Pandas: pd.cut

you can use pd.cut for this, the benefit here being that your new column becomes a Categorical.

You only need to define your boundaries (including np.inf) and category names, then apply pd.cut to the desired numeric column.

bins = [0, 2, 18, 35, 65, np.inf]
names = ['<2', '2-18', '18-35', '35-65', '65+']

df['AgeRange'] = pd.cut(df['Age'], bins, labels=names)

print(df.dtypes)

# Age             int64
# Age_units      object
# AgeRange     category
# dtype: object

NumPy: np.digitize

np.digitize provides another clean solution. The idea is to define your boundaries and names, create a dictionary, then apply np.digitize to your Age column. Finally, use your dictionary to map your category names.

Note that for boundary cases the lower bound is used for mapping to a bin.

import pandas as pd, numpy as np

df = pd.DataFrame({'Age': [99, 53, 71, 84, 84],
                   'Age_units': ['Y', 'Y', 'Y', 'Y', 'Y']})

bins = [0, 2, 18, 35, 65]
names = ['<2', '2-18', '18-35', '35-65', '65+']

d = dict(enumerate(names, 1))

df['AgeRange'] = np.vectorize(d.get)(np.digitize(df['Age'], bins))

Result

   Age Age_units AgeRange
0   99         Y      65+
1   53         Y    35-65
2   71         Y      65+
3   84         Y      65+
4   84         Y      65+

Summery

It’s all About this issue. Hope all Methods helped you a lot. Comment below Your thoughts and your queries. Also, Comment below which Method worked for you? Thank You.

Also, Read