# How to calculate correlation between all columns and remove highly correlated ones using pandas?

Hello everyone, I hope you are all doing well. Today we are going to learn how to calculate the correlation between all columns of a DataFrame and remove the highly correlated ones using pandas in Python. Here I explain the possible methods.

## How to calculate correlation between all columns and remove highly correlated ones using pandas?

1. How to calculate correlation between all columns and remove highly correlated ones using pandas?

For a given DataFrame `df`, first compute the absolute correlation matrix with `corr_matrix = df.corr().abs()`, then locate the entries above the threshold with `high_corr_var = np.where(corr_matrix > 0.8)`.

## Method 1

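Here is the approach I have used. It walks the lower triangle of the correlation matrix and deletes, in place, every column whose correlation with an earlier column that has not itself been deleted is at or above the threshold. The comparison uses the signed value, so strongly negative correlations are not removed.

```python
def correlation(dataset, threshold):
    col_corr = set()  # Set of all the names of deleted columns
    corr_matrix = dataset.corr()
    for i in range(len(corr_matrix.columns)):
        for j in range(i):
            if (corr_matrix.iloc[i, j] >= threshold) and (corr_matrix.columns[j] not in col_corr):
                colname = corr_matrix.columns[i]  # getting the name of column
                col_corr.add(colname)  # remember it so the check above can skip it later
                if colname in dataset.columns:
                    del dataset[colname]  # deleting the column from the dataset

    print(dataset)
```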
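Because Method 1 deletes columns from the DataFrame that is passed in, a caller who still needs the original may prefer a variant that returns a reduced copy instead. The following is only a sketch of that idea: the function name `drop_correlated` and the toy data are hypothetical and not part of the original approach.

```python
import pandas as pd

def drop_correlated(dataset, threshold):
    """Hypothetical non-mutating variant of Method 1.

    Returns a reduced copy of the frame and the set of dropped column names.
    """
    corr_matrix = dataset.corr()
    dropped = set()
    for i in range(len(corr_matrix.columns)):
        for j in range(i):
            # Ignore pairs whose earlier column has already been dropped.
            if corr_matrix.iloc[i, j] >= threshold and corr_matrix.columns[j] not in dropped:
                dropped.add(corr_matrix.columns[i])
    return dataset.drop(columns=list(dropped)), dropped

# Toy data made up for illustration: "b" is an exact multiple of "a".
df = pd.DataFrame({"a": [1, 2, 3, 4, 5],
                   "b": [2, 4, 6, 8, 10],
                   "c": [5, 1, 4, 2, 3]})
reduced, dropped = drop_correlated(df, 0.8)
print(list(reduced.columns))  # ['a', 'c']
print(dropped)                # {'b'}
```

The original `df` still has all three columns afterwards, since `DataFrame.drop` returns a new object by default.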

## Method 2

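You can use the following for a given DataFrame `df`. It produces a list of column pairs whose absolute correlation exceeds 0.8:

```python
import numpy as np

corr_matrix = df.corr().abs()
high_corr_var = np.where(corr_matrix > 0.8)
# Keep each pair once; x < y also skips the diagonal
high_corr_var = [(corr_matrix.columns[x], corr_matrix.columns[y])
                 for x, y in zip(*high_corr_var) if x < y]
```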
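Method 2 identifies the correlated pairs but stops short of removing anything. One common way to finish the job is to keep only the upper triangle of the absolute correlation matrix and drop every column that is correlated above the threshold with any column before it. This is a minimal sketch under assumptions: the function name `drop_highly_correlated`, the 0.8 default, and the toy data are illustrative and not from the original.

```python
import numpy as np
import pandas as pd

def drop_highly_correlated(df, threshold=0.8):
    """Hypothetical helper: drop columns highly correlated with an earlier column."""
    # Restrict to numeric columns so the correlation is well defined.
    corr_matrix = df.select_dtypes(include="number").corr().abs()
    # Boolean mask that is True strictly above the diagonal.
    mask = np.triu(np.ones(corr_matrix.shape, dtype=bool), k=1)
    upper = corr_matrix.where(mask)  # everything else becomes NaN
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop

# Toy data made up for illustration: "b" is an exact multiple of "a".
frame = pd.DataFrame({"a": [1, 2, 3, 4, 5],
                      "b": [2, 4, 6, 8, 10],
                      "c": [5, 1, 4, 2, 3]})
reduced_frame, to_drop = drop_highly_correlated(frame)
print(to_drop)  # ['b']
```

Unlike Method 1, this version compares absolute values, so a strong negative correlation also triggers a drop, and it does not check whether the earlier column of a pair was itself dropped.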

## Conclusion

That is all for this topic. I hope these methods helped you. Comment below with your thoughts and questions, and let us know which method worked for you. Thank you.