
How to replace the white space in a string in a pandas dataframe?

Hello guys, how are you all? Hope you are all fine. Today we are going to learn how to replace the white space in a string in a pandas DataFrame in Python. Here I will explain a few methods for doing it.

Without wasting your time, let’s start this article.


Method 1

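You can use DataFrame.replace with regex=True. The regex flag makes pandas replace the pattern inside each string; without it, only cells whose entire value is ' ' would be replaced.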

df.replace(' ', '_', regex=True)

Output:

      Person_1    Person_2     Person_3
0   John_Smith  Jane_Smith   Mark_Smith
1  Harry_Jones  Mary_Jones  Susan_Jones
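Here is a self-contained version of Method 1 you can run as a script. The sample DataFrame is an assumption, rebuilt from the output shown above; it also shows why regex=True matters.

```python
import pandas as pd

# Sample data rebuilt from the output above (assumed, not from the original post).
df = pd.DataFrame({
    "Person_1": ["John Smith", "Harry Jones"],
    "Person_2": ["Jane Smith", "Mary Jones"],
    "Person_3": ["Mark Smith", "Susan Jones"],
})

# regex=True: the " " pattern is substituted inside every string value.
replaced = df.replace(" ", "_", regex=True)
print(replaced)

# Without regex=True, replace() only swaps cells whose whole value equals " ",
# so none of these names change.
unchanged = df.replace(" ", "_")
print(unchanged.to_numpy().tolist() == df.to_numpy().tolist())  # True
```

Note that DataFrame.replace touches cell values only; the column labels are left as they are.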

From some rough benchmarking on this small sample, piRSquared’s NumPy solution is the fastest, followed by DataFrame.replace, with the stack/unstack approach last:

%timeit df.values[:] = np.char.replace(df.values.astype(str), ' ', '_')
10000 loops, best of 3: 78.4 µs per loop

%timeit df.replace(' ', '_', regex=True)
1000 loops, best of 3: 932 µs per loop

%timeit df.stack().str.replace(' ', '_').unstack()
100 loops, best of 3: 2.29 ms per loop
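Before comparing speed, it helps to confirm that the three approaches produce the same values. A minimal sketch is below. One assumption to flag: the NumPy variant here builds a new DataFrame from np.char.replace instead of assigning into df.values, because writing through df.values is not guaranteed to update the frame in newer pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([["John Smith", "Jane Smith", "Mark Smith"],
                   ["Harry Jones", "Mary Jones", "Susan Jones"]])

# 1. NumPy: vectorized replace on a str array, wrapped back into a DataFrame.
via_numpy = pd.DataFrame(np.char.replace(df.to_numpy().astype(str), " ", "_"),
                         index=df.index, columns=df.columns)

# 2. DataFrame.replace with regex=True.
via_replace = df.replace(" ", "_", regex=True)

# 3. Stack into one Series, use the .str accessor, then unstack back.
via_stack = df.stack().str.replace(" ", "_").unstack()

# Compare cell values only (dtypes may differ between approaches).
for name, result in [("numpy", via_numpy), ("replace", via_replace), ("stack", via_stack)]:
    print(name, result.to_numpy().tolist() == via_replace.to_numpy().tolist())
```

Each line should print True. Comparing through to_numpy().tolist() ignores dtype differences, such as object versus string columns, that can vary between pandas versions.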

However, on a larger DataFrame (2 rows × 30,000 columns), piRSquared’s pandas solution (stack, str.replace, unstack) scales much better than DataFrame.replace and even outperforms the NumPy solution:

import numpy as np
import pandas as pd
df = pd.DataFrame([['John Smith', 'Jane Smith', 'Mark Smith']*10000,
                   ['Harry Jones', 'Mary Jones', 'Susan Jones']*10000])
%timeit df.values[:] = np.char.replace(df.values.astype(str), ' ', '_')
10 loops, best of 3: 181 ms per loop

%timeit df.replace(' ', '_', regex=True)
1 loop, best of 3: 4.14 s per loop

%timeit df.stack().str.replace(' ', '_').unstack()
10 loops, best of 3: 99.2 ms per loop
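%timeit only works in IPython or Jupyter. If you want to repeat the comparison in a plain Python script, the standard timeit module can do it. Below is a minimal sketch. The helper names make_frame and time_approaches are hypothetical, and the NumPy variant builds a new DataFrame rather than writing into df.values. Absolute timings will depend on your machine and your pandas/NumPy versions, so the ranking may differ from the numbers above.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(repeat: int) -> pd.DataFrame:
    """Hypothetical helper: a 2-row frame with 3 * repeat columns, like the benchmark above."""
    return pd.DataFrame([["John Smith", "Jane Smith", "Mark Smith"] * repeat,
                         ["Harry Jones", "Mary Jones", "Susan Jones"] * repeat])


def time_approaches(repeat: int = 100, number: int = 5) -> dict[str, float]:
    """Hypothetical helper: best per-call time in seconds for each approach."""
    df = make_frame(repeat)
    candidates = {
        "numpy": lambda: pd.DataFrame(
            np.char.replace(df.to_numpy().astype(str), " ", "_"),
            index=df.index, columns=df.columns),
        "replace": lambda: df.replace(" ", "_", regex=True),
        "stack": lambda: df.stack().str.replace(" ", "_").unstack(),
    }
    results = {}
    for name, fn in candidates.items():
        # Similar to %timeit's "best of 3": take the fastest of 3 repeats, per call.
        runs = timeit.repeat(fn, number=number, repeat=3)
        results[name] = min(runs) / number
    return results


if __name__ == "__main__":
    # Compare a small frame and a larger one to see how each approach scales.
    for size in (10, 1000):
        for name, seconds in time_approaches(repeat=size, number=1).items():
            print(f"{size:>5} x3 cols {name:>8}: {seconds * 1e3:.2f} ms per call")
```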

Method 2

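Alternatively, use DataFrame.replace with the regular expression \s+, which matches a run of one or more whitespace characters (spaces, tabs, newlines) and replaces each run with a single underscore. With inplace=True, df is modified directly and nothing is returned: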

df.replace(r'\s+', '_', regex=True, inplace=True)
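The difference between the ' ' pattern from Method 1 and \s+ only shows up when the whitespace is irregular. The sketch below uses hypothetical data containing a double space and a tab to show it.

```python
import pandas as pd

# Hypothetical data with irregular whitespace: a double space and a tab.
df = pd.DataFrame({"name": ["John  Smith", "Jane\tSmith", "Mark Smith"]})

# A literal " " pattern replaces each space separately and ignores the tab.
print(df.replace(" ", "_", regex=True)["name"].tolist())
# ['John__Smith', 'Jane\tSmith', 'Mark_Smith']

# r"\s+" turns each run of whitespace into a single underscore.
result = df.replace(r"\s+", "_", regex=True, inplace=True)
print(result)  # None: inplace=True changes df and returns nothing
print(df["name"].tolist())
# ['John_Smith', 'Jane_Smith', 'Mark_Smith']
```

The raw string r"\s+" keeps Python from treating \s as a string escape. Recent Python versions warn about invalid escape sequences such as '\s' in ordinary strings.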

Summary

That’s all about this issue. I hope these methods helped you. Comment below with your thoughts and queries, and let us know which method worked for you. Thank you.
