How to convert CSV columns to text in Python?


CSV (Comma Separated Values) files are commonly used to store and exchange tabular data. However, there may be situations where you need to convert the data in CSV columns to text format, for example, to use it as input for natural language processing tasks.

Python provides a variety of tools and libraries that can help with this task. In this tutorial, we will focus on the Pandas library and use it to convert CSV columns to text, first for a single column and then for every column in the file.

Approach

  • Load the CSV file into a pandas DataFrame using the read_csv() function.

  • Extract the desired column from the DataFrame using the indexing operator, and convert it to a list using the list() constructor.

  • Iterate over the elements of the list and convert each element to a string using the str() function.

  • Join the resulting strings with a separator, such as a space, using the join() method to create a single text string.

This approach reads in the CSV file with pandas, converts the desired column to a list, converts each element in the list to a string, and then joins the resulting strings into a single text string. This can be useful for further processing or analysis of the text data.
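
The steps above can be sketched as follows. This is a minimal illustration rather than the only way to do it: the CSV content is supplied inline through io.StringIO so the snippet runs without a file on disk, and the helper name column_to_text is a hypothetical name chosen for this sketch.

```python
import io

import pandas as pd

# Inline sample data standing in for a CSV file on disk (assumption for this sketch)
CSV_DATA = "Name,Age,Occupation\nJohn,32,Engineer\nJane,28,Teacher\nBob,45,Salesperson\n"


def column_to_text(df, column, sep=" "):
    """Hypothetical helper: join one DataFrame column into a single string."""
    values = list(df[column])              # extract the column as a plain list
    strings = [str(v) for v in values]     # convert each element to a string
    return sep.join(strings)               # join with the chosen separator


df = pd.read_csv(io.StringIO(CSV_DATA))
print(column_to_text(df, "Age"))
print(column_to_text(df, "Name", sep=", "))
```

Passing a different sep value, such as a comma or a newline, changes only how the values are separated in the final string.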

Let's say that we have a CSV file named inputs.csv which contains the following data inside it.

inputs.csv

Name,Age,Occupation
John,32,Engineer
Jane,28,Teacher
Bob,45,Salesperson

Now let's look at code that selects the second column of the CSV file and converts its data to text.

Converting Specific Column of CSV into Text

Consider the code shown below.

Example

import pandas as pd

# Read the CSV file into a pandas DataFrame
df = pd.read_csv('inputs.csv')

# Select the second column and convert it to a text Series
text_series = df.iloc[:, 1].astype(str)

# Join the text Series into a single string
text_string = ' '.join(text_series)

# Print the resulting text string
print(text_string)

Explanation

  • Import the Pandas library with the alias "pd". Use the read_csv() function from pandas to read in the CSV file named "inputs.csv" and store it as a pandas DataFrame in the variable "df".

  • Use iloc[:, 1] to select the second column of the DataFrame (iloc stands for "integer location", and the [:, 1] notation selects all rows (:) and the second column (1)).

  • Convert the selected column to a text Series by calling the astype() method with the argument str.

  • Join the elements of the text series into a single string using the join() method with a space character as the separator, and store the resulting string in the variable text_string. Print the resulting text string using the print() function.

To run the above code, we first need to install the Pandas library on our machine, which we can do with the command shown below.

pip3 install pandas

Once Pandas is installed, save the code in a file (assumed here to be named main.py) and run it with the command shown below.

python3 main.py

Output

The output of the above command is shown below.

32 28 45
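
Selecting a column by position can break if the column order changes, and real files often contain spaces after the commas or empty cells. The sketch below is an illustration, not part of the example above: it selects the column by name, passes skipinitialspace=True to read_csv() so that spaces following each comma are ignored, and drops missing values before joining. The inline data, including the row with an empty Occupation, is made up for the demonstration.

```python
import io

import pandas as pd

# Made-up sample: spaces after commas and one missing Occupation value
CSV_DATA = (
    "Name, Age, Occupation\n"
    "John, 32, Engineer\n"
    "Jane, 28,\n"
    "Bob, 45, Salesperson\n"
)

# skipinitialspace=True skips the blanks that follow each delimiter,
# so the header is "Age" rather than " Age"
df = pd.read_csv(io.StringIO(CSV_DATA), skipinitialspace=True)

# Select by column name, drop missing cells, then convert to strings
occupations = df["Occupation"].dropna().astype(str)
text_string = " ".join(occupations)
print(text_string)
```

Dropping missing cells is one possible choice; replacing them with a placeholder through fillna() before the conversion would keep one entry per row instead.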

Converting All Columns of CSV into Text

In the previous example, we explored how to convert a specific column of the CSV file into text. In this example, we will explore how to convert the data in all of the columns into text.

This code uses the Pandas library to read a CSV file into a pandas DataFrame, and then converts each column of the DataFrame into a text string. The resulting text strings are printed to the console.

To do this, the code creates a list of text Series, one for each column in the DataFrame. It then uses a list comprehension to iterate over the list of text Series, and joins each one into a single string using the join() method. Finally, the resulting list of text strings is printed to the console using a for loop.

Example

Consider the code shown below for the same "inputs.csv" file that we had above.

import pandas as pd

# Read the CSV file into a pandas DataFrame
df = pd.read_csv('inputs.csv')

# Convert all columns to text Series
text_series_list = [df[col].astype(str) for col in df.columns]

# Join each text Series into a single string
text_strings = [' '.join(text_series) for text_series in text_series_list]

# Print the resulting text strings
for text_string in text_strings:
    print(text_string)

Output

The output of the above code is shown below.

John Jane Bob
32 28 45
Engineer Teacher Salesperson
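
The same per-column text can also be produced without Pandas. The sketch below uses Python's standard-library csv module; it is an alternative illustration, not the method used in the examples above, and it reads from an inline string so that it runs as written.

```python
import csv
import io

# Inline sample data standing in for the CSV file (assumption for this sketch)
CSV_DATA = "Name,Age,Occupation\nJohn,32,Engineer\nJane,28,Teacher\nBob,45,Salesperson\n"

reader = csv.reader(io.StringIO(CSV_DATA))
header = next(reader)          # first row holds the column names
rows = list(reader)            # remaining rows; every field is already a string

# zip(*rows) transposes the rows into columns
columns = dict(zip(header, zip(*rows)))
text_by_column = {name: " ".join(values) for name, values in columns.items()}

for name, text in text_by_column.items():
    print(f"{name}: {text}")
```

To read from a real file instead, the io.StringIO object could be replaced with a file opened as open('inputs.csv', newline=''). Because the csv module returns every field as a string, no explicit str() conversion is needed here.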

Conclusion

In conclusion, converting CSV columns to text in Python is straightforward with the Pandas library. By reading the CSV file into a pandas DataFrame, we can select specific columns and convert them to text Series. Then, by using the join() method, we can concatenate each text Series into a single string per column. This lets us extract the data from a CSV file as text and manipulate it as needed for text analysis, natural language processing, or other applications that require text data.

Updated on: 18-Apr-2023
