Divide a DataFrame in a ratio
Pandas is a library for manipulating and analyzing data. It provides two main data structures, Series and DataFrame. A DataFrame is a two-dimensional data structure made up of rows and columns.
There are different ways to divide a DataFrame in a given ratio. Let’s see them one by one.
Using numpy.random.rand()
Using pandas.DataFrame.sample()
Using numpy.split()
Using numpy.random.rand()
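The numpy.random.rand() function returns random numbers drawn uniformly from the interval [0, 1). Generating one number per row and comparing it with a threshold gives a Boolean mask that selects a share of the rows. To divide the data into roughly 60% and 40%, we select the rows whose number is below 0.6 for the first part and the remaining rows for the second.
Syntax
The following is the syntax.
import numpy as np
ratio = np.random.rand(len(dataframe))
dataframe[ratio comparison_operator value]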
Example
In the following example, we load the Titanic dataset, keep the first 20 rows, and divide them using numpy.random.rand(). Rows whose random number is below 0.6 go to train_data and the rest go to test_data, which gives approximately a 60% and 40% split. Because the numbers are random, the exact rows and counts differ between runs.
import numpy as np
import pandas as pd

data = pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")
data = data[:20]  # keep the first 20 rows
ratio = np.random.rand(len(data))  # one random number in [0, 1) per row
train_data = data[ratio < 0.6]  # about 60% of the rows
train_data.head()
test_data = data[ratio >= 0.6]  # the remaining rows
test_data.head()
Output
    PassengerId  Survived  Pclass  ...     Fare  Cabin  Embarked
4             5         0       3  ...   8.0500    NaN         S
8             9         1       3  ...  11.1333    NaN         S
10           11         1       3  ...  16.7000     G6         S
14           15         0       3  ...   7.8542    NaN         S
17           18         1       2  ...  13.0000    NaN         S
[5 rows x 12 columns]
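Because the mask is built from random numbers, the sizes of the two parts only approximate the requested ratio and change from run to run. A minimal sketch of a repeatable variant follows. It assumes a seeded generator from numpy.random.default_rng, which the example above does not use, and a small hypothetical DataFrame in place of the Titanic data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; any DataFrame works the same way.
data = pd.DataFrame({"Number": range(1, 21)})

# A seeded generator makes the random mask repeatable across runs.
rng = np.random.default_rng(seed=0)
ratio = rng.random(len(data))  # one uniform value in [0, 1) per row

mask = ratio < 0.6
train_data = data[mask]   # roughly 60% of the rows
test_data = data[~mask]   # the remaining rows

# The two parts never overlap and together cover every row.
print(len(train_data), len(test_data))
```

Negating the mask with ~ guarantees that every row lands in exactly one of the two parts.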
Using pandas.DataFrame.sample()
Another way to divide a DataFrame in a ratio is the sample() method of the DataFrame. Two of its parameters are relevant here: frac sets the fraction of rows to return, and random_state takes a seed for the random number generator so that the same rows are selected on every run.
Syntax
The following is the syntax.
dataframe.sample(frac=fraction, random_state=seed)
Example
In the following example, we divide the data into two equal parts of 50% each using the sample() method of the DataFrame. The second part is obtained by dropping the sampled rows from the original data by their index labels.
import pandas as pd

dic = {"Letters": ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
       "Number": [1, 2, 3, 4, 5, 6, 7, 8]}
data = pd.DataFrame(dic)
print("The Original data:")
print(data)
print("The 50% of the original data")
train_data = data.sample(frac=0.5, random_state=40)  # random half of the rows
print(train_data)
print("Another 50% of the data")
test_data = data.drop(train_data.index)  # rows that were not sampled
print(test_data)
Output
The Original data:
  Letters  Number
0       A       1
1       B       2
2       C       3
3       D       4
4       E       5
5       F       6
6       G       7
7       H       8
The 50% of the original data
  Letters  Number
7       H       8
1       B       2
2       C       3
4       E       5
Another 50% of the data
  Letters  Number
0       A       1
3       D       4
5       F       6
6       G       7
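The same pattern extends to other ratios. As a sketch, the hypothetical helper split_by_fraction below (it is not part of pandas) wraps sample() and drop(). It assumes the DataFrame has unique index labels, since drop() removes rows by label.

```python
import pandas as pd

def split_by_fraction(df, frac, seed=None):
    """Return (sampled part, remaining part) of df. Hypothetical helper."""
    first = df.sample(frac=frac, random_state=seed)
    # drop() removes rows by index label, so labels are assumed to be unique.
    second = df.drop(first.index)
    return first, second

data = pd.DataFrame({"Letters": list("ABCDEFGHIJ"), "Number": range(1, 11)})
train_data, test_data = split_by_fraction(data, frac=0.8, seed=40)
print(len(train_data), len(test_data))  # 8 2
```

Unlike a random mask, sample() returns a fixed number of rows for a given frac, so the sizes of the two parts are the same on every run.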
Using numpy.split()
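Another way to divide a DataFrame in a ratio is numpy.split(). The NumPy split() function takes the DataFrame and a list of row positions at which to cut it. Passing int(ratio * len(dataframe)) as the position puts the leading rows in the first part and the remaining rows in the second. The rows are not shuffled, so their original order is preserved. Depending on the installed pandas and NumPy versions, passing a DataFrame to numpy.split() may produce a deprecation warning.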
Syntax
The following is the syntax.
numpy.split(dataframe, [int(ratio * len(dataframe))])
Example
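The following example divides the DataFrame in a ratio of 70% and 30% using the numpy.split() function. With 8 rows, int(0.7 * 8) is 5, so the first part gets 5 rows and the second part gets the remaining 3.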
import pandas as pd
import numpy as np

dic = {"Letters": ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
       "Number": [1, 2, 3, 4, 5, 6, 7, 8]}
data = pd.DataFrame(dic)
print("The Original data:")
print(data)
print("The 70% of the original data")
# Cut at row position int(0.7 * 8) == 5
train_data, test_data = np.split(data, [int(0.7 * len(data))])
print(train_data)
print("Another 30% of the data")
print(test_data)
Output
The Original data:
  Letters  Number
0       A       1
1       B       2
2       C       3
3       D       4
4       E       5
5       F       6
6       G       7
7       H       8
The 70% of the original data
  Letters  Number
0       A       1
1       B       2
2       C       3
3       D       4
4       E       5
Another 30% of the data
  Letters  Number
5       F       6
6       G       7
7       H       8
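Since numpy.split() keeps the rows in their original order, the first part is always the leading rows. The sketch below shows the same positional cut written with DataFrame.iloc, with an optional shuffle through sample(frac=1) beforehand. The iloc form is swapped in here as an alternative to numpy.split(), and the seed value is an arbitrary assumption.

```python
import pandas as pd

dic = {"Letters": ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
       "Number": [1, 2, 3, 4, 5, 6, 7, 8]}
data = pd.DataFrame(dic)

# Optional: shuffle all rows first so the split does not depend on row order.
shuffled = data.sample(frac=1, random_state=0)

# Row position at which to cut: int(0.7 * 8) == 5.
cut = int(0.7 * len(shuffled))
train_data = shuffled.iloc[:cut]   # first 5 rows of the shuffled data
test_data = shuffled.iloc[cut:]    # remaining 3 rows

print(len(train_data), len(test_data))  # 5 3
```

Skipping the shuffle step and slicing data directly reproduces the ordered 5 and 3 row split shown in the output above.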