Exploratory data analysis with Pandas: Sorting (Part 2)
Ujjwal Paliwal

Posted on Dec 17, 2024 | Backend

Sorting

A DataFrame can be sorted by the values of one of its variables (i.e., columns). For example, we can sort by Total day charge (pass ascending=False to sort in descending order):

df.sort_values(by="Total day charge", ascending=False).head()
State Account length Area code International plan Voice mail plan Number vmail messages Total day minutes Total day calls Total day charge Total eve minutes Total eve calls Total eve charge Total night minutes Total night calls Total night charge Total intl minutes Total intl calls Total intl charge Customer service calls Churn
CO 154 415 No No 0 350.8 75 59.64 216.5 94 18.40 253.9 100 11.43 10.1 9 2.73 1 1
NY 64 415 Yes No 0 346.8 55 58.96 249.5 79 21.21 275.4 102 12.39 13.3 9 3.59 1 1
OH 115 510 Yes No 0 345.3 81 58.70 203.4 106 17.29 217.5 107 9.79 11.8 8 3.19 1 1
OH 83 415 No No 0 337.4 120 57.36 227.4 116 19.33 153.9 114 6.93 15.8 7 4.27 0 1
MO 112 415 No No 0 335.5 77 57.04 212.5 109 18.06 265.0 132 11.93 12.7 8 3.43 2 1

We can also sort by multiple columns:

df.sort_values(by=["Churn", "Total day charge"], ascending=[True, False]).head()
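
To make the ordering rule concrete, here is a minimal sketch on a made-up four-row frame. The `toy` data is hypothetical (only the column names are borrowed from the dataset), so treat it as an illustration rather than output from the real data.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the article's dataset.
toy = pd.DataFrame(
    {
        "Churn": [1, 0, 1, 0],
        "Total day charge": [30.0, 45.5, 59.6, 12.1],
    }
)

# Churn ascending (0 before 1); within each Churn group, charge descending.
ordered = toy.sort_values(
    by=["Churn", "Total day charge"], ascending=[True, False]
)

print(ordered)
```

Note that sort_values returns a new DataFrame and leaves `toy` itself unchanged, which is why the result is assigned to `ordered`.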

Indexing and retrieving data

A DataFrame can be indexed in a few different ways.

To get a single column, you can use a DataFrame['Name'] construction. Let’s use this to answer a question about that column alone: what is the proportion of churned users in our dataframe?

df["Churn"].mean()
np.float64(0.14491449144914492)
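
The mean works as a proportion here because Churn is stored as 0/1, so the mean is the number of ones divided by the number of rows. A small sketch on hypothetical data (the five-row `toy` frame is invented for illustration):

```python
import pandas as pd

# Hypothetical toy data, not the telecom dataset used in the article.
toy = pd.DataFrame({"Churn": [0, 1, 0, 0, 1]})

# Mean of a 0/1 column = (number of ones) / (number of rows).
share_via_mean = toy["Churn"].mean()
share_via_count = (toy["Churn"] == 1).sum() / len(toy)

print(share_via_mean)   # 0.4
print(share_via_count)  # 0.4
```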

What are the average values of numerical features for churned users?

Boolean indexing with one column is also very convenient. The syntax is df[P(df['Name'])] , where P is some logical condition that is checked for each element of the Name column. The result of such indexing is the DataFrame consisting only of the rows that satisfy the P condition on the Name column.

df.select_dtypes(include=np.number)[df["Churn"] == 1].mean()
Account length            102.66
Area code                 437.82
Number vmail messages       5.12
Total day minutes         206.91
Total day calls           101.34
Total day charge           35.18
Total eve minutes         212.41
Total eve calls           100.56
Total eve charge           18.05
Total night minutes       205.23
Total night calls         100.40
Total night charge          9.24
Total intl minutes         10.70
Total intl calls            4.16
Total intl charge           2.89
Customer service calls      2.23
Churn                       1.00
dtype: float64
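
The df[P(df['Name'])] pattern can be split into two steps, which may make it easier to see what is going on: first build the boolean mask, then use it to filter rows. This is a sketch on a hypothetical four-row frame, with `mask` and `churned` as illustrative names:

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the article's dataset.
toy = pd.DataFrame(
    {
        "Churn": [0, 1, 1, 0],
        "Total day minutes": [100.0, 200.0, 300.0, 150.0],
    }
)

# P(df['Name']) is here the condition toy["Churn"] == 1.
mask = toy["Churn"] == 1   # boolean Series, one value per row
churned = toy[mask]        # keeps only the rows where mask is True

print(mask.tolist())                        # [False, True, True, False]
print(churned["Total day minutes"].mean())  # 250.0
```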

How much time (on average) do churned users spend on the phone during daytime?

df[df["Churn"] == 1]["Total day minutes"].mean()
np.float64(206.91407867494823)

What is the maximum length of international calls among loyal users (Churn == 0) who do not have an international plan?

df[(df["Churn"] == 0) & (df["International plan"] == "No")]["Total intl minutes"].max()
np.float64(18.9)
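
When several conditions are combined, each comparison needs its own parentheses, since & binds more tightly than == in Python. A minimal sketch with invented values follows; it uses toy.loc[mask, column] as one possible alternative to chaining two pairs of brackets:

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the article's dataset.
toy = pd.DataFrame(
    {
        "Churn": [0, 0, 1, 0],
        "International plan": ["No", "Yes", "No", "No"],
        "Total intl minutes": [10.0, 20.0, 15.0, 12.5],
    }
)

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than ==.
mask = (toy["Churn"] == 0) & (toy["International plan"] == "No")

# Select rows by mask and one column by label in a single step.
longest = toy.loc[mask, "Total intl minutes"].max()

print(mask.tolist())  # [True, False, False, True]
print(longest)        # 12.5
```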

DataFrames can be indexed by column name (label), by row name (index label), or by the positional number of a row. The loc indexer is used for indexing by label, while iloc is used for indexing by position. Both are used with square brackets, not called like functions.

df.loc[0:5, "State":"Area code"]
State  Account length  Area code
KS     128             415
OH     107             415
NJ     137             415
OH     84              408
OK     75              415
AL     118             510
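
The loc call above returns six rows for 0:5 because label-based slices include both endpoints, whereas iloc follows the usual Python convention and excludes the end position. A short sketch of the difference, using the first four rows shown above as a small stand-in frame (`by_label` and `by_position` are illustrative names):

```python
import pandas as pd

# Small stand-in frame built from the first rows shown in the article.
toy = pd.DataFrame(
    {
        "State": ["KS", "OH", "NJ", "OH"],
        "Account length": [128, 107, 137, 84],
        "Area code": [415, 415, 415, 408],
    }
)

# loc slices by label and includes both endpoints: rows 0, 1 and 2.
by_label = toy.loc[0:2, "State":"Account length"]

# iloc slices by position, like Python lists: rows 0 and 1, columns 0 and 1.
by_position = toy.iloc[0:2, 0:2]

print(by_label.shape)     # (3, 2)
print(by_position.shape)  # (2, 2)
```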

If we need the first or the last row of the DataFrame, we can use the df[:1] or df[-1:] construction:

df[-1:]
State Account length Area code International plan Voice mail plan Number vmail messages Total day minutes Total day calls Total day charge Total eve minutes Total eve calls Total eve charge Total night minutes Total night calls Total night charge Total intl minutes Total intl calls Total intl charge Customer service calls Churn
TN 74 415 No Yes 25 234.4 113 39.85 265.9 82 22.60 241.4 77 10.86 13.7 4 3.70 0 0
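
For comparison, the slice forms give the same one-row DataFrames as head(1) and tail(1), while iloc[-1] returns the last row as a Series instead. A minimal sketch on a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical toy data, not the telecom dataset used in the article.
toy = pd.DataFrame({"State": ["KS", "OH", "TN"], "Churn": [0, 0, 1]})

first_row = toy[:1]   # one-row DataFrame with the first row
last_row = toy[-1:]   # one-row DataFrame with the last row

print(first_row.equals(toy.head(1)))  # True
print(last_row.equals(toy.tail(1)))   # True
print(type(toy.iloc[-1]).__name__)    # Series
```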
