Posted on Dec 17, 2024 | Backend

Exploratory data analysis with Pandas: Sorting (Part 2)

Sorting

A DataFrame can be sorted by the values of one of its variables (i.e., columns). For example, we can sort by Total day charge (use ascending=False to sort in descending order):

df.sort_values(by="Total day charge", ascending=False).head()

State	Account length	Area code	International plan	Voice mail plan	Total day minutes	Total day calls	Total day charge	Total eve minutes	Total eve calls	Total eve charge	Total night minutes	Total night calls	Total night charge	Total intl minutes	Total intl calls	Total intl charge	Customer service calls	Churn
CO	154	415	No	No	350.8	75	59.64	216.5	94	18.40	253.9	100	11.43	10.1	9	2.73	1	1
NY	64	415	Yes	No	346.8	55	58.96	249.5	79	21.21	275.4	102	12.39	13.3	9	3.59	1	1
OH	115	510	Yes	No	345.3	81	58.70	203.4	106	17.29	217.5	107	9.79	11.8	8	3.19	1	1
OH	83	415	No	No	337.4	120	57.36	227.4	116	19.33	153.9	114	6.93	15.8	7	4.27	0	1
MO	112	415	No	No	335.5	77	57.04	212.5	109	18.06	265.0	132	11.93	12.7	8	3.43	2	1

We can also sort by multiple columns:

df.sort_values(by=["Churn", "Total day charge"], ascending=[True, False]).head()
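
To see what the two lists do, here is a minimal sketch on a tiny made-up DataFrame (the name toy and its values are assumptions for illustration, not rows from the dataset). Each entry in ascending pairs up with the column at the same position in by:

```python
import pandas as pd

# Toy data; the values are invented for illustration only.
toy = pd.DataFrame(
    {
        "Churn": [1, 0, 1, 0],
        "Total day charge": [30.0, 45.5, 52.1, 12.3],
    }
)

# Churn ascending (0 before 1); within each Churn group,
# Total day charge descending.
sorted_toy = toy.sort_values(
    by=["Churn", "Total day charge"], ascending=[True, False]
)
print(sorted_toy)
```

Note that the original index labels travel with the rows, so the index of the result is no longer in order.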

Indexing and retrieving data

A DataFrame can be indexed in a few different ways.

To get a single column, you can use a DataFrame['Name'] construction. Let’s use this to answer a question about that column alone: what is the proportion of churned users in our dataframe?

df["Churn"].mean()

np.float64(0.14491449144914492)

What are the average values of numerical features for churned users?

Boolean indexing with one column is also very convenient. The syntax is df[P(df['Name'])], where P is some logical condition that is checked for each element of the Name column. The result of such indexing is a DataFrame consisting only of the rows that satisfy the condition P on the Name column.

df.select_dtypes(include=np.number)[df["Churn"] == 1].mean()

Account length            102.66
Area code                 437.82
Number vmail messages       5.12
Total day minutes         206.91
Total day calls           101.34
Total day charge           35.18
Total eve minutes         212.41
Total eve calls           100.56
Total eve charge           18.05
Total night minutes       205.23
Total night calls         100.40
Total night charge          9.24
Total intl minutes         10.70
Total intl calls            4.16
Total intl charge           2.89
Customer service calls      2.23
Churn                       1.00
dtype: float64
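
It can help to look at the intermediate boolean mask on its own. The sketch below uses a small made-up DataFrame (toy, mask and the values are assumptions for illustration) and splits the one-liner into steps:

```python
import pandas as pd

# Toy data; the values are invented for illustration only.
toy = pd.DataFrame(
    {
        "State": ["KS", "OH", "NJ", "OK"],
        "Total day minutes": [100.0, 200.0, 300.0, 400.0],
        "Churn": [0, 1, 0, 1],
    }
)

# Step 1: the condition yields a boolean Series, one value per row.
mask = toy["Churn"] == 1

# Step 2: indexing with the mask keeps only the rows where it is True.
churned = toy[mask]

# Step 3: average the numeric columns of the filtered rows.
churned_means = churned.select_dtypes(include="number").mean()
print(churned_means)
```

Filtering first and selecting numeric columns second gives the same result as the one-liner; the order is mostly a matter of taste.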

How much time (on average) do churned users spend on the phone during daytime?

df[df["Churn"] == 1]["Total day minutes"].mean()

np.float64(206.91407867494823)

What is the maximum length of international calls among loyal users (Churn == 0) who do not have an international plan?

df[(df["Churn"] == 0) & (df["International plan"] == "No")]["Total intl minutes"].max()

np.float64(18.9)
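
Two details of combining conditions are easy to trip over: the element-wise operator is & (not the keyword and), and each comparison needs its own parentheses because & binds more tightly than ==. A minimal sketch on made-up data (toy and its values are assumptions for illustration):

```python
import pandas as pd

# Toy data; the values are invented for illustration only.
toy = pd.DataFrame(
    {
        "Churn": [0, 0, 1, 0],
        "International plan": ["No", "Yes", "No", "No"],
        "Total intl minutes": [10.5, 19.0, 20.0, 12.0],
    }
)

# Each comparison is wrapped in parentheses before being combined with &.
loyal_no_plan = (toy["Churn"] == 0) & (toy["International plan"] == "No")

# .loc takes the row mask and the column label in a single step.
max_minutes = toy.loc[loyal_no_plan, "Total intl minutes"].max()
print(max_minutes)
```

Using .loc[mask, column] here is an alternative to chaining df[mask][column]; both select the same values.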

DataFrames can be indexed by column name (label) or row name (index), or by the serial number of a row. The loc indexer is used for indexing by name, while iloc is used for indexing by number. Both are used with square brackets, not called like functions.

df.loc[0:5, "State":"Area code"]

State	Account length	Area code
KS	128	415
OH	107	415
NJ	137	415
OH	84	408
OK	75	415
AL	118	510
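
The difference between the two indexers shows up most clearly in how they treat the end of a slice: loc includes both endpoints, while iloc follows the usual Python convention and excludes the end. A minimal sketch, using a small DataFrame built from the six rows shown above (the name toy is an assumption for illustration):

```python
import pandas as pd

# Small frame built from the six rows in the output above.
toy = pd.DataFrame(
    {
        "State": ["KS", "OH", "NJ", "OH", "OK", "AL"],
        "Account length": [128, 107, 137, 84, 75, 118],
        "Area code": [415, 415, 415, 408, 415, 510],
    }
)

# Label-based: rows labelled 0 through 5 inclusive,
# columns from "State" through "Area code" inclusive.
by_label = toy.loc[0:5, "State":"Area code"]

# Position-based: rows 0..4 and columns 0..2 (end excluded).
by_position = toy.iloc[0:5, 0:3]

print(by_label.shape)
print(by_position.shape)
```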

If we need the first or the last row of the DataFrame, we can use the df[:1] or df[-1:] construction:

df[-1:]

State	Account length	Area code	International plan	Voice mail plan	Number vmail messages	Total day minutes	Total day calls	Total day charge	Total eve minutes	Total eve calls	Total eve charge	Total night minutes	Total night calls	Total night charge	Total intl minutes	Total intl calls	Total intl charge	Customer service calls	Churn
TN	74	415	No	Yes	25	234.4	113	39.85	265.9	82	22.60	241.4	77	10.86	13.7	4	3.70	0	0
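
Slicing with df[:1] or df[-1:] always returns a one-row DataFrame, whereas iloc with a single integer returns that row as a Series. A minimal sketch on made-up data (toy and its values are assumptions for illustration):

```python
import pandas as pd

# Toy data; the values are invented for illustration only.
toy = pd.DataFrame({"State": ["KS", "OH", "TN"], "Churn": [0, 0, 0]})

first_row = toy[:1]   # one-row DataFrame holding the first row
last_row = toy[-1:]   # one-row DataFrame holding the last row

# A single integer with iloc returns the row as a Series instead.
last_as_series = toy.iloc[-1]

print(type(last_row).__name__)
print(type(last_as_series).__name__)
```

toy.head(1) and toy.tail(1) return the same one-row frames, if a method call reads more clearly than a slice.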
