A tall pd.Series

Duplicate indices

A Python dict cannot have more than one instance of the same key:

dictionary = {
    "AL": "Alabama",
    "AK": "Alaska",
    "AR": "Arizona",  #illegal
    "AR": "Arkansas", #illegal
    "CA": "California"
}

But a pandas Series can have more than one instance of the same index. In the following Series, for example, 10,859 of the 27,200 rows have the index "Manhattan".

Series.head and Series.tail

"Create a Series giving the borough and cuisine of each restaurant in New York City."

import sys
import pandas as pd

#Website is
#https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j

url = "https://data.cityofnewyork.us/api/views/43nn-pn8j/rows.csv"
df = pd.read_csv(url).groupby("CAMIS").first()   #Keep only one row (the first) for each restaurant.
index = pd.Index(data = df["BORO"], name = "Borough")
series = pd.Series(data = df["CUISINE DESCRIPTION"].array, index = index, name = "Cuisine")

print(f"{len(series)              = :,}")
print(f'{len(series["Manhattan"]) = :,}')
print()

#The series has too many rows to print all of them.
print(series.head())    #Just the first 5 rows.  Means print(series.iloc[:5])
print()
print(series.tail(3))   #Just the last 3 rows.   Means print(series.iloc[-3:])

sys.exit(0)

len(series)              = 27,200
len(series["Manhattan"]) = 10,859

Borough
Bronx            Bakery
Brooklyn     Hamburgers
Manhattan         Irish
Brooklyn       American
Queens         American
Name: Cuisine, dtype: object

Borough
Queens      Other
Brooklyn    Other
Brooklyn    Other
Name: Cuisine, dtype: object

To see only the first three and the last three rows,

pd.set_option("max_rows", 6)
print(series)

Clean the data.
Index.isin

"Create a Series containing the rows that have an invalid borough."

import sys
import pandas as pd

#Website is
#https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j

url = "https://data.cityofnewyork.us/api/views/43nn-pn8j/rows.csv"
df = pd.read_csv(url).groupby("CAMIS").first()   #Keep only one row (the first) for each restaurant.
index = pd.Index(data = df["BORO"], name = "Borough")
series = pd.Series(data = df["CUISINE DESCRIPTION"].array, index = index, name = "Cuisine")

valid = [
    "Bronx",
    "Brooklyn",
    "Manhattan",
    "Queens",
    "Staten Island"
]

seriesOfBools = ~series.index.isin(valid)   #tilde means "not"
seriesOfBadRows = series[seriesOfBools]
seriesOfBadRows.name = "bad rows"

#Print the length of the seriesOfBadRows too.
s = seriesOfBadRows.to_string(dtype = True, length = True, name = True)
print(s)

sys.exit(0)

Borough
0        American
0        American
0        American
0    Scandinavian
0        Hawaiian
0           Other
0           Other
0           Other
0           Other
0           Other
0           Other
0           Other
0           Other
0           Other
0           Other
Name: bad rows, Length: 15, dtype: object

Unfortunately, 15 of the 27,200 rows have a bad borough. We can delete these rows by inserting the following statement immediately after creating valid. This is called data cleaning.

series = series[valid]

How many Chinese restaurants does Staten Island have?

The value of the expression series["Staten Island"] is a Series containing 983 rows. (Warning: had there been only one restaurant in Staten Island, the value of the expression series["Staten Island"] would have been a str, not a Series containing one row.)

"Create a Series giving the borough and cuisine of each restaurant in New York City."

import sys
import pandas as pd

#Website is
#https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j

url = "https://data.cityofnewyork.us/api/views/43nn-pn8j/rows.csv"
df = pd.read_csv(url).groupby("CAMIS").first()   #Keep only one row (the first) for each restaurant.
index = pd.Index(data = df["BORO"], name = "Borough")
series = pd.Series(data = df["CUISINE DESCRIPTION"].array, index = index, name = "Cuisine")

valid = [
    "Bronx",
    "Brooklyn",
    "Manhattan",
    "Queens",
    "Staten Island"
]
series = series[valid]

#Create a Series of all the restaurants in Staten Island.
statens = series["Staten Island"]

#Create a Series of all the Chinese Restaurants in New York City.
seriesOfBools = series == "Chinese"
chineses = series[seriesOfBools]

seriesOfBools = statens == "Chinese"
statenChineses = statens[seriesOfBools]

print(f"{len(series):6,} restaurants in New York City with known locations.")
print(f"{len(chineses):6,} Chinese restaurants in New York City.")
print(f"{len(statens):6,} restaurants in Staten Island.")
print(f"{len(statenChineses):6,} Chinese restaurants in Staten Island.")

sys.exit(0)

27,185 restaurants in New York City with known locations.
 2,396 Chinese restaurants in New York City.
   983 restaurants in Staten Island.
    78 Chinese restaurants in Staten Island.

If you know what you’re doing, you can simplify

seriesOfBools = series == "Chinese"
chineses = series[seriesOfBools]

chineses = series[series == "Chinese"]

How many Italian restaurants does Manhattan have? The last time I counted (excluding pizza), there were about 9,000.

How many boroughs have restaurants?
How many cuisines does New York City have?
Index.is_unique and Series.is_unique
Index.unique and Series.unique

Index.unique returns a pd.Index; Series.unique returns an np.ndarray. Series.unique can’t create and return a new Series, because it has no index to give to the new Series.

"""
How many boroughs have restaurants?
How many cuisines does New York City have?
"""

import sys
import numpy as np
import pandas as pd

#Website is
#https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j

url = "https://data.cityofnewyork.us/api/views/43nn-pn8j/rows.csv"
df = pd.read_csv(url).groupby("CAMIS").first()   #Keep only one row (the first) for each restaurant.
index = pd.Index(data = df["BORO"], name = "Borough")
series = pd.Series(data = df["CUISINE DESCRIPTION"].array, index = index, name = "Cuisine")

valid = [
    "Bronx",
    "Brooklyn",
    "Manhattan",
    "Queens",
    "Staten Island"
]
series = series[valid]

index = series.index
if not index.is_unique:
    index = index.unique()
for i, borough in enumerate(index.sort_values(), start = 1):
    print(i, borough)
print()

seriesOfCuisines = series
if not seriesOfCuisines.is_unique:
    seriesOfCuisines = pd.Series(data = seriesOfCuisines.unique())
for i, cuisine in enumerate(seriesOfCuisines.sort_values(), start = 1):
    print(f"{i:2} {cuisine}")

sys.exit(0)

1 Bronx
2 Brooklyn
3 Manhattan
4 Queens
5 Staten Island

 1 Afghan
 2 African
 3 American
 4 Armenian
 5 Asian
 6 Australian
 7 Bagels/Pretzels
 8 Bakery
 9 Bangladeshi
10 Barbecue
11 Basque
12 Bottled beverages, including water, sodas, juices, etc.
13 Brazilian
14 Café/Coffee/Tea
15 Cajun
16 Californian
17 Caribbean
18 Chicken
19 Chilean
20 Chinese
21 Chinese/Cuban
22 Chinese/Japanese
23 Continental
24 Creole
25 Creole/Cajun
26 Czech
27 Delicatessen
28 Donuts
29 Eastern European
30 Egyptian
31 English
32 Ethiopian
33 Filipino
34 French
35 Fruits/Vegetables
36 German
37 Greek
38 Hamburgers
39 Hawaiian
40 Hotdogs
41 Hotdogs/Pretzels
42 Ice Cream, Gelato, Yogurt, Ices
43 Indian
44 Indonesian
45 Iranian
46 Irish
47 Italian
48 Japanese
49 Jewish/Kosher
50 Juice, Smoothies, Fruit Salads
51 Korean
52 Latin (Cuban, Dominican, Puerto Rican, South & Central American)
53 Mediterranean
54 Mexican
55 Middle Eastern
56 Moroccan
57 Not Listed/Not Applicable
58 Nuts/Confectionary
59 Other
60 Pakistani
61 Pancakes/Waffles
62 Peruvian
63 Pizza
64 Pizza/Italian
65 Polish
66 Portuguese
67 Russian
68 Salads
69 Sandwiches
70 Sandwiches/Salads/Mixed Buffet
71 Scandinavian
72 Seafood
73 Soul Food
74 Soups
75 Soups & Sandwiches
76 Southwestern
77 Spanish
78 Steak
79 Tapas
80 Tex-Mex
81 Thai
82 Turkish
83 Vegetarian
84 Vietnamese/Cambodian/Malaysia

A simple example of Series.value_counts

The file christmas.txt contains 1,000 random lowercase letters, one per line. I read them into a Series named seriesOfLetters and printed the first five.

import sys
import pandas as pd

url = "http://oit2.scps.nyu.edu/~meretzkm/pandas/series/christmas.txt"
df = pd.read_csv(url, header = None) #Create a pd.DataFrame containing one column numbered 0
seriesOfLetters = df[0]                       #Get that column.
seriesOfLetters.name = "Random Letters"

print(seriesOfLetters.head())
print()

seriesOfNumbers = seriesOfLetters.value_counts()
seriesOfNumbers.index.name = "letter"
seriesOfNumbers.name = "Frequencies"
print(seriesOfNumbers)
print()

print(seriesOfNumbers.sort_index())   #alphabetical order
print()

print(f"{len(seriesOfLetters)  = :,}")
print(f"{seriesOfNumbers.sum() = :,}")
sys.exit(0)

0    x
1    v
2    o
3    g
4    g
Name: Random Letters, dtype: object

letter
w    51
x    51
e    50
b    50
n    48
c    46
m    45
v    44
k    42
d    42
j    41
p    40
q    40
t    38
u    37
s    37
y    36
r    35
o    35
g    34
z    34
f    32
a    32
h    31
i    29
Name: Frequencies, dtype: int64

letter
a    32
b    50
c    46
d    42
e    50
f    32
g    34
h    31
i    29
j    41
k    42
m    45
n    48
o    35
p    40
q    40
r    35
s    37
t    38
u    37
v    44
w    51
x    51
y    36
z    34
Name: Frequencies, dtype: int64

len(seriesOfLetters)  = 1,000
seriesOfNumbers.sum() = 1,000

The frequencies of the letters contain a Christmas message of peace and goodwill for all mankind, or humankind. Can you figure out the message?

#Hint 1.

print(f"{len(seriesOfNumbers)  = :,}")

#Hint 2.

data = list(string.ascii_lowercase)                           #remember to import string
index = pd.Index(data = data, name = "every letter")
print(seriesOfNumbers.reindex(index = index, fill_value = 0)) #or fill_value = 1_000_000

What’s the most popular cuisine in Staten Island?
What borough has the most Chinese restaurants?
Index.value_counts and Series.value_counts
Series.idxmax, Series.max, np.argmax

"""
What's the most popular cuisine in Staten Island?
What borough has the most Chinese restaurants?
"""

import sys
import numpy as np
import pandas as pd

#Website is
#https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j

url = "https://data.cityofnewyork.us/api/views/43nn-pn8j/rows.csv"
df = pd.read_csv(url).groupby("CAMIS").first()   #Keep only one row (the first) for each restaurant.
index = pd.Index(data = df["BORO"], name = "Borough")
series = pd.Series(data = df["CUISINE DESCRIPTION"].array, index = index, name = "Cuisine")

valid = [
    "Bronx",
    "Brooklyn",
    "Manhattan",
    "Queens",
    "Staten Island"
]
series = series[valid]

statens = series["Staten Island"]
if len(statens) == 0:
    print("Staten Island has no restaurants.")
else:
    seriesOfCounts = statens.value_counts()
    cuisine = seriesOfCounts.idxmax()
    print(f"The most popular cuisine in Staten Island is {cuisine.upper()}.")
    print(f"{cuisine} is number {np.argmax(seriesOfCounts.array) + 1} on the list!")
    print(f"There are {seriesOfCounts.max()} of them in Staten Island.")
print()

chineses = series[series == "Chinese"]
if len(chineses) == 0:
    print("New York City has no Chinese restaurants.")
else:
    seriesOfCounts = chineses.index.value_counts()
    borough = seriesOfCounts.idxmax()
    print(f"The borough with the most Chinese restaurants is {borough.upper()}.")
    print(f"{borough} is number {np.argmax(seriesOfCounts.array) + 1} on the list!")
    print(f"There are {seriesOfCounts.max()} of them in {borough}.")

sys.exit(0)

The most popular cuisine in Staten Island is AMERICAN.
American is number 1 on the list!
There are 219 of them in Staten Island.

The borough with the most Chinese restaurants is QUEENS.
Queens is number 1 on the list!
There are 781 of them in Queens.

Tally all the cuisines in the city with value_counts and sort_index

import sys
import pandas as pd

#Website is
#https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j

url = "https://data.cityofnewyork.us/api/views/43nn-pn8j/rows.csv"
df = pd.read_csv(url).groupby("CAMIS").first()   #Keep only one row (the first) for each restaurant.
index = pd.Index(data = df["BORO"], name = "Borough")
series = pd.Series(data = df["CUISINE DESCRIPTION"].array, index = index, name = "Cuisine")

valid = [
    "Bronx",
    "Brooklyn",
    "Manhattan",
    "Queens",
    "Staten Island"
]
series = series[valid]

print("Boroughs in decreasing order:")
seriesOfInts = series.index.value_counts(ascending = False) #ascending = False is the default
print(seriesOfInts)                  #seriesOfInts is a pd.Series
print()

print("Boroughs in alphabetical order:")
print(seriesOfInts.sort_index(ascending = True))            #ascending = True is the default
print()

print("Cuisines in decreasing order:")
seriesOfInts = series.value_counts() #seriesOfInts is a pd.Series
s = seriesOfInts.to_string(max_rows = None)
print(s)
print()

print("Cuisines in alphabetical order:")
seriesOfInts.sort_index(inplace = True)
s = seriesOfInts.to_string(max_rows = None)
print(s)

sys.exit(0)

Boroughs in decreasing order:
Manhattan        10859
Brooklyn          6788
Queens            6158
Bronx             2397
Staten Island      983
Name: Borough, dtype: int64

Boroughs in alphabetical order:
Bronx             2397
Brooklyn          6788
Manhattan        10859
Queens            6158
Staten Island      983
Name: Borough, dtype: int64

Cuisines in decreasing order:
American                                                            5759
Chinese                                                             2396
Other                                                               1826
Café/Coffee/Tea                                                     1808
Pizza                                                               1198
Mexican                                                              967
Italian                                                              960
Japanese                                                             872
Latin (Cuban, Dominican, Puerto Rican, South & Central American)     823
Bakery                                                               751
Caribbean                                                            703
Spanish                                                              636
Donuts                                                               519
Chicken                                                              511
Pizza/Italian                                                        473
Juice, Smoothies, Fruit Salads                                       422
Asian                                                                413
Hamburgers                                                           365
Sandwiches                                                           352
Ice Cream, Gelato, Yogurt, Ices                                      345
Jewish/Kosher                                                        331
Indian                                                               316
French                                                               312
Korean                                                               307
Thai                                                                 305
Delicatessen                                                         297
Mediterranean                                                        281
Sandwiches/Salads/Mixed Buffet                                       233
Seafood                                                              193
Middle Eastern                                                       192
Irish                                                                191
Bagels/Pretzels                                                      157
Greek                                                                138
Vegetarian                                                           130
Tex-Mex                                                              126
Bottled beverages, including water, sodas, juices, etc.              109
Vietnamese/Cambodian/Malaysia                                        103
Peruvian                                                              83
Steak                                                                 79
Salads                                                                78
African                                                               73
Turkish                                                               72
Eastern European                                                      70
Russian                                                               69
Soul Food                                                             66
Barbecue                                                              54
Hawaiian                                                              47
Continental                                                           47
Soups & Sandwiches                                                    43
Chinese/Japanese                                                      43
Bangladeshi                                                           43
Tapas                                                                 41
Hotdogs/Pretzels                                                      32
Filipino                                                              31
Brazilian                                                             29
Pakistani                                                             29
Hotdogs                                                               28
German                                                                28
Armenian                                                              26
Polish                                                                25
Australian                                                            22
Creole                                                                22
Chinese/Cuban                                                         21
Ethiopian                                                             17
Pancakes/Waffles                                                      15
Afghan                                                                14
English                                                               13
Egyptian                                                              12
Portuguese                                                            11
Indonesian                                                            10
Moroccan                                                               8
Soups                                                                  8
Not Listed/Not Applicable                                              8
Scandinavian                                                           7
Creole/Cajun                                                           6
Fruits/Vegetables                                                      6
Cajun                                                                  6
Nuts/Confectionary                                                     6
Czech                                                                  4
Californian                                                            4
Iranian                                                                4
Southwestern                                                           3
Chilean                                                                1
Basque                                                                 1

Cuisines in alphabetical order:
Afghan                                                                14
African                                                               73
American                                                            5759
Armenian                                                              26
Asian                                                                413
Australian                                                            22
Bagels/Pretzels                                                      157
Bakery                                                               751
Bangladeshi                                                           43
Barbecue                                                              54
Basque                                                                 1
Bottled beverages, including water, sodas, juices, etc.              109
Brazilian                                                             29
Café/Coffee/Tea                                                     1808
Cajun                                                                  6
Californian                                                            4
Caribbean                                                            703
Chicken                                                              511
Chilean                                                                1
Chinese                                                             2396
Chinese/Cuban                                                         21
Chinese/Japanese                                                      43
Continental                                                           47
Creole                                                                22
Creole/Cajun                                                           6
Czech                                                                  4
Delicatessen                                                         297
Donuts                                                               519
Eastern European                                                      70
Egyptian                                                              12
English                                                               13
Ethiopian                                                             17
Filipino                                                              31
French                                                               312
Fruits/Vegetables                                                      6
German                                                                28
Greek                                                                138
Hamburgers                                                           365
Hawaiian                                                              47
Hotdogs                                                               28
Hotdogs/Pretzels                                                      32
Ice Cream, Gelato, Yogurt, Ices                                      345
Indian                                                               316
Indonesian                                                            10
Iranian                                                                4
Irish                                                                191
Italian                                                              960
Japanese                                                             872
Jewish/Kosher                                                        331
Juice, Smoothies, Fruit Salads                                       422
Korean                                                               307
Latin (Cuban, Dominican, Puerto Rican, South & Central American)     823
Mediterranean                                                        281
Mexican                                                              967
Middle Eastern                                                       192
Moroccan                                                               8
Not Listed/Not Applicable                                              8
Nuts/Confectionary                                                     6
Other                                                               1826
Pakistani                                                             29
Pancakes/Waffles                                                      15
Peruvian                                                              83
Pizza                                                               1198
Pizza/Italian                                                        473
Polish                                                                25
Portuguese                                                            11
Russian                                                               69
Salads                                                                78
Sandwiches                                                           352
Sandwiches/Salads/Mixed Buffet                                       233
Scandinavian                                                           7
Seafood                                                              193
Soul Food                                                             66
Soups                                                                  8
Soups & Sandwiches                                                    43
Southwestern                                                           3
Spanish                                                              636
Steak                                                                 79
Tapas                                                                 41
Tex-Mex                                                              126
Thai                                                                 305
Turkish                                                               72
Vegetarian                                                           130
Vietnamese/Cambodian/Malaysia                                        103

What is the borough-by-borough distribution of Cajun food?
Index.value_counts, Index.unique, and Series.reindex.

"Which boroughs have Cajun restaurants?"

import sys
import pandas as pd

#Website is
#https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j

url = "https://data.cityofnewyork.us/api/views/43nn-pn8j/rows.csv"
df = pd.read_csv(url).groupby("CAMIS").first()   #Keep only one row (the first) for each restaurant.
index = pd.Index(data = df["BORO"], name = "Borough")
series = pd.Series(data = df["CUISINE DESCRIPTION"].array, index = index, name = "Cuisine")

valid = [
    "Bronx",
    "Brooklyn",
    "Manhattan",
    "Queens",
    "Staten Island"
]
series = series[valid]

seriesOfBools = series == "Cajun"          #Create a Series containing about 27,000 bools. 6 of them are True.
cajuns = series[seriesOfBools]             #Create a Series containing 6 strings.
print(cajuns)
print()

seriesOfInts = cajuns.index.value_counts() #Create a Series containing only 3 ints.
print(seriesOfInts)
print()

newIndex = series.index.unique()
seriesOfInts = seriesOfInts.reindex(newIndex, fill_value = 0) #Create a Series containing 5 ints.
print(seriesOfInts)
print()

seriesOfBools = seriesOfInts > 0           #Create a Series containing 5 bools.
print(seriesOfBools)

sys.exit(0)

Borough
Staten Island    Cajun
Queens           Cajun
Brooklyn         Cajun
Brooklyn         Cajun
Brooklyn         Cajun
Brooklyn         Cajun
Name: Cuisine, dtype: object

Brooklyn         4
Queens           1
Staten Island    1
Name: Borough, dtype: int64

Borough
Bronx            0
Brooklyn         4
Manhattan        0
Queens           1
Staten Island    1
Name: Borough, dtype: int64

Borough
Bronx            False
Brooklyn          True
Manhattan        False
Queens            True
Staten Island     True
Name: Borough, dtype: bool

Create five smaller Serieses
Series.groupby

In the following program, the first call to the groupby method splits the series into five smaller Serieses called groups. That’s because series.index contains five distinct values ("Bronx", "Brooklyn", "Manhattan", etc). Of course, series.index contains many copies of each of these five values.

The second call to groupby splits the series into 84 groups.

import sys
import pandas as pd

#Website is
#https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j

url = "https://data.cityofnewyork.us/api/views/43nn-pn8j/rows.csv"
df = pd.read_csv(url).groupby("CAMIS").first()   #Keep only one row (the first) for each restaurant.
index = pd.Index(data = df["BORO"], name = "Borough")
series = pd.Series(data = df["CUISINE DESCRIPTION"].array, index = index, name = "Cuisine")

valid = [
    "Bronx",
    "Brooklyn",
    "Manhattan",
    "Queens",
    "Staten Island"
]
series = series[valid]

#Group by borough in alphabetical order.
groups = series.groupby(series.index)

for name, group in groups:   #Each name is a str, each group is a Series.
    print(f"{len(group):6,} {name}")
print()

#Group by cuisine in alphabetical order.
groups = series.groupby(series)

for name, group in groups:
    print(f"{len(group):5,} {name}")

sys.exit(0)

To group by borough in increasing numeric order, change

for name, group in groups:   #Each name is a str, each group is a Series.

                                                                #Each name is a str, each group is a Series.
for name, group in sorted(groups, key = lambda t: len(t[1])):   #Each t is a tuple containing 2 items.

in the above program.

 2,392 Bronx
 6,771 Brooklyn
10,837 Manhattan
 6,150 Queens
   982 Staten Island

   14 Afghan
   73 African
5,739 American
   26 Armenian
  408 Asian
   22 Australian
  157 Bagels/Pretzels
  750 Bakery
   43 Bangladeshi
   54 Barbecue
    1 Basque
  109 Bottled beverages, including water, sodas, juices, etc.
   29 Brazilian
1,807 Café/Coffee/Tea
    6 Cajun
    4 Californian
  703 Caribbean
  508 Chicken
    1 Chilean
2,390 Chinese
   21 Chinese/Cuban
   43 Chinese/Japanese
   47 Continental
   22 Creole
    6 Creole/Cajun
    4 Czech
  297 Delicatessen
  518 Donuts
   70 Eastern European
   12 Egyptian
   13 English
   17 Ethiopian
   31 Filipino
  311 French
    6 Fruits/Vegetables
   28 German
  138 Greek
  365 Hamburgers
   47 Hawaiian
   28 Hotdogs
   32 Hotdogs/Pretzels
  345 Ice Cream, Gelato, Yogurt, Ices
  316 Indian
   10 Indonesian
    4 Iranian
  191 Irish
  959 Italian
  873 Japanese
  331 Jewish/Kosher
  422 Juice, Smoothies, Fruit Salads
  306 Korean
  822 Latin (Cuban, Dominican, Puerto Rican, South & Central American)
  281 Mediterranean
  967 Mexican
  191 Middle Eastern
    8 Moroccan
    8 Not Listed/Not Applicable
    6 Nuts/Confectionary
1,818 Other
   29 Pakistani
   15 Pancakes/Waffles
   83 Peruvian
1,197 Pizza
  473 Pizza/Italian
   25 Polish
   11 Portuguese
   69 Russian
   78 Salads
  352 Sandwiches
  233 Sandwiches/Salads/Mixed Buffet
    7 Scandinavian
  193 Seafood
   66 Soul Food
    8 Soups
   43 Soups & Sandwiches
    3 Southwestern
  636 Spanish
   79 Steak
   41 Tapas
  126 Tex-Mex
  304 Thai
   71 Turkish
  130 Vegetarian
  102 Vietnamese/Cambodian/Malaysia

The above program merely prints the len and name of each group. The following program performs a more elaborate computation for each group.

The program does not bother to create a Series of all the Chinese restaurants in New York City. It merely computes the number of rows (n2) that such a Series would have.

"Chinese restaurants by borough."

import sys
import pandas as pd

#Website is
#https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j

url = "https://data.cityofnewyork.us/api/views/43nn-pn8j/rows.csv"
df = pd.read_csv(url).groupby("CAMIS").first()   #Keep only one row (the first) for each restaurant.
index = pd.Index(data = df["BORO"], name = "Borough")
series = pd.Series(data = df["CUISINE DESCRIPTION"].array, index = index, name = "Cuisine")

valid = [
    "Bronx",
    "Brooklyn",
    "Manhattan",
    "Queens",
    "Staten Island"
]
series = series[valid]

seriesOfBools = series == "Chinese"
n2 = seriesOfBools.sum()           #number of Chinese restaurants in New York City

#Group by borough in alphabetical order.
groups = series.groupby(series.index)

for name, group in groups:         #Each name is a str, each group is a Series.
    m = (group == "Chinese").sum() #number of Chinese restaurants in this borough
    n1 = len(group)
    percent1 = 100 * m / n1
    percent2 = 100 * m / n2
    the = "the " if name == "Bronx" else ""
    print(f"{m} of the restaurants in {the}{name} are Chinese.")
    print(f"That's {percent1:.1f}% of the {len(group):,} restaurants in {the}{name},",)
    print(f"and {percent2:.1f}% of the {n2:,} Chinese restaurants in New York City.")
    print()

sys.exit(0)

295 of the restaurants in Bronx are Chinese.
That's 12.3% of the 2,397 restaurants in the Bronx,
and 12.3% of the 2,396 Chinese restaurants in New York City.

731 of the restaurants in Brooklyn are Chinese.
That's 10.8% of the 6,788 restaurants in Brooklyn,
and 30.5% of the 2,396 Chinese restaurants in New York City.

511 of the restaurants in Manhattan are Chinese.
That's 4.7% of the 10,859 restaurants in Manhattan,
and 21.3% of the 2,396 Chinese restaurants in New York City.

781 of the restaurants in Queens are Chinese.
That's 12.7% of the 6,158 restaurants in Queens,
and 32.6% of the 2,396 Chinese restaurants in New York City.

78 of the restaurants in Staten Island are Chinese.
That's 7.9% of the 983 restaurants in Staten Island,
and 3.3% of the 2,396 Chinese restaurants in New York City.

The split-apply-combine paradigm
GroupBy.agg (a.k.a. GroupBy.aggregate)

A function passed to agg is called the aggregating function. In this example, the two aggregating functions simple enough to be lambda functions.

As in the above programs, the Series.groupby method splits the Series into smaller Serieses called groups. The aggregating function is then applied to each group. In other words, agg passes each group to the aggregating function. In this program, the first aggregating function is called five times and the second aggregating function is called 84 times.

In this program, the argument of the aggregating function is a group. (Later, the argument might be a DataFrame.) The return value of the aggregating function must be a single value, not a Series. In this program, each aggregating function returns an integer.

The agg function combines the return value of each call to the aggregating function into a brand new Series and returns this Series.

"""
How many different cuisines does each borough offer?
How many boroughs offer each cuisine?
"""

import sys
import pandas as pd

#Website is
#https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j

url = "https://data.cityofnewyork.us/api/views/43nn-pn8j/rows.csv"
df = pd.read_csv(url).groupby("CAMIS").first()   #Keep only one row (the first) for each restaurant.
index = pd.Index(data = df["BORO"], name = "Borough")
series = pd.Series(data = df["CUISINE DESCRIPTION"].array, index = index, name = "Cuisine")

valid = [
    "Bronx",
    "Brooklyn",
    "Manhattan",
    "Queens",
    "Staten Island"
]
series = series[valid]

print("How many cuisines does each borough offer?")
groups = series.groupby(series.index)   #Group by borough in alphabetical order.
seriesOfNumbers = groups.agg(lambda group: len(group.unique())) #Each group is a Series.
seriesOfNumbers.name = "How Many Cusisines"
print(seriesOfNumbers)
print()

print("How many boroughs offer each cuisine?")
groups = series.groupby(series)   #Group by cuisine in alphabetical order.
seriesOfNumbers = groups.agg(lambda group: len(group.index.unique())) #Each group is a Series.
seriesOfNumbers.name = "How Many Boroughs"
print(seriesOfNumbers)
sys.exit(0)

How many cuisines does each borough offer?
Borough
Bronx            53
Brooklyn         79
Manhattan        80
Queens           78
Staten Island    54
Name: How Many Cusisines, dtype: int64

How many boroughs offer each cuisine?
Cuisine
Afghan                           3
African                          5
American                         5
Armenian                         4
Asian                            5
                                ..
Tex-Mex                          5
Thai                             5
Turkish                          4
Vegetarian                       4
Vietnamese/Cambodian/Malaysia    5
Name: How Many Boroughs, Length: 84, dtype: int64

Split-apply-combine with GroupBy.apply

Like GroupBy.agg, GroupBy.apply receives an aggregating function. But the aggregating function passed to GroupBy.apply does not have to return a single value. It can return an entire Series.

Until now, the index of a Series has been an a single column. In other words, the index of each row has been a single value such as 0 or 1. But in the following program, the index of the first seriesOfNumbers is two columns named "Borough" and "Cuisine". In other words, the index of each row is now a tuple of two items. For example, the index of the first row is the tuple ("Bronx", "American") and the index of the second row is the tuple ("Bronx", "Chinese"). (The first item in a tuple is not printed when it is the same as the first item of the tuple in the previous row.)

This type of index is called a hierarchical index or MultiIndex. See Hierarchical indexing. In the following program, the first seriesOfNumbers has a MultiIndex with two levels. The level on the left ("Borough") is called the outer level, and the level on the right ("Cuisine") is called the inner level. The levels are numbered like the items in a Python list: the outer level is number 0, the next level is number 1, and a more complicated example might have a level 2. The innermost level (number 1 in this example) is also number -1.

import sys
import pandas as pd

#Website is
#https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j

url = "https://data.cityofnewyork.us/api/views/43nn-pn8j/rows.csv"
df = pd.read_csv(url).groupby("CAMIS").first()   #Keep only one row (the first) for each restaurant.
index = pd.Index(data = df["BORO"], name = "Borough")
series = pd.Series(data = df["CUISINE DESCRIPTION"].array, index = index, name = "Cuisine")

valid = [
    "Bronx",
    "Brooklyn",
    "Manhattan",
    "Queens",
    "Staten Island"
]
series = series[valid]

print("What are the three most popular cuisines in each borough?")
groups = series.groupby(series.index)   #Group by borough in alphabetical order.
seriesOfNumbers = groups.apply(lambda group: group.value_counts()[:3]) #Each group is a Series.
seriesOfNumbers.index.names = ["Borough", "Cuisine"]
seriesOfNumbers.name = "Most popular cuisines in each borough"
print(seriesOfNumbers)
print()

print(seriesOfNumbers.index)
print()

print("What three boroughs have the most restaurants of each cuisine?")
groups = series.groupby(series)   #Group by cuisine in alphabetical order.
seriesOfNumbers = groups.apply(lambda group: group.index.value_counts()[:3]) #Each group is a Series.
seriesOfNumbers.index.names = ["Cuisine", "Borough"]
seriesOfNumbers.name = "Boroughs with the most restaurants of each cuisine"
s = seriesOfNumbers.to_string(dtype = True, length = True, max_rows = 20, name = True)
print(s)
print()

print(seriesOfNumbers.index)
sys.exit(0)

What are the 3 most popular cuisines in each borough?
Borough        Cuisine
Bronx          American            402
               Chinese             295
               Pizza               203
Brooklyn       American           1247
               Chinese             731
               Café/Coffee/Tea     459
Manhattan      American           2911
               Café/Coffee/Tea     984
               Other               818
Queens         American            980
               Chinese             781
               Other               388
Staten Island  American            219
               Chinese              78
               Italian              70
Name: Most popular cuisines in each borough, dtype: int64

MultiIndex([(        'Bronx',        'American'),
            (        'Bronx',         'Chinese'),
            (        'Bronx',           'Pizza'),
            (     'Brooklyn',        'American'),
            (     'Brooklyn',         'Chinese'),
            (     'Brooklyn', 'Café/Coffee/Tea'),
            (    'Manhattan',        'American'),
            (    'Manhattan', 'Café/Coffee/Tea'),
            (    'Manhattan',           'Other'),
            (       'Queens',        'American'),
            (       'Queens',         'Chinese'),
            (       'Queens',           'Other'),
            ('Staten Island',        'American'),
            ('Staten Island',         'Chinese'),
            ('Staten Island',         'Italian')],
           names=['Borough', 'Cuisine'])

What 3 boroughs have the most restaurants of each cuisine?
Cuisine                        Borough
Afghan                         Queens          8
                               Manhattan       4
                               Brooklyn        2
African                        Bronx          25
                               Brooklyn       23
                               Manhattan      20
American                       Manhattan    2911
                               Brooklyn     1247
                               Queens        980
Armenian                       Manhattan      14
                                            ...
Thai                           Queens         78
Turkish                        Brooklyn       30
                               Manhattan      29
                               Queens         10
Vegetarian                     Manhattan      70
                               Brooklyn       47
                               Queens         12
Vietnamese/Cambodian/Malaysia  Manhattan      42
                               Brooklyn       36
                               Queens         21
Name: Boroughs with the most restaurants of each cuisine, Length: 239, dtype: int64

MultiIndex([(                       'Afghan',    'Queens'),
            (                       'Afghan', 'Manhattan'),
            (                       'Afghan',  'Brooklyn'),
            (                      'African',     'Bronx'),
            (                      'African',  'Brooklyn'),
            (                      'African', 'Manhattan'),
            (                     'American', 'Manhattan'),
            (                     'American',  'Brooklyn'),
            (                     'American',    'Queens'),
            (                     'Armenian', 'Manhattan'),
            ...
            (                         'Thai',    'Queens'),
            (                      'Turkish',  'Brooklyn'),
            (                      'Turkish', 'Manhattan'),
            (                      'Turkish',    'Queens'),
            (                   'Vegetarian', 'Manhattan'),
            (                   'Vegetarian',  'Brooklyn'),
            (                   'Vegetarian',    'Queens'),
            ('Vietnamese/Cambodian/Malaysia', 'Manhattan'),
            ('Vietnamese/Cambodian/Malaysia',  'Brooklyn'),
            ('Vietnamese/Cambodian/Malaysia',    'Queens')],
           names=['Cuisine', 'Borough'], length=239)

A cross tabulation of two factors is a two-dimensional table.
Series.unstack

The following call to Series.unstack created a DataFrame containg five columns. The index of the DataFrame was created by removing the innermost level (number -1) from the index of the Series. (The index of the Series was a hierarchical index with two levels; the index of the DataFrame is just a plain old index with only one level.)

The DataFrame contains five columns because the level we removed from the index contains five possible values ("Bronx", "Brooklyn", "Manhattan", etc.)

import sys
import pandas as pd

#Website is
#https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j

url = "https://data.cityofnewyork.us/api/views/43nn-pn8j/rows.csv"
df = pd.read_csv(url).groupby("CAMIS").first()   #Keep only one row (the first) for each restaurant.
index = pd.Index(data = df["BORO"], name = "Borough")
series = pd.Series(data = df["CUISINE DESCRIPTION"].array, index = index, name = "Cuisine")

valid = [
    "Bronx",
    "Brooklyn",
    "Manhattan",
    "Queens",
    "Staten Island"
]
series = series[valid]

groups = series.groupby(series)   #Group by cuisine in alphabetical order.
seriesOfNumbers = groups.apply(lambda group: group.index.value_counts()) #Each group is a Series.
seriesOfNumbers.index.names = ["Cuisine", "Borough"]
seriesOfNumbers.name = "Number of Restaurants"
s = seriesOfNumbers.to_string(dtype = True, length = True, max_rows = 20, name = True)
print(s)
print()

df = seriesOfNumbers.unstack(fill_value = 0, level = -1) #level = -1 is the default
df.name = "Cross Tabulation"
s = df.to_string(max_cols = 5, max_rows = 10, show_dimensions = True)
print(s)
sys.exit(0)

Cuisine                        Borough
Afghan                         Queens              8
                               Manhattan           4
                               Brooklyn            2
African                        Bronx              25
                               Brooklyn           23
                               Manhattan          20
                               Queens              4
                               Staten Island       1
American                       Manhattan        2911
                               Brooklyn         1247
                                                ...
Turkish                        Staten Island       3
Vegetarian                     Manhattan          70
                               Brooklyn           47
                               Queens             12
                               Bronx               1
Vietnamese/Cambodian/Malaysia  Manhattan          42
                               Brooklyn           36
                               Queens             21
                               Staten Island       3
                               Bronx               1
Name: Number of Restaurants, Length: 344, dtype: int64

Borough                        Bronx  Brooklyn  Manhattan  Queens  Staten Island
Cuisine
Afghan                             0         2          4       8              0
African                           25        23         20       4              1
American                         402      1247       2911     980            219
Armenian                           1         7         14       4              0
Asian                              6        98        175     118             15
...                              ...       ...        ...     ...            ...
Tex-Mex                           10        32         50      28              6
Thai                               5        81        138      78              3
Turkish                            0        30         29      10              3
Vegetarian                         1        47         70      12              0
Vietnamese/Cambodian/Malaysia      1        36         42      21              3

[84 rows x 5 columns]

pd.cross_tab

pd.cross_tab is a shortcut for the above. The following call to this function creates a pd.DataFrame containing five columns because the index of the Series contains copies of five strings ("Broooklyn", "Bronx", "Manhattan", etc). The pd.DataFrame contains 84 rows because the values of the Series contains copies of 84 strings ("Afghan", "African", "American", etc). There’s one extra row and one extra column (for totals of 6 and 85) because we asked for margins.

import sys
import pandas as pd

#Website is
#https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j

url = "https://data.cityofnewyork.us/api/views/43nn-pn8j/rows.csv"
df = pd.read_csv(url).groupby("CAMIS").first()   #Keep only one row (the first) for each restaurant.
index = pd.Index(data = df["BORO"], name = "Borough")
series = pd.Series(data = df["CUISINE DESCRIPTION"].array, index = index, name = "Cuisine")

valid = [
    "Bronx",
    "Brooklyn",
    "Manhattan",
    "Queens",
    "Staten Island"
]
series = series[valid]

#Create a pd.DataFrame containing 5+1 columns, including the All column.
df = pd.crosstab(index = series.array, columns = series.index, margins = True)
df.index.name = "Cuisine"
df.columns.name = "Boroughs"

s = df.to_string(max_cols = 5 + 1, show_dimensions = True)
print(s)
sys.exit(0)

Boroughs                                            Bronx  Brooklyn  Manhattan  Queens  Staten Island    All
Cuisine
Afghan                                                  0         2          4       8              0     14
African                                                25        23         20       4              1     73
American                                              402      1247       2911     980            219   5759
Armenian                                                1         7         14       4              0     26
Asian                                                   6        98        175     118             15    412
Australian                                              0         4         17       1              0     22
Bagels/Pretzels                                         7        39         62      39             10    157
Bakery                                                 75       190        241     223             22    751
Bangladeshi                                             8         9          6      20              0     43
Barbecue                                                3        16         24       6              5     54
Basque                                                  0         0          1       0              0      1
Bottled beverages, including water, sodas, juic...      9        23         39      34              4    109
Brazilian                                               0         5         13      11              0     29
Café/Coffee/Tea                                        63       459        984     275             28   1809
Cajun                                                   0         4          0       1              1      6
Californian                                             0         1          3       0              0      4
Caribbean                                             118       337         38     205              5    703
Chicken                                               115       157        113     109             17    511
Chilean                                                 0         0          0       1              0      1
Chinese                                               295       731        511     781             78   2396
Chinese/Cuban                                           3         3         11       4              0     21
Chinese/Japanese                                        0        13         19       9              2     43
Continental                                             0        10         25      11              1     47
Creole                                                  0        15          2       5              0     22
Creole/Cajun                                            0         4          0       2              0      6
Czech                                                   1         1          0       2              0      4
Delicatessen                                           28        51        112      93             13    297
Donuts                                                 66       122        141     155             35    519
Eastern European                                        3        34         17      13              3     70
Egyptian                                                0         4          5       3              0     12
English                                                 0         1         11       1              0     13
Ethiopian                                               0         4         13       0              0     17
Filipino                                                0         1          8      20              2     31
French                                                  2        51        244      15              0    312
Fruits/Vegetables                                       0         1          3       1              1      6
German                                                  0         6         12       7              3     28
Greek                                                   6        18         55      53              6    138
Hamburgers                                             63        86         92     101             23    365
Hawaiian                                                1         8         32       6              0     47
Hotdogs                                                 1         7         18       2              0     28
Hotdogs/Pretzels                                        3         4         17       5              3     32
Ice Cream, Gelato, Yogurt, Ices                        23       102        114      74             32    345
Indian                                                  7        59        138     103              9    316
Indonesian                                              0         3          3       4              0     10
Iranian                                                 0         0          2       2              0      4
Irish                                                  10        12        115      52              2    191
Italian                                                42       168        568     112             70    960
Japanese                                               16       187        475     161             34    873
Jewish/Kosher                                           8       184         76      61              2    331
Juice, Smoothies, Fruit Salads                         42       107        174      88             11    422
Korean                                                  0        26        108     173              0    307
Latin (Cuban, Dominican, Puerto Rican, South & ...    174       138        186     314             11    823
Mediterranean                                           5        68        147      52              9    281
Mexican                                               124       288        326     189             40    967
Middle Eastern                                          4        87         57      38              6    192
Moroccan                                                0         0          7       1              0      8
Not Listed/Not Applicable                               1         1          3       2              1      8
Nuts/Confectionary                                      0         4          2       0              0      6
Other                                                 110       449        818     388             62   1827
Pakistani                                               1        13          6       9              0     29
Pancakes/Waffles                                        2         3          3       5              2     15
Peruvian                                                3        13         15      49              3     83
Pizza                                                 203       308        354     276             57   1198
Pizza/Italian                                          57       130        126     105             55    473
Polish                                                  0        17          2       3              3     25
Portuguese                                              0         2          4       5              0     11
Russian                                                 0        52          8       6              3     69
Salads                                                  0         8         65       5              0     78
Sandwiches                                             44        62        154      69             23    352
Sandwiches/Salads/Mixed Buffet                         25        34        111      56              7    233
Scandinavian                                            0         2          5       0              0      7
Seafood                                                26        42         92      29              4    193
Soul Food                                              10        25         14      15              2     66
Soups                                                   0         1          7       0              0      8
Soups & Sandwiches                                      2         7         30       3              1     43
Southwestern                                            0         0          1       2              0      3
Spanish                                               132       146        126     212             20    636
Steak                                                   5         8         54      11              1     79
Tapas                                                   0        10         22       8              1     41
Tex-Mex                                                10        32         50      28              6    126
Thai                                                    5        81        138      78              3    305
Turkish                                                 0        30         29      10              3     72
Vegetarian                                              1        47         70      12              0    130
Vietnamese/Cambodian/Malaysia                           1        36         42      21              3    103
All                                                  2397      6788      10860    6159            983  27187

[85 rows x 6 columns]

Our DataFrame contains 84 rows and 5 columns. What would happen if you reversed the order of the two arguments of pd.cross_tab?

df = pd.crosstab(index = series.index, columns = series.array, margins = True)

A tall pd.Series

Duplicate indices

Series.head and Series.tail

Clean the data. Index.isin

How many Chinese restaurants does Staten Island have?

How many boroughs have restaurants? How many cuisines does New York City have? Index.is_unique and Series.is_unique Index.unique and Series.unique

A simple example of Series.value_counts

What’s the most popular cuisine in Staten Island? What borough has the most Chinese restaurants? Index.value_counts and Series.value_counts Series.idxmax, Series.max, np.argmax