Go to
NYC Open Data.
In the “Search Open Data for things like 311”
search box, type
DOHMH New York City Restaurant Inspection Results
(DOHMH stands for “Department of Health and Mental Hygiene”.)
Click on the first
DOHMH New York City Restaurant Inspection Results
that it finds for you.
Press the blue View Data button.
You should see
the
first page
of records.
Each record consists of 26 fields,
named on the first line in the file.
In Python,
they will be numbered from 0 to 25 inclusive when stored in a
list.
To see a typical restaurant, type
Wo Hop 17
into the “Find in this Dataset” search box on the right side.
Then scroll to the right to the Violation Description column.
To download the file
DOHMH_New_York_City_Restaurant_Inspection_Results.csv
make sure the “Find in this Dataset” box is cleared out.
Press the blue Export button and select “CSV”.
#AWS Linux: cd ~/bin wget https://data.cityofnewyork.us/api/views/43nn-pn8j/rows.csv ls -l rows.csv wc -l rows.csv
cd ~/Downloads
pwd
/Users/myname/Downloads
ls -l DOHMH_New_York_City_Restaurant_Inspection_Results.csv
-rw-r--r--@ 1 myname mygroup 166877269 Sep 15 17:38 DOHMH_New_York_City_Restaurant_Inspection_Results.csv
wc -l DOHMH_New_York_City_Restaurant_Inspection_Results.csv
391077 DOHMH_New_York_City_Restaurant_Inspection_Results.csv
head -3 DOHMH_New_York_City_Restaurant_Inspection_Results.csv
CAMIS,DBA,BORO,BUILDING,STREET,ZIPCODE,PHONE,CUISINE DESCRIPTION,INSPECTION DATE,ACTION,VIOLATION CODE,VIOLATION DESCRIPTION,CRITICAL FLAG,SCORE,GRADE,GRADE DATE,RECORD DATE,INSPECTION TYPE,Latitude,Longitude,Community Board,Council District,Census Tract,BIN,BBL,NTA
50018145,LA FUENTES STEAKHOUSE,Bronx,1306,JEROME AVE,10452,6467735144,Spanish,01/29/2017,Violations were cited in the following area(s).,10F,"Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit.",N,5,,,09/15/2019,Inter-Agency Task Force / Initial Inspection,40.838261018744,-73.919133506998,204,16,022101,2008025,2028390005,BX63
50077581,L'ADRESSE AMERICAN BISTRO,Manhattan,5,BRYANT PARK,10018,2122212510,French,08/08/2019,Violations were cited in the following area(s).,06D,"Food contact surface not properly washed, rinsed and sanitized after each use and following any activity when contamination may have occurred.",Y,12,A,08/08/2019,09/15/2019,Cycle Inspection / Initial Inspection,40.753936992473,-73.985948643127,105,04,011300,1022565,1009930029,MN17
awk -F, 'NR == 1 {print NF}' DOHMH_New_York_City_Restaurant_Inspection_Results.csv
26
Move
DOHMH_New_York_City_Restaurant_Inspection_Results.csv
into the directory that holds your Python programs.
WO HOP 17 05/25/2018 Food not cooked to required minimum temperature. WO HOP 17 05/25/2018 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit. WO HOP 17 06/19/2017 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit. WO HOP 17 05/23/2019 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit. WO HOP 17 06/19/2017 Cold food item held above 41º F (smoked fish and reduced oxygen packaged foods above 38 ºF) except during necessary preparation. WO HOP 17 05/23/2019 Food contact surface not properly maintained.
list
named
hopLines
holding only the Wo Hop records.
Each item in
hopLines
will be a
list
of 26
strings.
Sort the records in the
list
by inspection date.
Change the part of the script after
lines = csv.reader(csvfile)
to the following.
Since the return value of this
score
function is of type string,
sort
will perform string comparisons to compare the scores.
All characters, including digits, have an “alphabetical order”;
see the
ASCII chart.
hopLines = [] #Start with an empty list. for line in lines: #During each iteration, line is a list of 26 strings. if line[0] == "41320866": #CAMIS number for Wo Hop hopLines.append(line) #Append line to hopLines. csvfile.close() def score(line): """ Return the line's datestamp, but with the format changed from "12/31/2019" to "2019/12/31". That makes alphabetical order the same as chronological order. """ fields = line[8].split("/") return f"{fields[2]}/{fields[0]}/{fields[1]}" hopLines.sort(key = score) for line in hopLines: print(line[1], line[8]) #name and inspection date print(line[11]) #violation description print() sys.exit(0)
WO HOP 17 06/19/2017 Cold food item held above 41º F (smoked fish and reduced oxygen packaged foods above 38 ºF) except during necessary preparation. WO HOP 17 06/19/2017 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit. WO HOP 17 05/25/2018 Food not cooked to required minimum temperature. WO HOP 17 05/25/2018 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit. WO HOP 17 05/23/2019 Food contact surface not properly maintained. WO HOP 17 05/23/2019 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit.
hopLines = [] #Start with an empty list. for line in lines: #During each iteration, line is a list of 26 strings. if line[0] == "41320866": #CAMIS number for Wo Hop hopLines.append(line) #Append line to hopLines.to the following list comprehension.
hopLines = [line for line in lines if line[0] == "41320866"]
score
function and change the call to
sort
to the following.
Instead of creating and returning a rearranged string such as
"2019/12/31",
the
lambda
function
creates and returns an object of class
datetime.date.
import datetime
#During each call to the lambda function, line is a list of 26 strings. hopLines.sort(key = lambda line: datetime.datetime.strptime(line[8], "%m/%d/%Y").date())
sort
to the following
(and remove the
csvfile.close()).
The entire file is read in as one big
sequence
of bytes.
The
decode
function converts this sequence into the
string
of characters
s,
and
splitlines
converts this
string
into a
list
of
strings.
As we have already seen,
we can use a
csv.reader
to split apart the comma-separated fields each of these
strings.
import sys import csv #Comma-separated values. Do not name this Python script csv.py. import datetime import urllib.request #Database is at #https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j url = "https://data.cityofnewyork.us/api/views/43nn-pn8j/rows.csv" try: fileFromUrl = urllib.request.urlopen(url) except urllib.error.URLError as error: print(error, file = sys.stderr) sys.exit(1) sequenceOfBytes = fileFromUrl.read() #Slurp whole file into one big sequenceOfBytes. fileFromUrl.close() try: s = sequenceOfBytes.decode("utf-8") #s is a string except UnicodeError as error: print(error, file = sys.stderr) sys.exit(1) lines = csv.reader(s.splitlines()) #lines is a list of lists hopLines = [line for line in lines if line[0] == "41320866"] #each line is a list of 26 strings
WO HOP 17 06/19/2017 Cold food item held above 41º F (smoked fish and reduced oxygen packaged foods above 38 ºF) except during necessary preparation. WO HOP 17 06/19/2017 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit. WO HOP 17 05/25/2018 Food not cooked to required minimum temperature. WO HOP 17 05/25/2018 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit. WO HOP 17 05/23/2019 Food contact surface not properly maintained. WO HOP 17 05/23/2019 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit.
decode,
split into fields,
and process them one by one?
Hint:
if the argument of
csv.reader
is a
list
containing one
string,
the
csv.reader
will split that one
string
into fields:
reader = csv.reader([s]) #s is a string
listOfStrings = next(reader) #listOfStrings is a list of the fields of s
PHONE
column, so I had to tell
read_csv
not to always expect a phone number there.
The
DataFrame
df
contains thousands of rows and 26 columns.
The
DataFrame
df.loc[df["CAMIS"] == camis]
contains only 6 lines and 26 columns.
The
DataFrame
df.loc[df["CAMIS"] == camis][columns]
contains 6 lines and 3 columns.
To sort the lines of
df2
in chronological order,
I temporarily added a fourth column named
dates
to
df2,
dropping
it as soon as it was no longer needed.
Each value in the
formatters
dictionary
is a (nameless) function that takes one argument
and returns a
string.
I printed a newline after each inspection date
to start a new line in the output.
I printed a newline after each violation description
to skip an empty line after each inspection.
The violation descriptions were very long,
so I set the maximum column width to infinity.
The output looked neater when I
stripped
the
whitespace
from the start and end of each line of output.
""" Print inspection results for Wo Hop. """ import sys import csv import pandas as pd #https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j url = "https://data.cityofnewyork.us/api/views/43nn-pn8j/rows.csv" try: df = pd.read_csv(url, dtype = {"PHONE": str}) #df is a pandas DataFrame. except BaseException as error: print(error, file = sys.stderr) sys.exit(1) formatters = { "INSPECTION DATE": lambda id: f"{id}\n", "VIOLATION DESCRIPTION": lambda vd: f"{vd}\n" } camis = 41320866 #doing business as "WO HOP 17" columns = ["DBA", "INSPECTION DATE", "VIOLATION DESCRIPTION"] df2 = df.loc[df["CAMIS"] == camis][columns] #Sort by date, oldest at top. df2["date"] = pd.to_datetime(df2["INSPECTION DATE"], format = "%m/%d/%Y") df2.sort_values(by = "date", inplace = True) df2.drop("date", axis = "columns", inplace = True) pd.set_option("display.max_colwidth", -1) #Prevent ellipsis dots ... s = df2.to_string(formatters = formatters, header = False, index = False) #Strip whitespace from the start and end of each line. lines = [line.strip() for line in s.splitlines()] s = "\n".join(lines) print(s) sys.exit(0)
WO HOP 17 06/19/2017 Cold food item held above 41º F (smoked fish and reduced oxygen packaged foods above 38 ºF) except during necessary preparation. WO HOP 17 06/19/2017 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit. WO HOP 17 05/25/2018 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit. WO HOP 17 05/25/2018 Food not cooked to required minimum temperature. WO HOP 17 05/23/2019 Food contact surface not properly maintained. WO HOP 17 05/23/2019 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit.
"""
restaurantinspection.py
This module contains the function getDf.
"""
import sys
import os
import pandas as pd
#https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j
url = "https://data.cityofnewyork.us/api/views/43nn-pn8j/rows.csv"
filename = "~/Downloads/DOHMH_New_York_City_Restaurant_Inspection_Results.csv"
def getDf(fromWeb = False, verbose = False):
"Return a pandas DataFrame containing the restaurant inspection results."
source = url if fromWeb else os.path.expanduser(filename)
#Fields that should be integers. Int64 can tolerate a NA.
dtype = {
"ZIPCODE": "Int64",
"Community Board": "Int64",
"Council District": "Int64",
"Census Tract": "Int64",
"BIN": "Int64",
"BBL": "Int64"
}
df = pd.read_csv(source, dtype = dtype)
#If BORO == "0" but ZIPCODE is present, change BORO to the correct borough.
zips = [
[10168, "Manhattan"],
[10285, "Manhattan"],
[11249, "Brooklyn"]
]
for zipcode, boro in zips:
mask = (df.BORO == "0") & (df.ZIPCODE == zipcode)
if verbose:
nrows = len(df.loc[mask])
q = '"'
print(f'Changing BORO "0" to {q + boro + q:11} on the {nrows:2} rows containing ZIPCODE {zipcode}.')
df.loc[mask, "BORO"] = boro
if verbose:
print()
seriesOfBools = df.BORO == "0"
if verbose:
print(f'The remaining rows whose BORO is "0" are')
columns = [
"CAMIS",
"DBA",
"BORO",
"BUILDING",
"STREET",
"ZIPCODE"
]
print(df[seriesOfBools].to_string(columns = columns, show_dimensions = True))
print()
#Remove the remaining rows whose BORO == "0".
if verbose:
print(f'Deleting the {sum(df.BORO == "0")} rows whose BORO is still "0".')
print()
df = df[~seriesOfBools]
if verbose:
badPhones = df.loc[~df.PHONE.str.match(r"^\d{10}$").astype(bool)]
print(f"{len(badPhones)} of the {len(df):,} rows have bad phones, including")
print()
s = badPhones.to_string(
columns = ["CAMIS", "DBA", "PHONE", "INSPECTION DATE"],
max_rows = 10,
show_dimensions = True
)
print(s)
print()
return df
if __name__ == "__main__":
print(f"Downloading {url} ...")
df = getDf(fromWeb = True, verbose = True)
columns = ["CAMIS", "DBA", "BORO", "CUISINE DESCRIPTION"]
print(f"Got the following DataFrame (showing {len(columns)} of {len(df.columns)} columns).")
print(df.to_string(columns = columns, max_rows = 10, show_dimensions = True))
sys.exit(0)
"""
Call restaurantinspection.getDf and print the resulting DataFrame.
"""
import sys
import pandas as pd
import restaurantinspection
try:
df = restaurantinspection.getDf(fromWeb = False, verbose = True)
except BaseException as error:
print(error, file = sys.stderr)
sys.exit(1)
columns = [
"CAMIS", #Commercial Activities Management Information System
"DBA", #Doing Business As
"BORO", #Borough
"CUISINE DESCRIPTION",
"INSPECTION DATE"
]
s = df.to_string(
columns = columns,
index = False,
max_rows = 20,
show_dimensions = True
)
print(f"Showing {len(columns)} of the {len(df.columns)} columns:")
print()
print(s)
sys.exit(0)
Changing BORO "0" to "Manhattan" on the 14 rows containing ZIPCODE 10168.
Changing BORO "0" to "Manhattan" on the 5 rows containing ZIPCODE 10285.
Changing BORO "0" to "Brooklyn" on the 81 rows containing ZIPCODE 11249.
The remaining rows whose BORO is "0" are
CAMIS DBA BORO BUILDING STREET ZIPCODE
308 50098157 NaN 0 NaN NaN NaN
36325 50098208 KONDITORI 0 NaN NaN NaN
112576 50098189 AUNT JAKE'S 0 NaN NaN NaN
130863 50098217 NaN 0 NaN NaN NaN
134039 50099742 AMBO 0 NaN NaN NaN
156097 50098161 NaN 0 NaN NaN NaN
184554 50098171 STARBUCKS COFFEE #57154 0 NaN NaN NaN
205375 50098206 BENJYS KOSHER PIZZA 0 NaN NaN NaN
218840 50098177 NaN 0 NaN NaN NaN
219844 50098202 NaN 0 NaN NaN NaN
232978 50097237 NaN 0 NaN NaN NaN
359325 50098200 DAILY PROVISIONS 0 NaN NaN NaN
[12 rows x 6 columns]
Deleting the 12 rows whose BORO is still "0".
472 of the 395,617 rows have bad phones, including
CAMIS DBA PHONE INSPECTION DATE
274 41541587 SEATTLE COFFEE ROASTERS, PIER 79/ NY WATERWAY 917536391_ 11/30/2017
780 50063627 JOE & THE JUICE 917_566_57 07/25/2019
782 50085540 ILLY CAFFE 917407793_ 12/19/2018
1887 50073925 GRACE WOK 917_370_46 02/14/2019
2124 50056461 SAINT ANN'S CAFE RESTAURANT 917_688_16 12/05/2017
... ... ... ... ...
392935 41637438 BOAT HOUSE & CAJUN CUISINE 41637438__ 10/12/2018
394366 50056461 SAINT ANN'S CAFE RESTAURANT 917_688_16 02/01/2019
395250 41541587 SEATTLE COFFEE ROASTERS, PIER 79/ NY WATERWAY 917536391_ 07/01/2016
395445 50056461 SAINT ANN'S CAFE RESTAURANT 917_688_16 08/09/2019
395493 50033071 GABY RESTAURANT __________ 07/27/2017
[472 rows x 4 columns]
Showing 5 of the 26 columns:
CAMIS DBA BORO CUISINE DESCRIPTION INSPECTION DATE
50088180 HAZAR TURKISH KEBAB Brooklyn Turkish 03/26/2019
41654607 BROOKLYN CRAB Brooklyn Seafood 09/14/2017
50058012 HARMONI Manhattan Spanish 09/25/2018
41175577 MARIO'S PIZZERIA Brooklyn Pizza 02/04/2019
50000060 ATLAS STEAKHOUSE Brooklyn Russian 10/23/2019
50063691 THAI TERMINAL Manhattan Thai 05/08/2018
41701903 SHINJUNG GALBI BBQ KOREAN RESTAURANT Queens Korean 02/08/2018
50057790 BAGELS AND SCHMEAR Manhattan Bagels/Pretzels 12/26/2018
50069955 KOTO Brooklyn Japanese 10/11/2017
41457400 VINTRY Manhattan American 11/26/2018
... ... ... ... ...
40843676 PAZZA NOTTE Manhattan Italian 04/09/2018
41697654 HOT JALAPENO RESTAURANT Manhattan Mexican 10/30/2019
50076162 CHINA GARDEN & WING STOP Queens Chinese 09/23/2019
50016943 EL NUEVO ROBLE BILLIARDS Bronx Spanish 08/21/2017
40513217 B66 CLUB Brooklyn Russian 06/17/2016
41433964 WILD GINGER VEGETARIAN KITCHEN Manhattan Vegetarian 09/07/2016
50074107 ZAIQA RESTAURANT Brooklyn Indian 05/01/2019
50073262 BROADWAY PIZZA Brooklyn Pizza 07/25/2019
41335337 LOBSTER BOX Bronx Seafood 04/27/2019
41313395 FIVE NAPKIN BURGER Manhattan American 10/25/2017
[395617 rows x 5 columns]
"""
Print the number of restaurant inspections in each borough.
"""
import sys
import pandas as pd
import restaurantinspection
try:
df = restaurantinspection.getDf(fromWeb = False, verbose = False)
except BaseException as error:
print(error, file = sys.stderr)
sys.exit(1)
print("Number of restaurant inspections in each borough:")
print()
seriesOfInts = df.groupby("BORO").size()
print(seriesOfInts)
print()
print(seriesOfInts.to_string(dtype = False, header = False))
print()
#Convert the seriesOfInts to a DataFrame so we can specify a formatter.
dataFrameOfInts = pd.DataFrame(seriesOfInts) #DataFrame with only one column.
s = dataFrameOfInts.to_string(
formatters = [lambda i: f"{i:,}"], #Print each integer with commas.
header = False, #The name of the one column of this DataFrame is 0. Don't print it.
index_names = False #The name of the index of this DataFrame is BORO. Don't print it.
)
print(s)
sys.exit(0)
Number of restaurant inspections in each borough: BORO Bronx 35948 Brooklyn 100748 Manhattan 155391 Queens 90362 Staten Island 13168 dtype: int64 Bronx 35948 Brooklyn 100748 Manhattan 155391 Queens 90362 Staten Island 13168 Bronx 35,948 Brooklyn 100,748 Manhattan 155,391 Queens 90,362 Staten Island 13,168
"""
Print the number of restaurants in each borough.
"""
import sys
import pandas as pd
import restaurantinspection
df = restaurantinspection.getDf(fromWeb = False, verbose = False)
#Index is CAMIS number of each restaurant in increasing order, value is restaurant's boro.
seriesOfBoros = df.groupby("CAMIS")["BORO"].first()
print("Number of restaurants in each borough")
print()
#Index is boros in alphabetical order, value is number of restaurants in boro.
seriesOfNumbers = seriesOfBoros.groupby(seriesOfBoros).size()
#seriesOfNumbers.sort_values(inplace = True) #increasing size order
print(seriesOfNumbers)
print()
#Index is boros, value is number of restaurants in boro in decreasing order.
seriesOfNumbers = seriesOfBoros.value_counts()
#seriesOfNumbers.sort_index(inplace = True) #alphabetical order
print(seriesOfNumbers)
sys.exit(0)
Number of restaurants in each borough BORO Bronx 2395 Brooklyn 6775 Manhattan 10761 Queens 6140 Staten Island 973 Name: BORO, dtype: int64 Manhattan 10761 Brooklyn 6775 Queens 6140 Bronx 2395 Staten Island 973 Name: BORO, dtype: int64
"""
Print the number of mouse violations in each month.
"""
import sys
import pandas as pd
import restaurantinspection
#Add three columns to the DataFrame.
df = restaurantinspection.getDf(fromWeb = False, verbose = False)
df["timestamp"] = pd.to_datetime(df["INSPECTION DATE"], format = "%m/%d/%Y")
df["monthname"] = df["timestamp"].apply(lambda timestamp: timestamp.month_name())
df["monthnumber"] = df["timestamp"].apply(lambda timestamp: timestamp.month)
s = "Evidence of mice or live mice present in facility's food and/or non-food areas."
mask = (df["VIOLATION DESCRIPTION"].str != "") & (df["VIOLATION DESCRIPTION"].str.contains(s))
dataFrameOfViolations = df[mask]
print(f"{len(dataFrameOfViolations):,} of the {len(df):,} inspections revealed mouse violations.")
print()
print("Mouse violations per month:")
print()
#Indexed by month number and month name, in order of ascending month number.
seriesOfNumbers = dataFrameOfViolations.groupby(["monthnumber", "monthname"]).size()
print(seriesOfNumbers)
print()
#Drop the month numbers. They were there only to get the month names in chronological order.
seriesOfNumbers.reset_index(level = 0, drop = True, inplace = True)
print(seriesOfNumbers)
sys.exit(0)
29,090 of the 395,617 inspections revealed mouse violations. Mouse violations per month: monthnumber monthname 1 January 2425 2 February 2380 3 March 2980 4 April 2626 5 May 2610 6 June 2453 7 July 2032 8 August 2303 9 September 2425 10 October 2820 11 November 2053 12 December 1983 dtype: int64 monthname January 2425 February 2380 March 2980 April 2626 May 2610 June 2453 July 2032 August 2303 September 2425 October 2820 November 2053 December 1983 dtype: int64
"""
Print the number of restaurants of each cuisine.
series is indexed by cuisines in alphabetical order.
"""
import sys
import pandas as pd
import restaurantinspection
df = restaurantinspection.getDf(fromWeb = False, verbose = False)
series = df.groupby("CUISINE DESCRIPTION").apply(lambda df: len(df["CAMIS"].unique()))
#could also say
#series = df.groupby("CUISINE DESCRIPTION").apply(lambda df: len(df.groupby("CAMIS")))
#series = df.groupby("CUISINE DESCRIPTION").apply(lambda df: len(df["CAMIS"].value_counts()))
print("Number of restaurants:")
print()
s = series.to_string(max_rows = len(series))
print(s)
print()
s = series.sort_values(ascending = False).to_string(max_rows = len(series))
print(s)
sys.exit(0)
Number of restaurants: CUISINE DESCRIPTION Afghan 14 African 74 American 5735 Armenian 25 Asian 402 Australian 21 Bagels/Pretzels 153 Bakery 748 Bangladeshi 43 Barbecue 54 Basque 1 Bottled beverages, including water, sodas, juices, etc. 109 Brazilian 29 Café/Coffee/Tea 1798 Cajun 6 Californian 4 Caribbean 706 Chicken 505 Chilean 1 Chinese 2380 Chinese/Cuban 21 Chinese/Japanese 44 Continental 48 Creole 22 Creole/Cajun 6 Czech 4 Delicatessen 298 Donuts 522 Eastern European 69 Egyptian 12 English 12 Ethiopian 17 Filipino 31 French 310 Fruits/Vegetables 6 German 28 Greek 138 Hamburgers 363 Hawaiian 48 Hotdogs 28 Hotdogs/Pretzels 31 Ice Cream, Gelato, Yogurt, Ices 346 Indian 320 Indonesian 9 Iranian 4 Irish 191 Italian 964 Japanese 868 Jewish/Kosher 335 Juice, Smoothies, Fruit Salads 423 Korean 308 Latin (Cuban, Dominican, Puerto Rican, South & Central American) 828 Mediterranean 281 Mexican 963 Middle Eastern 192 Moroccan 9 Not Listed/Not Applicable 9 Nuts/Confectionary 6 Other 1739 Pakistani 30 Pancakes/Waffles 15 Peruvian 83 Pizza 1199 Pizza/Italian 476 Polish 25 Portuguese 11 Russian 68 Salads 78 Sandwiches 358 Sandwiches/Salads/Mixed Buffet 234 Scandinavian 8 Seafood 195 Soul Food 66 Soups 8 Soups & Sandwiches 43 Southwestern 3 Spanish 630 Steak 79 Tapas 41 Tex-Mex 126 Thai 307 Turkish 71 Vegetarian 130 Vietnamese/Cambodian/Malaysia 99 CUISINE DESCRIPTION American 5735 Chinese 2380 Café/Coffee/Tea 1798 Other 1739 Pizza 1199 Italian 964 Mexican 963 Japanese 868 Latin (Cuban, Dominican, Puerto Rican, South & Central American) 828 Bakery 748 Caribbean 706 Spanish 630 Donuts 522 Chicken 505 Pizza/Italian 476 Juice, Smoothies, Fruit Salads 423 Asian 402 Hamburgers 363 Sandwiches 358 Ice Cream, Gelato, Yogurt, Ices 346 Jewish/Kosher 335 Indian 320 French 310 Korean 308 Thai 307 Delicatessen 298 Mediterranean 281 Sandwiches/Salads/Mixed Buffet 234 Seafood 195 Middle Eastern 192 Irish 191 Bagels/Pretzels 153 Greek 138 Vegetarian 130 Tex-Mex 126 Bottled beverages, including water, sodas, juices, etc. 109 Vietnamese/Cambodian/Malaysia 99 Peruvian 83 Steak 79 Salads 78 African 74 Turkish 71 Eastern European 69 Russian 68 Soul Food 66 Barbecue 54 Continental 48 Hawaiian 48 Chinese/Japanese 44 Soups & Sandwiches 43 Bangladeshi 43 Tapas 41 Filipino 31 Hotdogs/Pretzels 31 Pakistani 30 Brazilian 29 German 28 Hotdogs 28 Polish 25 Armenian 25 Creole 22 Australian 21 Chinese/Cuban 21 Ethiopian 17 Pancakes/Waffles 15 Afghan 14 English 12 Egyptian 12 Portuguese 11 Not Listed/Not Applicable 9 Indonesian 9 Moroccan 9 Soups 8 Scandinavian 8 Creole/Cajun 6 Cajun 6 Fruits/Vegetables 6 Nuts/Confectionary 6 Czech 4 Californian 4 Iranian 4 Southwestern 3 Basque 1 Chilean 1
"""
Print the number of restaurants of each cuisine.
series is indexed by cuisines in alphabetical order.
"""
import sys
import pandas as pd
import restaurantinspection
df = restaurantinspection.getDf(fromWeb = False, verbose = False)
df = (df
.groupby("CAMIS")
.first()
.groupby(["CUISINE DESCRIPTION", "BORO"])["BORO"]
.count()
.unstack()
.fillna(0)
.astype(int)
)
pd.set_option("display.width", None) #prevent column from beoing elided
pd.set_option("display.max_colwidth", 64) #prevent index from being truncated
s = df.to_string(max_rows = len(df), show_dimensions = True)
print(s)
sys.exit(0)
BORO Bronx Brooklyn Manhattan Queens Staten Island CUISINE DESCRIPTION Afghan 0 2 4 8 0 African 26 23 20 4 1 American 404 1253 2882 977 219 Armenian 1 7 13 4 0 Asian 6 95 171 115 15 Australian 0 4 16 1 0 Bagels/Pretzels 7 37 60 39 10 Bakery 75 188 240 223 22 Bangladeshi 8 9 6 20 0 Barbecue 3 16 24 6 5 Basque 0 0 1 0 0 Bottled beverages, including water, sodas, juices, etc. 9 23 38 35 4 Brazilian 0 5 13 11 0 Café/Coffee/Tea 62 455 978 275 28 Cajun 0 4 0 1 1 Californian 0 1 3 0 0 Caribbean 119 339 38 205 5 Chicken 113 156 111 108 17 Chilean 0 0 0 1 0 Chinese 297 726 507 774 76 Chinese/Cuban 3 3 11 4 0 Chinese/Japanese 0 13 19 10 2 Continental 0 10 26 11 1 Creole 0 15 2 5 0 Creole/Cajun 0 4 0 2 0 Czech 1 1 0 2 0 Delicatessen 27 52 112 94 13 Donuts 66 122 143 156 35 Eastern European 3 33 17 13 3 Egyptian 0 4 5 3 0 English 0 1 10 1 0 Ethiopian 0 4 13 0 0 Filipino 0 1 8 20 2 French 2 51 242 15 0 Fruits/Vegetables 0 1 3 1 1 German 0 6 12 7 3 Greek 6 18 56 52 6 Hamburgers 63 86 90 102 22 Hawaiian 1 8 33 6 0 Hotdogs 1 7 18 2 0 Hotdogs/Pretzels 3 3 17 5 3 Ice Cream, Gelato, Yogurt, Ices 23 102 115 74 32 Indian 7 60 138 106 9 Indonesian 0 2 3 4 0 Iranian 0 0 2 2 0 Irish 10 12 115 52 2 Italian 42 169 570 113 70 Japanese 16 187 471 161 33 Jewish/Kosher 8 186 76 63 2 Juice, Smoothies, Fruit Salads 42 108 174 88 11 Korean 0 26 109 173 0 Latin (Cuban, Dominican, Puerto Rican, South & Central American) 175 139 188 315 11 Mediterranean 5 68 149 50 9 Mexican 123 288 326 186 40 Middle Eastern 4 86 58 38 6 Moroccan 0 0 8 1 0 Not Listed/Not Applicable 1 1 4 2 1 Nuts/Confectionary 0 4 2 0 0 Other 104 449 760 371 55 Pakistani 1 13 7 9 0 Pancakes/Waffles 2 3 3 5 2 Peruvian 3 13 15 49 3 Pizza 203 307 355 277 57 Pizza/Italian 59 130 127 105 55 Polish 0 17 2 3 3 Portuguese 0 2 4 5 0 Russian 0 51 8 6 3 Salads 0 8 65 5 0 Sandwiches 45 63 155 72 23 Sandwiches/Salads/Mixed Buffet 25 32 113 57 7 Scandinavian 0 3 5 0 0 Seafood 26 43 93 29 4 Soul Food 10 26 14 14 2 Soups 0 1 7 0 0 Soups & Sandwiches 2 7 30 3 1 Southwestern 0 0 1 2 0 Spanish 131 143 124 211 21 Steak 5 8 54 11 1 Tapas 0 10 22 8 1 Tex-Mex 10 32 50 28 6 Thai 5 80 139 80 3 Turkish 0 30 28 10 3 Vegetarian 1 47 69 13 0 Vietnamese/Cambodian/Malaysia 1 33 41 21 3 [84 rows x 5 columns]
"""
For each cuisine, print the fraction of inspections that report mice.
"""
import sys
import pandas as pd
import restaurantinspection
df = restaurantinspection.getDf(fromWeb = False, verbose = False)
s = "Evidence of mice or live mice present in facility's food and/or non-food areas."
#Return the fraction of inspections that report mice.
def f(df):
seriesOfBools = df["VIOLATION DESCRIPTION"].str.contains(s).astype(bool)
return sum(seriesOfBools) / len(df)
series = df.groupby("CUISINE DESCRIPTION").apply(f) #alphabetical order
print(series.to_string(
dtype = True,
length = True,
max_rows = len(series),
name = True
))
print()
series.sort_values(inplace = True) #increasing numeric order
print(series.to_string(
dtype = True,
length = True,
max_rows = len(series),
name = True
))
sys.exit(0)
CUISINE DESCRIPTION Afghan 0.120773 African 0.118414 American 0.085564 Armenian 0.070000 Asian 0.086534 Australian 0.097744 Bagels/Pretzels 0.098411 Bakery 0.106604 Bangladeshi 0.090909 Barbecue 0.067183 Basque 0.000000 Bottled beverages, including water, sodas, juices, etc. 0.145559 Brazilian 0.066955 Café/Coffee/Tea 0.081409 Cajun 0.058140 Californian 0.055556 Caribbean 0.128698 Chicken 0.095354 Chilean 0.172414 Chinese 0.104725 Chinese/Cuban 0.088993 Chinese/Japanese 0.081049 Continental 0.057725 Creole 0.136531 Creole/Cajun 0.107143 Czech 0.146341 Delicatessen 0.089939 Donuts 0.054654 Eastern European 0.122524 Egyptian 0.136126 English 0.075117 Ethiopian 0.105485 Filipino 0.128319 French 0.091407 Fruits/Vegetables 0.038462 German 0.085586 Greek 0.079593 Hamburgers 0.060984 Hawaiian 0.084746 Hotdogs 0.109290 Hotdogs/Pretzels 0.121951 Ice Cream, Gelato, Yogurt, Ices 0.122258 Indian 0.094237 Indonesian 0.089286 Iranian 0.193548 Irish 0.102481 Italian 0.089796 Japanese 0.095803 Jewish/Kosher 0.100401 Juice, Smoothies, Fruit Salads 0.079560 Korean 0.074265 Latin (Cuban, Dominican, Puerto Rican, South & Central American) 0.090856 Mediterranean 0.097950 Mexican 0.083830 Middle Eastern 0.109768 Moroccan 0.077844 Not Listed/Not Applicable 0.028571 Nuts/Confectionary 0.088235 Other 0.603555 Pakistani 0.121951 Pancakes/Waffles 0.116592 Peruvian 0.100948 Pizza 0.105466 Pizza/Italian 0.100672 Polish 0.137615 Portuguese 0.101796 Russian 0.106420 Salads 0.049844 Sandwiches 0.083353 Sandwiches/Salads/Mixed Buffet 0.076412 Scandinavian 0.142857 Seafood 0.079058 Soul Food 0.112989 Soups 0.125000 Soups & Sandwiches 0.066547 Southwestern 0.025316 Spanish 0.090044 Steak 0.084507 Tapas 0.132039 Tex-Mex 0.105292 Thai 0.091229 Turkish 0.090105 Vegetarian 0.105071 Vietnamese/Cambodian/Malaysia 0.095848 Length: 84, dtype: float64 CUISINE DESCRIPTION Basque 0.000000 Southwestern 0.025316 Not Listed/Not Applicable 0.028571 Fruits/Vegetables 0.038462 Salads 0.049844 Donuts 0.054654 Californian 0.055556 Continental 0.057725 Cajun 0.058140 Hamburgers 0.060984 Soups & Sandwiches 0.066547 Brazilian 0.066955 Barbecue 0.067183 Armenian 0.070000 Korean 0.074265 English 0.075117 Sandwiches/Salads/Mixed Buffet 0.076412 Moroccan 0.077844 Seafood 0.079058 Juice, Smoothies, Fruit Salads 0.079560 Greek 0.079593 Chinese/Japanese 0.081049 Café/Coffee/Tea 0.081409 Sandwiches 0.083353 Mexican 0.083830 Steak 0.084507 Hawaiian 0.084746 American 0.085564 German 0.085586 Asian 0.086534 Nuts/Confectionary 0.088235 Chinese/Cuban 0.088993 Indonesian 0.089286 Italian 0.089796 Delicatessen 0.089939 Spanish 0.090044 Turkish 0.090105 Latin (Cuban, Dominican, Puerto Rican, South & Central American) 0.090856 Bangladeshi 0.090909 Thai 0.091229 French 0.091407 Indian 0.094237 Chicken 0.095354 Japanese 0.095803 Vietnamese/Cambodian/Malaysia 0.095848 Australian 0.097744 Mediterranean 0.097950 Bagels/Pretzels 0.098411 Jewish/Kosher 0.100401 Pizza/Italian 0.100672 Peruvian 0.100948 Portuguese 0.101796 Irish 0.102481 Chinese 0.104725 Vegetarian 0.105071 Tex-Mex 0.105292 Pizza 0.105466 Ethiopian 0.105485 Russian 0.106420 Bakery 0.106604 Creole/Cajun 0.107143 Hotdogs 0.109290 Middle Eastern 0.109768 Soul Food 0.112989 Pancakes/Waffles 0.116592 African 0.118414 Afghan 0.120773 Pakistani 0.121951 Hotdogs/Pretzels 0.121951 Ice Cream, Gelato, Yogurt, Ices 0.122258 Eastern European 0.122524 Soups 0.125000 Filipino 0.128319 Caribbean 0.128698 Tapas 0.132039 Egyptian 0.136126 Creole 0.136531 Polish 0.137615 Scandinavian 0.142857 Bottled beverages, including water, sodas, juices, etc. 0.145559 Czech 0.146341 Chilean 0.172414 Iranian 0.193548 Other 0.603555 Length: 84, dtype: float64
"""
For each cuisine, print the fraction of restaurants that have had mice
at one time or another.
"""
import sys
import pandas as pd
import restaurantinspection
df = restaurantinspection.getDf(fromWeb = False, verbose = False)
s = "Evidence of mice or live mice present in facility's food and/or non-food areas."
#Called once for each CAMIS.
#df is a DataFrame of all the inspections belonging to the CAMIS.
#Return True if any inspection revealed mice.
def hadMice(df):
return df["VIOLATION DESCRIPTION"].str.contains(s).any()
#Called once for each cusine.
#df is a DataFrame of all the inspections belonging to the cusine.
def get3Numbers(df):
seriesOfBools = df.groupby("CAMIS").apply(hadMice)
nr = len(seriesOfBools) #number of restaurants in this cuisine
nm = seriesOfBools.sum() #number of restaurants in this cuisine with mice
return pd.Series([nr, nm, nm / nr])
dataFrameOfNumbers = df.groupby("CUISINE DESCRIPTION").apply(get3Numbers) #alphabetical order
dataFrameOfNumbers.columns = [
"n", #number of restaurants in this cuisine
"mice", #number of restaurants in this cuisine with mice
"fraction"
]
dataFrameOfNumbers["n"] = dataFrameOfNumbers["n"].astype(int)
dataFrameOfNumbers["mice"] = dataFrameOfNumbers["mice"].astype(int)
print("Fraction of restaurants that have had mice at one time or another:")
print()
pd.options.display.max_colwidth = 64
st = dataFrameOfNumbers.to_string(max_rows = len(dataFrameOfNumbers))
print(st)
print()
dataFrameOfNumbers.sort_values(by = "fraction", inplace = True)
st = dataFrameOfNumbers.to_string(max_rows = len(dataFrameOfNumbers))
print(st)
sys.exit(0)
Fraction of restaurants that have had mice at one time or another:
n mice fraction
CUISINE DESCRIPTION
Afghan 14 10 0.714286
African 74 55 0.743243
American 5735 2395 0.417611
Armenian 25 7 0.280000
Asian 402 185 0.460199
Australian 21 13 0.619048
Bagels/Pretzels 153 84 0.549020
Bakery 748 392 0.524064
Bangladeshi 43 27 0.627907
Barbecue 54 15 0.277778
Basque 1 0 0.000000
Bottled beverages, including water, sodas, juices, etc. 109 45 0.412844
Brazilian 29 13 0.448276
Café/Coffee/Tea 1798 552 0.307008
Cajun 6 4 0.666667
Californian 4 2 0.500000
Caribbean 706 473 0.669972
Chicken 505 236 0.467327
Chilean 1 1 1.000000
Chinese 2380 1356 0.569748
Chinese/Cuban 21 13 0.619048
Chinese/Japanese 44 23 0.522727
Continental 48 13 0.270833
Creole 22 15 0.681818
Creole/Cajun 6 4 0.666667
Czech 4 4 1.000000
Delicatessen 298 173 0.580537
Donuts 522 129 0.247126
Eastern European 69 47 0.681159
Egyptian 12 5 0.416667
English 12 5 0.416667
Ethiopian 17 9 0.529412
Filipino 31 22 0.709677
French 310 142 0.458065
Fruits/Vegetables 6 0 0.000000
German 28 14 0.500000
Greek 138 67 0.485507
Hamburgers 363 109 0.300275
Hawaiian 48 12 0.250000
Hotdogs 28 6 0.214286
Hotdogs/Pretzels 31 8 0.258065
Ice Cream, Gelato, Yogurt, Ices 346 128 0.369942
Indian 320 204 0.637500
Indonesian 9 7 0.777778
Iranian 4 3 0.750000
Irish 191 114 0.596859
Italian 964 490 0.508299
Japanese 868 417 0.480415
Jewish/Kosher 335 189 0.564179
Juice, Smoothies, Fruit Salads 423 134 0.316785
Korean 308 129 0.418831
Latin (Cuban, Dominican, Puerto Rican, South & Central American) 828 476 0.574879
Mediterranean 281 131 0.466192
Mexican 963 463 0.480789
Middle Eastern 192 101 0.526042
Moroccan 9 4 0.444444
Not Listed/Not Applicable 9 2 0.222222
Nuts/Confectionary 6 1 0.166667
Other 1739 43 0.024727
Pakistani 30 18 0.600000
Pancakes/Waffles 15 10 0.666667
Peruvian 83 54 0.650602
Pizza 1199 632 0.527106
Pizza/Italian 476 269 0.565126
Polish 25 15 0.600000
Portuguese 11 6 0.545455
Russian 68 37 0.544118
Salads 78 22 0.282051
Sandwiches 358 120 0.335196
Sandwiches/Salads/Mixed Buffet 234 77 0.329060
Scandinavian 8 5 0.625000
Seafood 195 85 0.435897
Soul Food 66 40 0.606061
Soups 8 2 0.250000
Soups & Sandwiches 43 12 0.279070
Southwestern 3 1 0.333333
Spanish 630 349 0.553968
Steak 79 39 0.493671
Tapas 41 18 0.439024
Tex-Mex 126 57 0.452381
Thai 307 167 0.543974
Turkish 71 35 0.492958
Vegetarian 130 70 0.538462
Vietnamese/Cambodian/Malaysia 99 55 0.555556
n mice fraction
CUISINE DESCRIPTION
Basque 1 0 0.000000
Fruits/Vegetables 6 0 0.000000
Other 1739 43 0.024727
Nuts/Confectionary 6 1 0.166667
Hotdogs 28 6 0.214286
Not Listed/Not Applicable 9 2 0.222222
Donuts 522 129 0.247126
Hawaiian 48 12 0.250000
Soups 8 2 0.250000
Hotdogs/Pretzels 31 8 0.258065
Continental 48 13 0.270833
Barbecue 54 15 0.277778
Soups & Sandwiches 43 12 0.279070
Armenian 25 7 0.280000
Salads 78 22 0.282051
Hamburgers 363 109 0.300275
Café/Coffee/Tea 1798 552 0.307008
Juice, Smoothies, Fruit Salads 423 134 0.316785
Sandwiches/Salads/Mixed Buffet 234 77 0.329060
Southwestern 3 1 0.333333
Sandwiches 358 120 0.335196
Ice Cream, Gelato, Yogurt, Ices 346 128 0.369942
Bottled beverages, including water, sodas, juices, etc. 109 45 0.412844
Egyptian 12 5 0.416667
English 12 5 0.416667
American 5735 2395 0.417611
Korean 308 129 0.418831
Seafood 195 85 0.435897
Tapas 41 18 0.439024
Moroccan 9 4 0.444444
Brazilian 29 13 0.448276
Tex-Mex 126 57 0.452381
French 310 142 0.458065
Asian 402 185 0.460199
Mediterranean 281 131 0.466192
Chicken 505 236 0.467327
Japanese 868 417 0.480415
Mexican 963 463 0.480789
Greek 138 67 0.485507
Turkish 71 35 0.492958
Steak 79 39 0.493671
German 28 14 0.500000
Californian 4 2 0.500000
Italian 964 490 0.508299
Chinese/Japanese 44 23 0.522727
Bakery 748 392 0.524064
Middle Eastern 192 101 0.526042
Pizza 1199 632 0.527106
Ethiopian 17 9 0.529412
Vegetarian 130 70 0.538462
Thai 307 167 0.543974
Russian 68 37 0.544118
Portuguese 11 6 0.545455
Bagels/Pretzels 153 84 0.549020
Spanish 630 349 0.553968
Vietnamese/Cambodian/Malaysia 99 55 0.555556
Jewish/Kosher 335 189 0.564179
Pizza/Italian 476 269 0.565126
Chinese 2380 1356 0.569748
Latin (Cuban, Dominican, Puerto Rican, South & Central American) 828 476 0.574879
Delicatessen 298 173 0.580537
Irish 191 114 0.596859
Polish 25 15 0.600000
Pakistani 30 18 0.600000
Soul Food 66 40 0.606061
Australian 21 13 0.619048
Chinese/Cuban 21 13 0.619048
Scandinavian 8 5 0.625000
Bangladeshi 43 27 0.627907
Indian 320 204 0.637500
Peruvian 83 54 0.650602
Cajun 6 4 0.666667
Pancakes/Waffles 15 10 0.666667
Creole/Cajun 6 4 0.666667
Caribbean 706 473 0.669972
Eastern European 69 47 0.681159
Creole 22 15 0.681818
Filipino 31 22 0.709677
Afghan 14 10 0.714286
African 74 55 0.743243
Iranian 4 3 0.750000
Indonesian 9 7 0.777778
Chilean 1 1 1.000000
Czech 4 4 1.000000