SQL is the Structured Query Language for reading and writing database files.
The Python version of SQL is
SQLite,
convenient to use
because you don’t have to set up a separate server somewhere else.
Before we use SQLite in Python,
however,
we will play with it in by running the program
sqlite3
in the macOS Terminal window,
the Linux shell window,
and the Microsoft Command Prompt window.
You probably already have
sqlite3
on your Mac.
which sqlite3 /usr/bin/sqlite3 sqlite3 -help sqlite3 -version 3.16.0 2016-11-04 19:09:39 0e5ffd9123d6d2d2b8f3701e8a73cc98a3a7ff5f
To get the latest version of SQLite, go
here
and download the
.zip
file for
“Precompiled Binaries for Mac OS X (x86)”.
Double-click on the downloaded
.zip
file to create a new folder with the same name as the
.zip
file but without the
.zip
suffix.
This folder contains the three programs
sqldiff,
sqlite3,
and
sqlite_analyzer.
Move
these three programs to your
/usr/local/bin
directory.
cd ~/Downloads/sqlite-tools-osx-x86-3200000 pwd /Users/myname/Downloads/sqlite-tools-osx-x86-3200000 mv sql* /usr/local/bin cd /usr/local/bin pwd /usr/local/bin ls -l sql* -rwxr-xr-x@ 1 myname mygroup 612864 Aug 1 21:02 sqldiff -rwxr-xr-x@ 1 myname mygroup 974212 Aug 1 21:06 sqlite3 -rwxr-xr-x@ 1 myname mygroup 660080 Aug 1 21:04 sqlite3_analyzer
See if you now have the current version of
sqlite3:
which sqlite3 /usr/local/bin/sqlite3 sqlite3 -version 3.20.0 2017-08-01 13:24:15 9501e22dfeebdcefa783575e47c60b514d7c2e0cad73b2a496c0bc4b680900a8 sqlite3 -help Usage: /usr/local/bin/sqlite3 [OPTIONS] FILENAME [SQL] FILENAME is the name of an SQLite database. A new database is created if the file does not previously exist. OPTIONS include: -ascii set output mode to 'ascii' -bail stop after hitting an error -batch force batch I/O -column set output mode to 'column' -cmd COMMAND run "COMMAND" before reading stdin -csv set output mode to 'csv' -echo print commands before execution -init FILENAME read/process named file -[no]header turn headers on or off -help show this message -html set output mode to HTML -interactive force interactive I/O -line set output mode to 'line' -list set output mode to 'list' -lookaside SIZE N use N entries of SZ bytes for lookaside memory -mmap N default mmap size set to N -newline SEP set output row separator. Default: '\n' -nullvalue TEXT set text string for NULL values. Default '' -pagecache SIZE N use N slots of SZ bytes each for page cache memory -quote set output mode to 'quote' -scratch SIZE N use N slots of SZ bytes each for scratch memory -separator SEP set output column separator. Default: '|' -stats print memory stats before each finalize -version show SQLite version -vfs NAME use NAME as the default VFS
Let’s create a file named
zipcodes.db
in your
/Users/myname/python
directory.
cd ~/python
pwd
/Users/myname/python
sqlite3 zipcodes.db
SQLite version 3.20.0 2017-08-01 13:24:15
Enter ".help" for usage hints.
sqlite> .help
.auth ON|OFF Show authorizer callbacks
.backup ?DB? FILE Backup DB (default "main") to FILE
.bail on|off Stop after hitting an error. Default OFF
.binary on|off Turn binary output on or off. Default OFF
.cd DIRECTORY Change the working directory to DIRECTORY
.changes on|off Show number of rows changed by SQL
.check GLOB Fail if output since .testcase does not match
.clone NEWDB Clone data into NEWDB from the existing database
.databases List names and files of attached databases
.dbinfo ?DB? Show status information about the database
.dump ?TABLE? ... Dump the database in an SQL text format
If TABLE specified, only dump tables matching
LIKE pattern TABLE.
.echo on|off Turn command echo on or off
.eqp on|off|full Enable or disable automatic EXPLAIN QUERY PLAN
.exit Exit this program
.fullschema ?--indent? Show schema and the content of sqlite_stat tables
.headers on|off Turn display of headers on or off
.help Show this message
.import FILE TABLE Import data from FILE into TABLE
.imposter INDEX TABLE Create imposter table TABLE on index INDEX
.indexes ?TABLE? Show names of all indexes
If TABLE specified, only show indexes for tables
matching LIKE pattern TABLE.
.limit ?LIMIT? ?VAL? Display or change the value of an SQLITE_LIMIT
.lint OPTIONS Report potential schema issues. Options:
fkey-indexes Find missing foreign key indexes
.load FILE ?ENTRY? Load an extension library
.log FILE|off Turn logging on or off. FILE can be stderr/stdout
.mode MODE ?TABLE? Set output mode where MODE is one of:
ascii Columns/rows delimited by 0x1F and 0x1E
csv Comma-separated values
column Left-aligned columns. (See .width)
html HTML <table> code
insert SQL insert statements for TABLE
line One value per line
list Values delimited by "|"
quote Escape answers as for SQL
tabs Tab-separated values
tcl TCL list elements
.nullvalue STRING Use STRING in place of NULL values
.once FILENAME Output for the next SQL command only to FILENAME
.open ?OPTIONS? ?FILE? Close existing database and reopen FILE
The --new option starts with an empty file
.output ?FILENAME? Send output to FILENAME or stdout
.print STRING... Print literal STRING
.prompt MAIN CONTINUE Replace the standard prompts
.quit Exit this program
.read FILENAME Execute SQL in FILENAME
.restore ?DB? FILE Restore content of DB (default "main") from FILE
.save FILE Write in-memory database into FILE
.scanstats on|off Turn sqlite3_stmt_scanstatus() metrics on or off
.schema ?PATTERN? Show the CREATE statements matching PATTERN
Add --indent for pretty-printing
.selftest ?--init? Run tests defined in the SELFTEST table
.separator COL ?ROW? Change the column separator and optionally the row
separator for both the output mode and .import
.sha3sum ?OPTIONS...? Compute a SHA3 hash of database content
.shell CMD ARGS... Run CMD ARGS... in a system shell
.show Show the current values for various settings
.stats ?on|off? Show stats or turn stats on or off
.system CMD ARGS... Run CMD ARGS... in a system shell
.tables ?TABLE? List names of tables
If TABLE specified, only list tables matching
LIKE pattern TABLE.
.testcase NAME Begin redirecting output to 'testcase-out.txt'
.timeout MS Try opening locked tables for MS milliseconds
.timer on|off Turn SQL timer on or off
.trace FILE|off Output each SQL statement as it is run
.vfsinfo ?AUX? Information about the top-level VFS
.vfslist List all available VFSes
.vfsname ?AUX? Print the name of the VFS stack
.width NUM1 NUM2 ... Set column widths for "column" mode
Negative values right-justify
sqlite> .databases
main: /Users/myname/python/zipcodes.db
A
table
is a rectangle made of rows and columns.
The columns are things like lastname, firstname, social security number.
The
datatype
of each column could be
text,
integer (a number without a decimal point or fraction),
real (a number with a decimal point and fraction),
etc.
text
might be called
varchar
in other versions of SQL.
The rows are the records (Moe, Larry, Curly, and Shemp).
The name of our table will be
zipcodeTable.
Each of the commands
drop
table
and
create
table
is two separate words with a space between them.
Don’t forget the semicolon at the end of every SQL statement.
(The statements that began with a dot are not SQL statements,
and do not need a semicolon.
They are the “dot commands” listed under
Special
commands to sqlite3 (dot-commands).)
The
sqlite3
continuation prompt is
…>.
Try control-p to go back to the previous line.
sqlite> drop table if exists zipcodeTable;
sqlite> .tables
sqlite> create table zipcodeTable (
zipcode integer primary key,
city text,
state text,
latitude real,
longitude real,
county text
);
sqlite> .tables
zipcodeTable
sqlite> .schema zipcodeTable
CREATE TABLE zipcodeTable (
zipcode integer primary key,
city text,
state text,
latitude real,
longitude real,
county text
);
Our building 11 West 42nd Street is in zipcode 10036.
select
means “print”.
By default,
the
select
statement selects all the
lines
in the specified table.
The asterisk in
select *
means
“all the
fields
on the selected lines”.
sqlite> select count(*) from zipcodeTable;
0
sqlite> insert into zipcodeTable (zipcode, city, state, latitude, longitude, county)
values (10036, "New York", "NY", 40.759724, -73.991826, "New York");
sqlite> select count(*) from zipcodeTable;
1
sqlite> select * from zipcodeTable;
10036|New York|NY|40.759724|-73.991826|New York
sqlite> .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE zipcodeTable (
zipcode integer primary key,
city text,
state text,
latitude real,
longitude real,
county text
);
INSERT INTO zipcodeTable VALUES(10036,'New York','NY',40.75972399999999851,-73.991826000000003203,'New York');
COMMIT;
And quit
sqlite3,
and examine the
zipcodes.db
file.
sqlite> delete from zipcodeTable where zipcode == 10036; sqlite> select count(*) from zipcodeTable; 0 sqlite> select * from zipcodeTable; sqlite> .quit file zipcodes.db zipcodes.db: SQLite 3.x database od -c zipcodes.db | head -5 0000000 S Q L i t e f o r m a t 3 \0 0000020 020 \0 001 001 \0 @ \0 \0 \0 \a \0 \0 001 377 0000040 \0 \0 \0 003 \0 \0 001 375 \0 \0 \0 001 \0 \0 \0 004 0000060 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 001 \0 \0 \0 \0 0000100 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
Save the file
zip5.csv
into your
/Users/myname/python
directory.
Then type the following commands.
The last line in the CSV file does not end with a newline character,
so
wc -l
does not count that line.
That’s why
wc -l
theinks the file contains only 41,754 lines,
when it really contains 41,755.
cd ~/python pwd /Users/myname/python ls -l zip5.csv -rw-r--r--@ 1 myname mygroup 1975105 Aug 9 22:46 zip5.csv wc -l zip5.csv 41754 zip5.csv head -5 zip5.csv 00501,Holtsville,NY,40.8152,-73.0455,Suffolk 00544,Holtsville,NY,40.8152,-73.0455,Suffolk 00601,Adjuntas,PR,18.1788,-66.7516,Adjuntas 00602,Aguada,PR,18.381389,-67.188611,Aguada 00603,Aguadilla,PR,18.4554,-67.1308,Aguadilla sqlite3 zipcodes.db sqlite> .mode csv sqlite> .show sqlite> .import zip5.csv zipcodeTable sqlite> select count(*) from zipcodeTable; 41755
New England zipcodes start with zero.
The
f
in the name of the SQLite
printf
function stands for “formatted”.
The
"%05d"
format is like the Python
"{:05}".format().
They mean
“print the integer with a minimum of 5 digits.
If there are less than 5 digits, pad out the integer with leading zeros.”
The
d
in
"%05d"
stands for “decimal integer”,
as opposed to hexadecimal or octal integer.
The
as zipcode
specifies the column header above the output of
printf.
See what happens if you
leave out the
as zipcode.
sqlite> .headers on
sqlite> .mode column
sqlite> .show
sqlite> select * from zipcodeTable where 10036 <= zipcode and zipcode <= 10040;
zipcode city state latitude longitude county
---------- ---------- ---------- ---------- ---------- ----------
10036 New York NY 40.759724 -73.991826 New York
10037 New York NY 40.813491 -73.9381 New York
10038 New York NY 40.710092 -74.001298 New York
10039 New York NY 40.826458 -73.938266 New York
10040 New York NY 40.858308 -73.929601 New York
sqlite> select zipcode, city, county, state, zipcode from zipcodeTable
where 10036 <= zipcode and zipcode <= 10040;
zipcode city county state zipcode
---------- ---------- ---------- ---------- ----------
10036 New York New York NY 10036
10037 New York New York NY 10037
10038 New York New York NY 10038
10039 New York New York NY 10039
10040 New York New York NY 10040
sqlite> select * from zipcodeTable where city == "Bennington" and state == "VT";
zipcode city state latitude longitude county
---------- ---------- ---------- ---------- ---------- ----------
5201 Bennington VT 42.882658 -73.192321 Bennington
sqlite> select printf("%05d", zipcode) as zipcode, city, state, latitude, longitude, county
from zipcodeTable where city == "Bennington" and state == "VT";
zipcode city state latitude longitude county
---------- ---------- ---------- ---------- ---------- ----------
05201 Bennington VT 42.882658 -73.192321 Bennington
We can change the second call to
substr
to
substr(city, -3, 3),
which means “the last three characters of the city”.
sqlite> select * from zipcodeTable
where length(city) == 7 and substr(city, 1, 3) == substr(city, 5, 3);
zipcode city state latitude longitude county
---------- ---------- ---------- ---------- ---------- ----------
25434 Paw Paw WV 39.492297 -78.458573 Morgan
49079 Paw Paw MI 42.234931 -85.900488 Van Buren
61353 Paw Paw IL 41.685228 -88.967377 Lee
sqlite> select * from zipcodeTable where city like "James%" and state == "NY"; zipcode city state latitude longitude county ---------- ---------- ---------- ---------- ---------- ---------- 11947 Jamesport NY 40.949444 -72.581944 Suffolk 13078 Jamesville NY 42.982973 -76.076571 Onondaga 14701 Jamestown NY 42.092845 -79.243989 Chautauqua 14702 Jamestown NY 42.096944 -79.235556 Chautauqua sqlite> select * from zipcodeTable where city like "%ning" and state == "NY"; zipcode city state latitude longitude county ---------- ---------- ---------- ---------- ---------- ---------- 10562 Ossining NY 41.167344 -73.853791 Westcheste 14830 Corning NY 42.138331 -77.047546 Steuben 14831 Corning NY 42.142778 -77.055 Steuben sqlite> select * from zipcodeTable where city like "Y%k" and state == "NY"; zipcode city state latitude longitude county ---------- ---------- ---------- ---------- ---------- ---------- 11980 Yaphank NY 40.837037 -72.917435 Suffolk 14592 York NY 42.871111 -77.885556 Livingston
Fast and sloppy, no error checking or
sys.exit.
I didn’t even
close
the input file.
""" widestcity.py How many characters wide is the widest city? """ import csv filename = "zip5.csv" infile = open(filename, encoding = "utf-8", newline = "") lines = csv.reader(infile) listOfIntegers = [len(line[1]) for line in lines] #line[1] is the name of the city print(max(listOfIntegers))
27
sqlite> select * from zipcodeTable where city like "%Saint Johnsbury%"; zipcode city state latitude longitude county ---------- --------------- ---------- ---------- ---------- ---------- 5819 Saint Johnsbury VT 44.427195 -72.005062 Caledonia 5838 East Saint John VT 44.438611 -71.946111 Caledonia 5863 Saint Johnsbury VT 44.456667 -72.016111 Caledonia sqlite> select max(length(city)) from zipcodeTable; max(len ------- 27 sqlite> .width 7, 27, 5, 9, 10, 22 sqlite> .show sqlite> select * from zipcodeTable where city like "%Saint Johnsbury%"; zipcode city state latitude longitude county ------- --------------------------- ----- --------- ---------- ---------------------- 5819 Saint Johnsbury VT 44.427195 -72.005062 Caledonia 5838 East Saint Johnsbury VT 44.438611 -71.946111 Caledonia 5863 Saint Johnsbury Center VT 44.456667 -72.016111 Caledonia sqlite> --What has (at least) four eyes and can't see? sqlite> select * from zipcodeTable where city like "%i%i%i%i%"; zipcode city state latitude longitude county ------- --------------------------- ----- --------- ---------- ---------------------- 18341 Minisink Hills PA 40.998333 -75.1375 Monroe 39762 Mississippi State MS 33.459444 -88.790833 Oktibbeha 59755 Virginia City MT 45.247216 -112.00261 Madison 61259 Illinois City IL 41.389236 -90.892507 Rock Island 74549 Kiamichi Christian Mission OK 34.592604 -94.991557 Le Flore 89440 Virginia City NV 39.387282 -119.59606 Storey sqlite> select * from zipcodeTable where city like "% % % % %"; zipcode city state latitude longitude county ------- --------------------------- ----- --------- ---------- ---------------------- 23085 King And Queen Court House VA 37.717644 -76.863059 King And Queen 47876 Saint Mary Of The Woods IN 39.510833 -87.467222 Vigo 72099 Little Rock Air Force Base AR 34.898611 -92.144167 Pulaski 80840 U S A F Academy CO 38.990448 -104.86013 El Paso 80841 U S A F Academy CO 38.996667 -104.87583 El Paso 83648 Mountain Home A F B ID 43.049315 -115.87360 Elmore 96863 M C B H Kaneohe Bay HI 21.4492 -157.766 Honolulu
The wildcard
_
(underscore)
looks for exactly one character,
no more and no less.
sqlite> select * from zipcodeTable where city like "%gr_ys%"; zipcode city state latitude longitude county ------- --------------------------- ----- --------- ---------- ---------------------- 7950 Greystone Park NJ 40.843982 -74.479645 Morris 15337 Graysville PA 39.909165 -80.395223 Greene etc.
sqlite> select * from zipcodeTable where city like "fort %" or city like "ft %";
zipcode city state latitude longitude county
------- --------------------------- ----- --------- ---------- ----------------------
7024 Fort Lee NJ 40.850312 -73.974455 Bergen
22211 Ft Myer VA 38.8795 -77.080258 Arlington
etc.
sqlite> select * from zipCodeTable
where not city like "%a%"
and not city like "%e%"
and not city like "%i%"
and not city like "%o%"
and not city like "%u%";
zipcode city state latitude longitude county
------- --------------------------- ----- --------- ---------- ----------------------
1901 Lynn MA 42.463378 -70.945516 Essex
23066 Gwynn VA 37.500594 -76.29042 Mathews
31795 Ty Ty GA 31.45595 -83.621989 Tift
97622 Bly OR 42.397778 -121.04055 Klamath
etc.
A degree of latitude is equal to approximately 57.2957795 miles anywhere on the globe, so a mile is equal to 1/57.2957795 degrees and a half mile is a equal to .5/57.2957795 degrees.
A degree of longitude is equal to approximately .75 * 57.2957795 miles in New York City, because the cosine of our latitude is approximately .75. The longitude lines are closer together than the latitude lines here.
Let’s find all the zip codes in the square mile centered on 10036.
sqlite> select * from zipcodeTable where zipcode == 10036;
zipcode city state latitude longitude county
------- --------------------------- ----- --------- ---------- ----------------------
10036 New York NY 40.759724 -73.991826 New York
sqlite> select * from zipcodeTable
where abs(latitude - 40.759724) < .5 / 57.2957795
and abs(longitude - -73.991826) < .5 / (.75 * 57.2957795)
order by zipcode;
zipcode city state latitude longitude county
------- --------------------------- ----- --------- ---------- ----------------------
10018 New York NY 40.754713 -73.992503 New York
10019 New York NY 40.765069 -73.985834 New York
10020 New York NY 40.759729 -73.982347 New York
10036 New York NY 40.759724 -73.991826 New York
10069 New York NY 40.7543 -73.9997 New York
10101 New York NY 40.7632 -73.9862 New York
10102 New York NY 40.7632 -73.9862 New York
10105 New York NY 40.7632 -73.9862 New York
10106 New York NY 40.7647 -73.9804 New York
10107 New York NY 40.7661 -73.9825 New York
10108 New York NY 40.7574 -73.9918 New York
10109 New York NY 40.7574 -73.9918 New York
10110 New York NY 40.7533 -73.9808 New York
10129 New York NY 40.7574 -73.9918 New York
10138 New York NY 40.754 -73.9909 New York
10149 New York NY 40.7655 -73.9873 New York
10156 New York NY 40.753 -73.9924 New York
10157 New York NY 40.753 -73.9924 New York
Staten Island secedes from New York City!
sqlite> select * from zipcodeTable
where city == "Staten Island" and county == "Richmond" and state == "NY"
limit 5;
zipcode city state latitude longitude county
---------- ------------- ---------- ---------- ---------- ----------
10301 Staten Island NY 40.631602 -74.092663 Richmond
10302 Staten Island NY 40.630597 -74.137918 Richmond
10303 Staten Island NY 40.630062 -74.160679 Richmond
10304 Staten Island NY 40.610249 -74.087836 Richmond
10305 Staten Island NY 40.597296 -74.076795 Richmond
sqlite> update zipcodeTable set state = "SI"
where city == "Staten Island" and county == "Richmond" and state == "NY";
sqlite> delete from zipcodeTable where zipcode == 10036; --Delete one row. sqlite> delete from zipcodeTable; --Delete every row.
Suppose a
sinister
force
changed the above 5 characters
10036
to the 31 characters
10036; delete from zipcodeTable
Then the computer would execute the following two statements,
deleting every row in the table.
This is called an
injection attack.
sqlite> delete from zipcodeTable where zipcode == 10036; delete from zipcodeTable;
Download the file
states.csv
to your
/Users/myname/python
directory.
Then
sqlite> create table stateTable (
abbreviation text primary key,
name text
);
sqlite> .mode csv
sqlite> .import states.csv stateTable
sqlite> .mode column
sqlite> .headers on
sqlite> select * from stateTable limit 5;
abbreviation name
------------ ----------
AL Alabama
AK Alaska
AZ Arizona
AR Arkansas
CA California
zipcodeTable.zipcode
means the
zipcode
column in the
zipcodeTable.
We can now print out the full name of the state of each zip code
(New York vs. NY)
without having to store the full names in the
zipcodeTable.
That’s because
the
state
column in the
zipcodeTable
contains the same two-letter codes as the
abbreviation
column in the
stateTable.
sqlite> select zipcodeTable.zipcode, zipcodeTable.city, zipcodeTable.county, stateTable.name
from zipcodeTable, stateTable
where zipcodeTable.city == "Yonkers" and zipcodeTable.state == stateTable.abbreviation
order by zipcodeTable.zipcode;
zipcode city county name
---------- ---------- ----------- ----------
10701 Yonkers Westchester New York
10702 Yonkers Westchester New York
10703 Yonkers Westchester New York
10704 Yonkers Westchester New York
10705 Yonkers Westchester New York
10710 Yonkers Westchester New York
sqlite> .quit