Note 6: Files & Command Line

Read: json

Files

Reading from and writing to text files allows us to easily process large amounts of data with Python.

Download metamorphosis.txt. It is an excerpt from Franz Kafka’s Metamorphosis.

You can open the file using this syntax. It will read in the file as a single string.

open_file.py
with open('metamorphosis.txt', 'r') as in_file:
    lines = in_file.read()

print(type(lines))
print(lines)

File Path

Using the filename by itself as the first argument to open only works if the file is in the same directory as the .py file you are running.

Otherwise, you need to include the full file path.

Try making a new directory called text_files inside of where the .py file is and opening the file:

file_path.py
with open('text_files/metamorphosis.txt', 'r') as in_file:
    lines = in_file.read()

Because the file is read in as a single string, it includes the special “newline” character '\n', which creates a new line (similar to hitting the Enter or Return key on your keyboard).

You can turn this long string into a list of shorter strings, split by the newline characters, using <str>.split.

line_list = lines.split("\n")

Some examples of how this works:

Writing to a file is very similar

open_file.py
s = "some content to write to a file"
with open('output_file.txt', 'w') as out_file:
    print(s, file=out_file)
  • The second argument of open is 'w' for “writing”
    • To read, we used 'r' for “reading”
  • We use the print function with a keyword argument file=
    • Python “prints” the output to a file instead of the console

You should be able to open output_file.txt and read it with a text editor.

Overwriting Files

Be careful! When you write to a file with Python, you will overwrite any pre-existing file.

JSON

JavaScript Object Notation (“JSON”) is an industry-standard format for passing data between computer programs. It is similar to a Python dictionary, in that it consists of objects, which consist of pairs of names (keys) and values.

  • A JSON object is very similar to a Python dict
example0.json
{"Hello": 1, "World": 2, "Names (Keys) Must Be Strings": 3}

Whitespace is ignored in JSON (just like Python!) and JSON is commonly indented for readability:

example0.json
{
    "Hello": 1, 
    "World": 2, 
    "Names (Keys) Must Be Strings": 3
    }

Names must be strings, and unlike Python, strings are always delimited with double quotes ("). The values of an object can themselves be objects.

example1.json
{
    "First Outer Name": 1, 
    "Second Outer Name": 
    {
        "First Inner Name": 101,
        "Second Inner Name": 102
        }, 
    "Third Outer Name": 3
    }
  • Values can take on several types:
    • Strings
    • Numbers (JSON does not distinguish between ints and floats)
    • Bools (true and false – all lowercase, different from Python)
    • null – similar to Python’s None, indicating “nothing”
    • Arrays
    • Objects
  • A JSON array is very similar to a Python list. The example below has arrays as values:
example2.json
{
    "DC": [20052, 20051],
    "VA": [22101, 22102]
}
  • Arrays can contain any kind of value, including other arrays, and other objects:
example3.json
{
    "First": ["A", "B", [1, 2, 3]],
    "Second": [
        true, 
        null, 
        {
            "Inner Name": false,
            "Second Inner Name": "Yes"
        }, 
        "Thank you", 
        [5, 6]
        ],
    "Third": 1.5
}

The example above is unnecessarily complicated to show what JSON can do. In practice, JSON is usually formatted logically (it’s a data interchange format, after all). For further reading about JSON formatting, read about JSON Schema.

JSON in Python

Python has a built-in library for working with JSON: json. The json.loads function takes a string representing the contest of a JSON file, and converts it to a Python dictionary.

import json

with open("example2.json", "r") as f:
    contents = f.read()
print("Raw String:\n" + contents) # the raw string

json_contents = json.loads(contents)
print("Parsed JSON:\n" + str(json_contents)) # parsed into a Python dict
Raw String:
{
    "DC": [20052, 20051],
    "VA": [22101, 22102]
}
Parsed JSON:
{'DC': [20052, 20051], 'VA': [22101, 22102]}

We can also take a Python dict and parse it into valid JSON using json.dumps

D = {}
D["number"] = 12
D["bool"] = True
D["nothing"] = None
D["list"] = [4, "A"]
D[6] = 7
D[1.1] = 1
print("Original dict:\n", D)

json_out = json.dumps(D)
print("JSON:\n", json_out)

# "pretty print" indented JSON
pretty_json_out = json.dumps(D, indent=4)
print("Pretty JSON:\n", pretty_json_out)
Original dict:
 {'number': 12, 'bool': True, 'nothing': None, 'list': [4, 'A'], 6: 7, 1.1: 1}
JSON:
 {"number": 12, "bool": true, "nothing": null, "list": [4, "A"], "6": 7, "1.1": 1}
Pretty JSON:
 {
    "number": 12,
    "bool": true,
    "nothing": null,
    "list": [
        4,
        "A"
    ],
    "6": 7,
    "1.1": 1
}

Note the differences: most importantly, only strings can be names in JSON. In Python, any immutable type can be a dict key. For non-string keys in Python dict, json converts these to strings.

json.dumps returns a string, which is ready to be written to a file in the usual manner:

with open("output.json", "w") as f:
    print(pretty_json_out, file=f)

CSV

Another common file format for table data (such as spread sheets) is the Comma-Separated Value (CSV) format.

CSV files are, quite literally, values separated by commas. You can open them with any text editor:

species.csv
Common Name,Genus,Species,Status
Grackle,Quiscalus,quiscula,NT
Black Rat,Rattus,rattus,LC
Mallard Duck,Anas,platyrhynchos,LC
Canada Goose,Branta,canadensis,LC

While Python has a “built in” library for working with CSV files (it’s called csv), in practice, you will use the pandas library. This library is named after an adorable beast.

You will need to install the pandas library into your project. Using uv at the command line, run:

uv add pandas

We can then read in the CSV using pandas – the result is an object called a dataframe:

import pandas as pd
species = pd.read_csv("species.csv")
print(species)
    Common Name      Genus        Species Status
0       Grackle  Quiscalus       quiscula     NT
1     Black Rat     Rattus         rattus     LC
2  Mallard Duck       Anas  platyrhynchos     LC
3  Canada Goose     Branta     canadensis     LC
import pandas as pd

It is a convention to import pandas as pd. This is not necessary, we could just as readily have written:

import pandas
species = pandas.read_csv("species.csv")

Nonetheless, the Python data analysis community almost always uses pd, and almost all documentation you find for pandas will do this.

Pandas has a tremendous amount of functionality. Here is an example of adding a new column based on the contents of existing columns:

import pandas as pd
species = pd.read_csv("species.csv")
species['Least Concern'] = species['Status'] == "LC"
print(species)
    Common Name      Genus        Species Status  Least Concern
0       Grackle  Quiscalus       quiscula     NT          False
1     Black Rat     Rattus         rattus     LC           True
2  Mallard Duck       Anas  platyrhynchos     LC           True
3  Canada Goose     Branta     canadensis     LC           True

We won’t get into detailed Pandas functionality in this course. The Pandas documentation is very thorough, and a great place to go if you need to work with CSV data.

Command Line Arguments

We often want to write Python programs that work on files. When we call these programs from the command line, we want to be able to specify which files to work with. We use command line arguments to do this:

python smallest_value.py values.json
smallest_value.py
import argparse
import json

parser = argparse.ArgumentParser()
parser.add_argument("json_file")
args = parser.parse_args()

with open(args.json_file, "r") as f:
        json_content = json.loads(f.read())

print(json_content)

The syntax is very prescriptive, but simple if we are only parsing one argument:

Arguments are passed as strings into Python, so that what you type at the command line after the Python file name becomes a string you can use inside the program.

The argparse library also supports some very advanced functionality. We won’t use it here, but for further reading, look at the argparse documentation.

Keyword Arguments

Python functiond definitions support two types of arguments: regular (“positional”) arguments, which distinguish between arguments by position, and keyword arguments, that distinguish between arguments by name.

Keyword arguments have a position, and can be used positionally:

def f(x:int, y:int=2) -> str:
    return str(x) + str(y)

print(f(3, 4)) # keyword argument used positionally
34

Keyword arguments can be omitted, and the default value is used:

def f(x:int, y:int=2) -> str:
    return str(x) + str(y)

print(f(3)) # keyword argument omitted (default used)
32

Keyword arguments can also be called by keyword: if this is done, not all keyword arguments need to be specified.

def f(x:int, y:int=2, z:int=3) -> str:
    return str(x) + str(y) + str(z)

print(f(3,z=7)) # not all keyword arguments used
327

Keyword arguments must always follow positional arguments, both in function definitions, and in function calls.

Comprehensions

Comprehensions are Python expressions that concisely create lists and dictionaries. A comprehension nests a for expression inside of square brackets, creating the contents of a list using a loop.

As an example, suppose we have a list of strings, and we want to create a new list consisting of the strings from the original list that are entirely lower case.

Without a list comprehension, we would use a for loop:

L = ["Abacus", "calculator", "TI-89", "pen and paper", "Laptop", "computer"]
new_L = []
for l in L:
    if l == l.lower():
        new_L.append(l)
print(new_L)
['calculator', 'pen and paper', 'computer']

The comprehension saves us some keystrokes:

L = ["Abacus", "calculator", "TI-89", "pen and paper", "Laptop", "computer"]
new_L = [l for l in L if l == l.lower()]
print(new_L)
['calculator', 'pen and paper', 'computer']

The two programs above are equivalent. They do the same thing. You will never need to use a list comprehension, and comprehensions may make your code much harder for others (or you) to read.

Nonetheless, because they can often simplify a long program into a much shorter one, comprehensions are commonly used by Python programmers. Even if you don’t enjoy using comprehensions, chances are you will encounter quite a few of them in the wild.

Dictionary comprehensions exist as well. They work very similarly:

L = ["Abacus", "calculator", "TI-89", "pen and paper", "Laptop", "computer"]
D = {l[0]:l for l in L if l == l.lower()}
print(D)
{'c': 'computer', 'p': 'pen and paper'}

Review: what happened to calculator ?