Discover how you are spending time by parsing your calendar with Python in Jupyter.
![Calendar close up snapshot][1]
[Python][2] has incredibly scalable options for exploring data. With [Pandas][3] or [Dask][4], you can scale [Jupyter][5] up to big data. But what about small data? Personal data? Private data?
JupyterLab and Jupyter Notebook provide a great environment to scrutinize my laptop-based life.
My exploration is powered by the fact that almost every service I use has a web application programming interface (API). I use many such services: a to-do list, a time tracker, a habit tracker, and more. But there is one that almost everyone uses: _a calendar_. The same ideas can be applied to other services, but calendars have one cool feature, an open standard that almost all web calendars support: `CalDAV`.
### Parsing your calendar with Python in Jupyter
Most calendars provide access to their data through the `CalDAV` protocol. You may need some authentication for accessing this private data. Following your service's instructions should do the trick. How you get the credentials depends on your service, but eventually, you should be able to store them in a file. I store mine in my home directory in a file called `.caldav`:
```
import os

# The file holds the username and password, separated by whitespace
with open(os.path.expanduser("~/.caldav")) as fpin:
    username, password = fpin.read().split()
```
Never put usernames and passwords directly in notebooks! They could easily leak with a stray `git push`.
The next step is to use the convenient PyPI [caldav][6] library. I looked up the CalDAV server for my email service (yours may be different):
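A minimal sketch of this step, assuming the `caldav` package is installed and that your provider documents its CalDAV URL. The `connect` helper and the URL in the comment are hypothetical placeholders, not a real endpoint; `caldav.DAVClient` takes the server URL plus the credentials read earlier.

```python
def connect(url, username, password):
    """Return a caldav client for the given server URL and credentials."""
    # Imported lazily so that defining the helper does not require caldav
    import caldav

    return caldav.DAVClient(url=url, username=username, password=password)


# Hypothetical usage; replace the placeholder URL with your provider's:
# client = connect("https://caldav.example.com/dav/", username, password)
```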
CalDAV has a concept called the `principal`. It is not important to get into right now, except to know it's the thing you use to access the calendars:
```
principal = client.principal()
calendars = principal.calendars()
```
Calendars are, literally, all about time. Before accessing events, you need to decide on a time range. One week should be a good default:
```
from dateutil import tz
import datetime
now = datetime.datetime.now(tz.tzutc())
since = now - datetime.timedelta(days=7)
```
Most people use more than one calendar, and most people want all their events together. `itertools.chain.from_iterable` makes this straightforward.
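The flattening step can be sketched without touching a server. In this sketch, `collect_events` and its `fetch` parameter are hypothetical names, and plain lists stand in for real calendars:

```python
import itertools


def collect_events(calendars, fetch):
    """Flatten the events of every calendar into one list.

    `fetch` is any callable that returns the events of a single calendar.
    """
    return list(itertools.chain.from_iterable(fetch(cal) for cal in calendars))


# Stand-in data: plain lists play the role of calendars here
work = ["standup", "review"]
home = ["dentist"]
all_events = collect_events([work, home], fetch=lambda cal: cal)
print(all_events)  # ['standup', 'review', 'dentist']
```

With the caldav library, `fetch` would wrap whichever search call your installed version offers for the `since` to `now` range; check its documentation for the exact method name.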
Reading all the events into memory in the API's raw, native format is an important practice. When fine-tuning the parsing, analyzing, and displaying code, there is then no need to go back to the API service to refresh the data.
And "raw" is no exaggeration. The events come through as strings in the iCalendar text format.
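For illustration, here is a hand-written sample (not taken from a real calendar) of the iCalendar text that CalDAV servers return. Every value in it, including the `SAMPLE_RAW_EVENT` name, is made up:

```python
# Hand-written sample, not real calendar data
SAMPLE_RAW_EVENT = "\r\n".join([
    "BEGIN:VCALENDAR",
    "VERSION:2.0",
    "PRODID:-//Example//Sketch//EN",
    "BEGIN:VEVENT",
    "UID:sample-1@example.com",
    "DTSTAMP:20210301T120000Z",
    "DTSTART:20210302T170000Z",
    "DURATION:PT30M",
    "SUMMARY:Team sync",
    "END:VEVENT",
    "END:VCALENDAR",
    "",
])

print(SAMPLE_RAW_EVENT)
```

Note that this sample carries a `DURATION` rather than a `DTEND`, which is one of the variations the parsing code has to handle.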
There is still some work to do to convert it to a reasonable Python object. The first step is to _have_ a reasonable Python object. The [attrs][8] library provides a nice start:
```
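from __future__ import annotations  # must be the first statement in the cell

import datetime
from typing import Any

import attr

@attr.s(auto_attribs=True, frozen=True)
class Event:
    start: datetime.datetime  # when the event begins
    end: datetime.datetime  # when the event finishes
    timezone: Any  # tzinfo used to find the local day
    summary: str  # human-readable title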
```
Time to write the conversion code!
The first abstraction gets the value from the parsed dictionary without all the decorations:
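A minimal sketch of such a helper. The dictionary shape assumed here (a property name mapped to a list of wrapper objects with a `.value` attribute) is an assumption about what your parser produces, and `get_piece` is a hypothetical name:

```python
from types import SimpleNamespace


def get_piece(contents, name):
    """Return the bare value of the first `name` entry; KeyError if absent."""
    return contents[name][0].value


# Stand-in for a parsed event: name -> list of objects with a .value attribute
contents = {
    "summary": [SimpleNamespace(value="Team sync", params={})],
}
print(get_piece(contents, "summary"))  # Team sync
```

Letting the `KeyError` propagate is deliberate in this sketch: the caller can use it to detect an optional property that is missing.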
Calendar events always have a start, but they sometimes have an "end" and sometimes a "duration." Some careful parsing logic can harmonize both into the same Python objects:
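One way to harmonize the two cases, sketched on already-extracted values. `resolve_end` is a hypothetical helper, and raising an error when both are missing is a choice made for this sketch rather than something the calendar format requires:

```python
import datetime


def resolve_end(start, end=None, duration=None):
    """Return the event's end, derived from `duration` when `end` is absent."""
    if end is not None:
        return end
    if duration is not None:
        return start + duration
    raise ValueError("event has neither an end nor a duration")


start = datetime.datetime(2021, 3, 2, 17, 0, tzinfo=datetime.timezone.utc)
half_hour = datetime.timedelta(minutes=30)
print(resolve_end(start, duration=half_hour))  # 2021-03-02 17:30:00+00:00
```

Either way, the resulting `Event` only ever stores a start and an end, so later code never needs to know which form the calendar used.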
Now that the events are real Python objects, they really should have some additional information. Luckily, it is possible to add methods retroactively to classes.
But figuring out which _day_ an event happens on is not that obvious. You need the day in the _local_ timezone:
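A sketch of one way to do this, attaching a property to an existing class after the fact. To keep the snippet runnable on its own, a frozen dataclass stands in for the attrs-based `Event`, and a fixed UTC offset stands in for a real timezone; both are assumptions of this example:

```python
import datetime
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:  # stand-in for the attrs-based Event defined earlier
    start: datetime.datetime
    end: datetime.datetime
    timezone: datetime.tzinfo
    summary: str


def day(self):
    """Calendar date on which the event starts, in the event's own timezone."""
    return self.start.astimezone(self.timezone).date()


# Attach the function to the existing class as a read-only property
Event.day = property(day)

utc = datetime.timezone.utc
pacific = datetime.timezone(datetime.timedelta(hours=-8))  # fixed offset, no DST
late = Event(
    start=datetime.datetime(2021, 3, 2, 3, 0, tzinfo=utc),
    end=datetime.datetime(2021, 3, 2, 4, 0, tzinfo=utc),
    timezone=pacific,
    summary="Late call",
)
print(late.day)  # 2021-03-01
```

The event starts on March 2 in UTC, but for someone eight hours behind UTC it still falls on the evening of March 1.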
Events are always represented internally as start/end, but knowing the duration is a useful property. Duration can also be added to the existing class:
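A sketch of the same pattern for duration. Again, a reduced dataclass stands in for the real `Event` class so the snippet runs on its own:

```python
import datetime
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:  # reduced stand-in: only the fields this sketch needs
    start: datetime.datetime
    end: datetime.datetime


def duration(self):
    """Length of the event as a timedelta."""
    return self.end - self.start


# Attach the function to the existing class as a read-only property
Event.duration = property(duration)

utc = datetime.timezone.utc
meeting = Event(
    start=datetime.datetime(2021, 3, 2, 17, 0, tzinfo=utc),
    end=datetime.datetime(2021, 3, 2, 17, 45, tzinfo=utc),
)
print(meeting.duration)  # 0:45:00
```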