The easiest way to install Carmen is with the built-in setup script:
$ python setup.py install
This installs the carmen package and associated data files into the active Python environment.
Carmen comes with a simple frontend to demonstrate its capabilities. Once Carmen is installed, you can run the frontend with:
$ python -m carmen.cli [options] [input_file] [output_file]
The input file should contain one JSON-serialized tweet per line, as returned by the Twitter API. If it is not specified, standard input is assumed. Carmen will output these tweets as JSON, with location information added in the location key, to the given output file, or standard output if none is specified. Both the input and output filenames may end in .gz to specify that Carmen should treat the files as gzipped text.
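The file layout described above (one JSON object per line, optionally gzipped) can also be read directly from Python. The following is a minimal sketch using only the standard library; the helper names open_text and iter_tweets are hypothetical and are not part of Carmen.

```python
import gzip
import json


def open_text(path, mode="r"):
    """Open path as UTF-8 text, using gzip when the name ends in .gz."""
    if path.endswith(".gz"):
        # gzip.open needs an explicit "t" to work in text mode.
        return gzip.open(path, mode + "t", encoding="utf-8")
    return open(path, mode, encoding="utf-8")


def iter_tweets(path):
    """Yield one deserialized tweet per non-empty line of path."""
    with open_text(path) as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Each yielded dictionary is a deserialized tweet of the kind the API section below passes to the resolver.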
If the -s (--statistics) option is passed, Carmen will print summary statistics when it finishes processing, detailing the number of tweets that were successfully resolved, and the resolution methods that were used to do so. For information on other options, use the -h (--help) option.
Python applications can use the Carmen API to directly retrieve location information for tweets:
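import json
import carmen
# tweet_json is a string holding one JSON-serialized tweet.
tweet = json.loads(tweet_json)
resolver = carmen.get_resolver()
resolver.load_locations()
# The result is None or an (is_provisional, location) tuple, as described below.
resolution = resolver.resolve_tweet(tweet)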
The resolver’s resolve_tweet() method is the central API call:
Find the best known location for the given tweet, which is provided as a deserialized JSON object, and return a tuple containing two elements: a boolean indicating whether the resolution is provisional, and a Location object. Provisional resolutions may be overridden by non-provisional resolutions returned by a less preferred resolver (i.e., one that comes later in the resolver order), and should be used when returning locations with low confidence, such as those found by using larger “backed-off” administrative units.
If no suitable locations are found, None may be returned.
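Because resolve_tweet() may return either None or a two-element tuple, callers need to handle both cases before unpacking. The sketch below assumes only the return shape described above; the function name summarize_resolutions and the category labels are hypothetical, and any object with a resolve_tweet() method can be passed as the resolver.

```python
from collections import Counter


def summarize_resolutions(resolver, tweets):
    """Count how many tweets were resolved, provisionally resolved, or not resolved."""
    counts = Counter()
    for tweet in tweets:
        resolution = resolver.resolve_tweet(tweet)
        if resolution is None:
            # No suitable location was found for this tweet.
            counts["unresolved"] += 1
            continue
        # Assumed shape, per the description above: (is_provisional, location).
        is_provisional, _location = resolution
        counts["provisional" if is_provisional else "resolved"] += 1
    return counts
```

With Carmen installed, this could be called as summarize_resolutions(resolver, tweets), where resolver comes from carmen.get_resolver() after load_locations() has been called.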
A Location object contains information about a location and how it was identified.
Basic location information. A value of None for a particular field indicates that it does not apply for that specific location.
An iterable containing alternative names for this location.
The name of the method used to resolve this location’s data from the tweet that originally contained it.
True if this location appears in the database, False otherwise.
For known locations, the database ID. For other locations, a unique ID is arbitrarily assigned for each run.
The resolver’s default location database can be added to or overridden using its add_location() and load_locations() methods:
add_location() adds an individual Location object to this resolver’s set of known locations.
load_locations() loads locations into this resolver from the given location_file, which should contain one JSON object per line representing a location. If location_file is not specified, an internal location database is used.
Finally, the behavior of the resolver itself can be customized:
carmen.get_resolver() returns a location resolver. The order argument, if given, should be a list of resolver names; results from resolvers named earlier in the list are preferred over later ones. For a list of built-in resolver names, see Built-in resolvers. The options argument can be used to pass configuration options to individual resolvers, in the form of a dictionary mapping resolver names to keyword arguments:
{'geocode': {'max_distance': 50}}
The modules argument can be used to specify a list of additional modules to look for resolvers in. See Extending Carmen for details.
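The three arguments can be combined in a single call. The sketch below is illustrative only: it assumes the arguments can be passed by keyword under the names used above (order, options, modules), uses 'geocode' as the only entry in the order list because it is the only resolver name shown here, and takes the factory function as a parameter so that the example does not depend on Carmen being importable. The name make_resolver is hypothetical.

```python
def make_resolver(get_resolver, max_distance=50, extra_modules=None):
    """Call get_resolver with an explicit order, per-resolver options, and optional modules."""
    # Resolvers named earlier in the list are preferred over later ones.
    order = ["geocode"]
    # Map each resolver name to the keyword arguments it should receive.
    options = {"geocode": {"max_distance": max_distance}}
    kwargs = {"order": order, "options": options}
    if extra_modules:
        # Additional modules in which to look for resolvers.
        kwargs["modules"] = list(extra_modules)
    return get_resolver(**kwargs)
```

With Carmen installed, the call would be make_resolver(carmen.get_resolver), followed by load_locations() on the returned resolver.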