How to Generate Fake Data With Python Mimesis

Generating fake data can be a task far beyond the infamous lorem ipsum. Mimesis, as described by the library authors:

"Is a high-performance fake data generator for Python, which provides data for a variety of purposes in a variety of languages."

Let's install Mimesis and see how it plays:

pip install mimesis

Mimesis works through data-provider classes such as Person, which can generate anything from a blood type to academic degree information:

from mimesis import Person


english_persona = Person('en')

print(
    english_persona.full_name(),
    english_persona.blood_type(),
    english_persona.occupation(),
    english_persona.academic_degree()
)

>>> Benedict Levy O− Security Officer Master

Locales

It also has impressive locale capabilities. The loop below prints a generated name for each of the 34 locales supported at the time of writing:


from mimesis import locales, Person

for locale in locales.SUPPORTED_LOCALES.keys():
    persona = Person(locale)
    print(f"{persona.locale}: {persona.full_name()}")


cs: Serafina Kopřiva
da: Alwigh Mouritzen
de: Albern Dirksen
de-at: Angelina Höhne
de-ch: Yassine Mooser
el: Μάγδα Ζάππα
en: Deloras Higgins
en-gb: Sina Hyde
en-au: Basil Duffy
en-ca: Geneviève Pugh
es: Albert Vila
es-mx: Alise Holguín
et: Sören Maradona
fa: پریچهر منوچهری
fi: Reetta Saari
fr: Eugénie Christophe
hu: Tara Pázmány
is: Thór Bjarnfinnsson
it: Beltrano Laguardia
ja: 武 向日
kk: Алтынгүл Кушекова
ko: 윤기 증
nl: Maxim Kasteen
nl-be: Noa Hendrickx
no: Askil Hustad
pl: Benedykt Domachowski
pt: Bartolomeu Filipe
pt-br: Franklim Louzada
ru: Каринэ Бородова
sk: Bohumil Kováč
sv: Kætilfridh Hedström
tr: Çisil Eronat
uk: Предислав Базилевич
zh: 翊萌 库

Country-specific Data

Mimesis supports country-specific data, like American SSNs and tracking numbers, or Brazilian CPFs or CNPJs, through built-in providers:

from mimesis.builtins import USASpecProvider
from mimesis.builtins import BrazilSpecProvider

us = USASpecProvider()
print(f"ssn:{us.ssn()}")
print(f"Tracking Number:{us.tracking_number()}")

print("---")

br = BrazilSpecProvider()
print(f"CPF:{br.cpf()}")
print(f"CNPJ:{br.cnpj()}")

>>> ssn:561-67-8858
>>> Tracking Number:SB 246 070 283 US
>>> ---
>>> CPF:194.331.779-83
>>> CNPJ:23.128.992/3866-50


If none of the built-in providers fits your needs, you can extend BaseProvider and create your own, as covered under "Creating a Custom Provider" in the Mimesis documentation.

Generating Data From a Schema

Fake data usually goes into fixtures or data files, like JSON, to be re-used and shared. That data must follow a certain pattern, which we can define with Schema and Field:

import json

from mimesis.builtins import USASpecProvider
from mimesis.schema import Field, Schema


def user_description():
    # A single Field instance looks up every provider method by name.
    f = Field('en', providers=[USASpecProvider])
    return {
        'id': f('uuid'),
        'name': f('full_name'),
        'email': f('person.email'),
        # posix=False asks for a formatted date string instead of a POSIX number.
        'timestamp': f('timestamp', posix=False),
        'car_model': f('car'),
        'address': {
            'full_address': f('address'),
            'city': f('city'),
            'zip_code': f('zip_code')
        }
    }


schema = Schema(schema=user_description)
data = schema.create(iterations=2)

with open("user_fixture.json", "w") as fixture:
    json.dump(data, fixture)

The Schema class takes in a callable object (like a function or lambda) describing the structure of the data with dictionaries, where the keys are the field names and the values come from calls to a Field instance.

Field() accepts the name of any method exposed by the generic data providers, and you can mix and match them as you wish. I used the Cryptographic, Person, Datetime, Transport, and Address data providers in the example above, all cleanly encapsulated by Field(). The Mimesis documentation lists all the providers available.

In schema.create, the iterations parameter defines how many instances of that data you want to generate.

Written to a JSON file, the output of the code above will look like this:

user_fixture.json

[
  {
    "id": "d212255b-a3a8-4098-b21b-2987e0878af5",
    "name": "Herman Rasmussen",
    "email": "ontography1902@yandex.com",
    "timestamp": "2007-10-01T21:13:58Z",
    "car_model": "Lada Riva",
    "address": {
      "full_address": "918 Ralston Plaza",
      "city": "Goleta",
      "zip_code": "83107"
    }
  },
  {
    "id": "a4136dd2-82fa-4f02-9210-dfcea265dfc1",
    "name": "Danial Santana",
    "email": "unseduceability1825@yahoo.com",
    "timestamp": "2020-03-17T14:09:45Z",
    "car_model": "Mazda RX-7",
    "address": {
      "full_address": "918 Penny Expressway",
      "city": "Pine Bluff",
      "zip_code": "48681"
    }
  }
]
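Once a fixture like this exists, a test can load it back and check that every record has the expected shape. The sketch below uses only the standard library. The missing_keys helper and the two key sets are illustrative names, not part of Mimesis, and an inline sample stands in for the contents of user_fixture.json.

```python
import json

# Keys the user_description schema is expected to produce.
EXPECTED_KEYS = {"id", "name", "email", "timestamp", "car_model", "address"}
EXPECTED_ADDRESS_KEYS = {"full_address", "city", "zip_code"}


def missing_keys(record):
    """Return the expected keys that a record lacks (hypothetical helper)."""
    missing = EXPECTED_KEYS - set(record)
    address = record.get("address", {})
    # Report nested keys with an "address." prefix so they are easy to spot.
    missing |= {f"address.{key}" for key in EXPECTED_ADDRESS_KEYS - set(address)}
    return missing


# Inline sample standing in for json.load(open("user_fixture.json")).
sample = json.loads("""
[
  {
    "id": "d212255b-a3a8-4098-b21b-2987e0878af5",
    "name": "Herman Rasmussen",
    "email": "ontography1902@yandex.com",
    "timestamp": "2007-10-01T21:13:58Z",
    "car_model": "Lada Riva",
    "address": {
      "full_address": "918 Ralston Plaza",
      "city": "Goleta",
      "zip_code": "83107"
    }
  }
]
""")

for record in sample:
    problems = missing_keys(record)
    print(record["name"], "is complete" if not problems else sorted(problems))
```

A check like this catches a renamed or dropped field as soon as the schema function changes, before the stale fixture reaches other tests.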