How to Generate Fake Data With Python Mimesis
Generating fake data can be a task far beyond the infamous lorem ipsum. Mimesis, as described by the library authors:
"Is a high-performance fake data generator for Python, which provides data for a variety of purposes in a variety of languages."
Let's install Mimesis and see how it plays:
pip install mimesis
Mimesis works through data-provider classes, like Person, which can generate anything from a blood type to academic degree information:
from mimesis import Person
english_persona = Person('en')
print(
    english_persona.full_name(),
    english_persona.blood_type(),
    english_persona.occupation(),
    english_persona.academic_degree()
)
>>> Benedict Levy O− Security Officer Master
Locales
It also has impressive locale capabilities, supporting 34 different locales in the version used here:
from mimesis import locales, Person
for locale in locales.SUPPORTED_LOCALES.keys():
    persona = Person(locale)
    print(f"{persona.locale}: {persona.full_name()}")
cs: Serafina Kopřiva
da: Alwigh Mouritzen
de: Albern Dirksen
de-at: Angelina Höhne
de-ch: Yassine Mooser
el: Μάγδα Ζάππα
en: Deloras Higgins
en-gb: Sina Hyde
en-au: Basil Duffy
en-ca: Geneviève Pugh
es: Albert Vila
es-mx: Alise Holguín
et: Sören Maradona
fa: پریچهر منوچهری
fi: Reetta Saari
fr: Eugénie Christophe
hu: Tara Pázmány
is: Thór Bjarnfinnsson
it: Beltrano Laguardia
ja: 武 向日
kk: Алтынгүл Кушекова
ko: 윤기 증
nl: Maxim Kasteen
nl-be: Noa Hendrickx
no: Askil Hustad
pl: Benedykt Domachowski
pt: Bartolomeu Filipe
pt-br: Franklim Louzada
ru: Каринэ Бородова
sk: Bohumil Kováč
sv: Kætilfridh Hedström
tr: Çisil Eronat
uk: Предислав Базилевич
zh: 翊萌 库
Country-specific Data
Mimesis supports country-specific data, such as American SSNs and tracking numbers or Brazilian CPFs and CNPJs, through built-in providers:
from mimesis.builtins import USASpecProvider
from mimesis.builtins import BrazilSpecProvider
us = USASpecProvider()
print(f"ssn:{us.ssn()}")
print(f"Tracking Number:{us.tracking_number()}")
print("---")
br = BrazilSpecProvider()
print(f"CPF:{br.cpf()}")
print(f"CNPJ:{br.cnpj()}")
>>> ssn:561-67-8858
>>> Tracking Number:SB 246 070 283 US
>>> ---
>>> CPF:194.331.779-83
>>> CNPJ:23.128.992/3866-50
If you don't find a built-in provider that fits your needs, you can extend the BaseProvider class and create your own. See "Creating a Custom Provider" in the Mimesis documentation.
Generating Data From a Schema
Fake data usually goes into fixtures or data files, like JSON, to be reused and shared. That data must follow a certain structure, and we can use Schemas and Fields to define it:
import json
from mimesis.builtins import USASpecProvider
from mimesis.schema import Field, Schema

def user_description():
    # One Field instance resolves every provider method by name.
    f = Field('en', providers=[USASpecProvider])
    return {
        'id': f('uuid'),
        'name': f('full_name'),
        'email': f('person.email'),
        'timestamp': f('timestamp', posix=False),
        'car_model': f('car'),
        'address': {
            'full_address': f('address'),
            'city': f('city'),
            'zip_code': f('zip_code')
        }
    }

schema = Schema(schema=user_description)
data = schema.create(iterations=2)

# indent=2 keeps the fixture readable when opened in an editor.
with open("user_fixture.json", "w") as fixture:
    json.dump(data, fixture, indent=2)
The Schema class takes a callable object (like a function or lambda) that describes the structure of the data as a dictionary, where the keys are the field names and the values come from calls to a Field() instance.
Field() accepts the name of any method from the data providers, and you can mix and match them as you wish. I used the Cryptographic, Person, Datetime, Transport, and Address data providers in the example above, all cleanly encapsulated by Field(). See the Mimesis documentation for the full list of available providers.
In schema.create, the iterations parameter defines how many instances of that data you want to generate.
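To make the role of the callable and of iterations more concrete, here is a conceptual sketch that uses only the standard library. It is not the Mimesis implementation. The create() helper and the stand-in user_description() function are hypothetical and only model the idea that the schema callable is invoked once per requested record, so each record gets fresh values.

```python
import random
import uuid


def create(schema, iterations=1):
    """Hypothetical helper: call the schema callable once per record."""
    return [schema() for _ in range(iterations)]


def user_description():
    # Stand-in for the Field-based schema: every call yields new values.
    return {
        "id": str(uuid.uuid4()),
        "age": random.randint(18, 90),
    }


records = create(user_description, iterations=3)
print(len(records))  # 3
```

This is also why the schema has to be passed as a callable and not as a ready-made dictionary: a plain dictionary would be evaluated once, and every record would repeat the same values.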
The code above writes a JSON file that looks like this:
user_fixture.json
[
  {
    "id": "d212255b-a3a8-4098-b21b-2987e0878af5",
    "name": "Herman Rasmussen",
    "email": "ontography1902@yandex.com",
    "timestamp": "2007-10-01T21:13:58Z",
    "car_model": "Lada Riva",
    "address": {
      "full_address": "918 Ralston Plaza",
      "city": "Goleta",
      "zip_code": "83107"
    }
  },
  {
    "id": "a4136dd2-82fa-4f02-9210-dfcea265dfc1",
    "name": "Danial Santana",
    "email": "unseduceability1825@yahoo.com",
    "timestamp": "2020-03-17T14:09:45Z",
    "car_model": "Mazda RX-7",
    "address": {
      "full_address": "918 Penny Expressway",
      "city": "Pine Bluff",
      "zip_code": "48681"
    }
  }
]
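Since a fixture like this is meant to be shared, it can be useful to check that every record still follows the schema before tests rely on it. The sketch below uses only the standard library. The check_fixture() helper and the two key sets are hypothetical, and the sample is an inline copy of the first record above, standing in for the result of json.load() on user_fixture.json.

```python
import json

# Keys taken from the user_description schema shown earlier.
EXPECTED_KEYS = {"id", "name", "email", "timestamp", "car_model", "address"}
EXPECTED_ADDRESS_KEYS = {"full_address", "city", "zip_code"}


def check_fixture(records):
    """Return the indexes of records whose keys differ from the schema."""
    bad = []
    for index, record in enumerate(records):
        address = record.get("address")
        keys_ok = set(record) == EXPECTED_KEYS
        address_ok = (
            isinstance(address, dict)
            and set(address) == EXPECTED_ADDRESS_KEYS
        )
        if not (keys_ok and address_ok):
            bad.append(index)
    return bad


# Inline sample; in practice this would come from json.load() on the file.
sample = json.loads("""
[
  {
    "id": "d212255b-a3a8-4098-b21b-2987e0878af5",
    "name": "Herman Rasmussen",
    "email": "ontography1902@yandex.com",
    "timestamp": "2007-10-01T21:13:58Z",
    "car_model": "Lada Riva",
    "address": {
      "full_address": "918 Ralston Plaza",
      "city": "Goleta",
      "zip_code": "83107"
    }
  }
]
""")

print(check_fixture(sample))  # []
```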