get_dataset
Fetch a dataset and return it as a Pandas DataFrame. Datasets can be filtered and used as feature stores.
Parameters
mb.get_dataset({dataset_name}, ...)
dataset_name
:str
The name of the dataset.
filters
:Optional[Dict[str, ...]]
If supplied with a filters dict, the DataFrame returned will be filtered to rows matching the filter criteria. See the next section for the formats that the filters can take.
Filter syntax
Dataset filter syntax supports several condition formats:
- Single-value equivalence: { "my_column": 4 }. Returns all rows where my_column=4.
- Multiple-value equivalence: { "my_column": ["a", "b", "c"] }. Returns all rows where my_column is either "a", "b", or "c".
- Greater than and less than: { "my_column": { "<": 4 }}. Returns all rows where my_column < 4. Available operators:
  - <: Less than
  - <=: Less than or equal to
  - >: Greater than
  - >=: Greater than or equal to
  - =: Equals
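To make the filter formats concrete, here is a minimal sketch that emulates them locally on a pandas DataFrame. It is not Modelbit's implementation: apply_filters and _OPERATORS are hypothetical names introduced for illustration, and the sketch assumes that conditions on different columns, and multiple operators on one column, are combined with AND, as the examples on this page suggest.

```python
import operator
from typing import Any, Dict

import pandas as pd

# Comparison operators listed in the filter syntax, mapped to Python functions.
_OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
}


def apply_filters(df: pd.DataFrame, filters: Dict[str, Any]) -> pd.DataFrame:
    """Hypothetical local emulation of the filter semantics; not part of Modelbit."""
    mask = pd.Series(True, index=df.index)
    for column, condition in filters.items():
        if isinstance(condition, dict):
            # Operator form: every operator in the dict must hold (assumed AND).
            for op, value in condition.items():
                mask &= _OPERATORS[op](df[column], value)
        elif isinstance(condition, (list, tuple, set)):
            # Multiple-value equivalence: the column matches any listed value.
            mask &= df[column].isin(list(condition))
        else:
            # Single-value equivalence.
            mask &= df[column] == condition
    return df[mask]


events = pd.DataFrame({"DWELL_TIME": [3, 5, 42, 100, 250]})
in_range = apply_filters(events, {"DWELL_TIME": {">": 5, "<=": 100}})
print(in_range["DWELL_TIME"].tolist())  # [42, 100]
```

A local emulation like this can be handy for checking what a filters dict should select before passing it to mb.get_dataset.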
Returns
pandas.DataFrame
Examples
Get all rows in a dataset
Returns all rows in the customer_features dataset.
similar_customers = mb.get_dataset("customer_features")
Get rows matching a certain value
Returns the row(s) in customer_features where CUSTOMER_ID=52.
similar_customers = mb.get_dataset("customer_features", filters={"CUSTOMER_ID": 52 })
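A single-value filter like this is one way to use a dataset as a feature store. The sketch below wraps the lookup in a helper that returns one customer's features as a dict. lookup_customer_features and fake_get_dataset are hypothetical names: the fetch function is passed in as a parameter so the example runs without a Modelbit session, and the stand-in data is invented for illustration.

```python
from typing import Any, Callable, Dict

import pandas as pd


def lookup_customer_features(
    get_dataset: Callable[..., pd.DataFrame], customer_id: int
) -> Dict[str, Any]:
    """Hypothetical helper: fetch one customer's features as a plain dict."""
    rows = get_dataset("customer_features", filters={"CUSTOMER_ID": customer_id})
    if rows.empty:
        raise KeyError(f"No features found for CUSTOMER_ID={customer_id}")
    # Take the first match, since the filter may return more than one row.
    return rows.iloc[0].to_dict()


# Stand-in for mb.get_dataset (assumption: same call shape, made-up data).
def fake_get_dataset(name: str, filters: Dict[str, Any]) -> pd.DataFrame:
    data = pd.DataFrame(
        {
            "CUSTOMER_ID": [51, 52],
            "REGION": ["NA", "SA"],
            "EMPLOYEE_COUNT": ["100-500", "500-5000"],
        }
    )
    return data[data["CUSTOMER_ID"] == filters["CUSTOMER_ID"]]


features = lookup_customer_features(fake_get_dataset, 52)
print(features["REGION"])  # SA
```

With a real session, passing mb.get_dataset in place of the stand-in should give the same call shape.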
Get specific rows
Returns all rows in customer_features where the REGION column is either NA or SA and the EMPLOYEE_COUNT column is either 100-500 or 500-5000.
similar_customers = mb.get_dataset(
    "customer_features",
    filters={
        "REGION": ["NA", "SA"],
        "EMPLOYEE_COUNT": ["100-500", "500-5000"],
    },
)
Get rows greater or less than a value
Returns events where DWELL_TIME is greater than or less than a given value.
# using greater than
events = mb.get_dataset("website_events", filters={ "DWELL_TIME": { ">": 5 } } )
# using less than or equal to
events = mb.get_dataset("website_events", filters={ "DWELL_TIME": { "<=": 100 } } )
Get rows between a range of values
Returns events with DWELL_TIME greater than 5 and less than or equal to 100:
events = mb.get_dataset("website_events", filters={ "DWELL_TIME": { ">": 5, "<=": 100 } } )
See also
- Read the Datasets section for more info on using datasets as feature stores.