mb.get_dataset(dataset_name, ...)
Fetches a dataset and returns it as a pandas DataFrame. Datasets can optionally be filtered and used as feature stores.
Parameters
dataset_name: str
The name of the dataset.
filters: Optional[Dict[str, ...]]
If supplied with a filters dict, the DataFrame returned will be filtered to rows matching the filter criteria. See the Filter syntax section below for the formats that the filters can take.
Filter syntax
- Single-value equivalence: { "my_column": 4 }. Returns all rows where my_column = 4.
- Multiple-value equivalence: { "my_column": ["a", "b", "c"] }. Returns all rows where my_column is either "a", "b", or "c".
- Greater than and less than: { "my_column": { "<": 4 }}. Returns all rows where my_column < 4. Available operators:
  - <: Less than
  - <=: Less than or equal to
  - >: Greater than
  - >=: Greater than or equal to
  - =: Equals
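To make the filter semantics concrete, here is a minimal local sketch that applies the same three filter forms to plain Python dicts. It is not Modelbit's implementation: row_matches is a hypothetical helper, and it assumes that conditions on different columns, and multiple operators on one column, are combined with AND, as the examples on this page suggest.

```python
import operator

# Operators listed in the filter syntax, mapped to Python comparisons.
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
}


def row_matches(row, filters):
    """Return True if `row` (a dict) satisfies every entry in `filters`.

    Hypothetical helper for illustration only; assumes AND semantics.
    """
    for column, condition in filters.items():
        value = row[column]
        if isinstance(condition, dict):
            # Operator form: every operator in the dict must hold.
            if not all(_OPS[op](value, bound) for op, bound in condition.items()):
                return False
        elif isinstance(condition, list):
            # Multiple-value equivalence: value must be one of the listed items.
            if value not in condition:
                return False
        elif value != condition:
            # Single-value equivalence.
            return False
    return True


rows = [
    {"REGION": "NA", "DWELL_TIME": 3},
    {"REGION": "SA", "DWELL_TIME": 50},
    {"REGION": "EU", "DWELL_TIME": 50},
]
filters = {"REGION": ["NA", "SA"], "DWELL_TIME": {">": 5, "<=": 100}}
matching = [r for r in rows if row_matches(r, filters)]
print(matching)  # [{'REGION': 'SA', 'DWELL_TIME': 50}]
```

Only the second row passes: the first fails the DWELL_TIME range and the third fails the REGION list.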
Returns
pandas.DataFrame
Examples
Get all rows in a dataset
Returns all rows in the customer_features dataset.
similar_customers = mb.get_dataset("customer_features")
Get rows matching a certain value
Returns the row(s) in customer_features where CUSTOMER_ID = 52.
similar_customers = mb.get_dataset("customer_features", filters={"CUSTOMER_ID": 52 })
Get specific rows
Returns all rows in customer_features where the REGION column is either NA or SA and the EMPLOYEE_COUNT column is either 100-500 or 500-5000.
similar_customers = mb.get_dataset(
    "customer_features",
    filters={
        "REGION": ["NA", "SA"],
        "EMPLOYEE_COUNT": ["100-500", "500-5000"],
    },
)
Get rows greater or less than a value
Returns events where DWELL_TIME is greater than, or less than or equal to, a given value.
# using greater than
events = mb.get_dataset("website_events", filters={ "DWELL_TIME": { ">": 5 } } )
# using less than or equal to
events = mb.get_dataset("website_events", filters={ "DWELL_TIME": { "<=": 100 } } )
Get rows between a range of values
Returns events with DWELL_TIME greater than 5 and less than or equal to 100:
events = mb.get_dataset("website_events", filters={ "DWELL_TIME": { ">": 5, "<=": 100 } } )
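When range bounds come from variables, the operator dict can be built programmatically. The sketch below uses a hypothetical range_filter helper that is not part of Modelbit. It only assembles a dict in the operator format described under Filter syntax.

```python
def range_filter(low=None, high=None, include_low=False, include_high=True):
    """Build an operator dict such as {">": 5, "<=": 100} for one column.

    Hypothetical helper for illustration; omitted bounds are left out.
    """
    condition = {}
    if low is not None:
        condition[">=" if include_low else ">"] = low
    if high is not None:
        condition["<=" if include_high else "<"] = high
    return condition


filters = {"DWELL_TIME": range_filter(5, 100)}
print(filters)  # {'DWELL_TIME': {'>': 5, '<=': 100}}
```

The resulting dict matches the one in the example above and could be passed as the filters argument to mb.get_dataset.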
See also
- Read the Datasets section of the docs for more info on using datasets as feature stores.