Data Analytics #08
Designing and Implementing Features
Hello!!
Hope you enjoyed our previous newsletter, “Designing the Analytical Base Table,” and how it supports predictive data analytics. Let’s continue the discussion with a new topic: “Designing and Implementing Features.”
2.4 Designing and Implementing Features
Once domain concepts have been agreed on, the next task is to design and implement concrete features based on these concepts.
What are Features?
Features are measurable data points we use to train machine learning models. They represent important ideas (domain concepts) from the real world.
For example, if the concept is "customer spending habits," a feature could be "average spending in the last 6 months."
Challenges in Creating Features:
Approximation: Sometimes we can’t represent a domain concept perfectly, so we use the closest approximation the available data allows.
Proxy Features: If direct data isn’t available, we use related data instead (e.g., using customer reviews to estimate product satisfaction).
Concept Drop: If no data is available, we may need to skip that idea entirely.
Important Things to Consider:
Data Availability: Is the required data available? For example, if you need 6 months of account balances, the data must exist.
Timing: Can you get the data before making a prediction? For instance, attendance at a match can’t be used if you need predictions before the game starts.
Staying Useful Over Time: Will the feature remain relevant as conditions change? Raw salary figures drift over time, so a ratio such as salary-to-loan amount tends to stay comparable for longer.
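The availability and timing checks can be made concrete by building features only from records that already exist when the prediction is made. Below is a minimal Python sketch. It assumes a hypothetical transaction log of `(customer_id, date, amount)` tuples and a hypothetical prediction date. Neither comes from the original text.

```python
from datetime import date

# Hypothetical transaction log: (customer_id, transaction_date, amount).
transactions = [
    ("c1", date(2024, 1, 15), 120.0),
    ("c1", date(2024, 3, 2), 80.0),
    ("c1", date(2024, 7, 20), 300.0),  # happens after the prediction date below
]


def available_before(records, prediction_date):
    """Keep only records dated strictly before the prediction date.

    Anything on or after that date would not exist yet when the model
    is actually used, so it must not feed into a feature.
    """
    return [record for record in records if record[1] < prediction_date]


prediction_date = date(2024, 6, 1)  # hypothetical moment the prediction is made
usable = available_before(transactions, prediction_date)
total_spend = sum(amount for _, _, amount in usable)
print(total_spend)  # 200.0
```

The match example works the same way. Attendance is only recorded on the day of the game, so a cutoff set before kickoff excludes it automatically.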
Iterative Process:
Creating features isn’t a one-time job. We explore the data, design features, test them, and refine them repeatedly.
Types of Data
Numeric Data: True numeric values that allow arithmetic operations (e.g., price, age).
Interval Data: Values that allow ordering and subtraction, but do not allow other arithmetic operations (e.g., date, time).
Ordinal Data: Values that allow ordering but do not permit arithmetic (e.g., size measured as small, medium, or large).
Categorical Data: A finite set of values that cannot be ordered and allow no arithmetic (e.g., country, product type).
Binary Data: Two possible values (e.g., male, female).
Textual Data: Free text like names or addresses.
Simplified Categories:
Continuous Data: Includes numeric and interval data (values that can be ordered and subtracted).
Categorical Data: Includes categorical, ordinal, binary, and textual data (used for grouping or labeling).
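One way to keep these types straight in code is a small lookup. It records each feature's detailed type and collapses it into the two simplified groups. The following Python sketch uses a hypothetical `DataType` enum and a hypothetical example schema, both for illustration only:

```python
from enum import Enum


class DataType(Enum):
    NUMERIC = "numeric"
    INTERVAL = "interval"
    ORDINAL = "ordinal"
    CATEGORICAL = "categorical"
    BINARY = "binary"
    TEXTUAL = "textual"


# Collapse the six detailed types into the two simplified groups.
SIMPLIFIED = {
    DataType.NUMERIC: "continuous",
    DataType.INTERVAL: "continuous",
    DataType.ORDINAL: "categorical",
    DataType.CATEGORICAL: "categorical",
    DataType.BINARY: "categorical",
    DataType.TEXTUAL: "categorical",
}

# Hypothetical feature schema for a small customer table.
schema = {
    "price": DataType.NUMERIC,
    "signup_date": DataType.INTERVAL,
    "size": DataType.ORDINAL,
    "country": DataType.CATEGORICAL,
    "has_loan": DataType.BINARY,
    "address": DataType.TEXTUAL,
}

continuous = [name for name, t in schema.items() if SIMPLIFIED[t] == "continuous"]
categorical = [name for name, t in schema.items() if SIMPLIFIED[t] == "categorical"]
print(continuous)   # ['price', 'signup_date']
print(categorical)  # ['size', 'country', 'has_loan', 'address']
```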
Different Types of Features:
1. Raw Features
Definition: These are directly taken from the raw data without any modifications.
Examples:
Customer information: Age, gender, or loan amount.
Transaction details: Insurance claim type or payment amount.
2. Derived Features
Definition: These are created by processing raw data and combining multiple data points. They don’t exist directly in the raw data.
Examples:
Customer behavior: Average monthly purchases or loan-to-value ratios.
Usage changes: Increase in electricity bill payments over time.
Common Types of Derived Features
Aggregates:
Summarize multiple data points over a period or group.
Examples:
Total number of insurance claims a customer has made.
Average amount a customer spends at a store over 3 or 6 months.
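Both aggregate examples can be sketched in a few lines of Python. The records below are hypothetical, and so is the `average_spend` helper. The sketch also simplifies one thing: a month is treated as 30 days, so the windows are approximate.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import mean

# Hypothetical raw records.
claims = [("c1", "motor"), ("c1", "home"), ("c2", "motor")]  # (customer_id, claim_type)
purchases = [  # (customer_id, purchase_date, amount)
    ("c1", date(2024, 5, 10), 40.0),
    ("c1", date(2024, 3, 1), 60.0),
    ("c1", date(2023, 10, 5), 200.0),
    ("c2", date(2024, 4, 20), 25.0),
]

# Aggregate 1: total number of claims per customer.
claim_counts = Counter(customer for customer, _ in claims)


def average_spend(purchases, customer, as_of, months):
    """Average purchase amount for one customer in the window before `as_of`.

    A month is approximated as 30 days to keep the sketch simple.
    Returns 0.0 when the customer made no purchases in the window.
    """
    window_start = as_of - timedelta(days=30 * months)
    amounts = [
        amount
        for cust, when, amount in purchases
        if cust == customer and window_start <= when < as_of
    ]
    return mean(amounts) if amounts else 0.0


as_of = date(2024, 6, 1)
print(claim_counts["c1"], claim_counts["c2"])  # 2 1
print(average_spend(purchases, "c1", as_of, months=3))  # 40.0
print(average_spend(purchases, "c1", as_of, months=6))  # 50.0
```

The 3-month and 6-month averages differ because the March purchase falls just outside the shorter window. Computing both lets a model see whether spending is recent or spread out.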
Flags:
Binary features (1 or 0) to indicate the presence or absence of something.
Examples:
Flag indicating if a bank account has ever been overdrawn.
Flag showing if a customer missed a payment in the past 6 months.
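A flag is simply a yes/no test turned into 1 or 0. Here is a minimal Python sketch of the two flags above. The balances and payment records are hypothetical, and "the past 6 months" is approximated as 180 days.

```python
from datetime import date, timedelta

# Hypothetical end-of-day balances per account.
balances = {
    "acct1": [150.0, 20.0, -35.0, 60.0],
    "acct2": [500.0, 420.0, 390.0],
}
# Hypothetical payment records: (customer_id, due_date, paid_on_time).
payments = [
    ("c1", date(2024, 2, 1), True),
    ("c1", date(2024, 4, 1), False),
    ("c2", date(2023, 9, 1), False),
]


def ever_overdrawn(daily_balances):
    """1 if the balance ever dropped below zero, else 0."""
    return int(any(balance < 0 for balance in daily_balances))


def missed_payment_recently(payments, customer, as_of, days=180):
    """1 if the customer missed a payment due in the `days` before `as_of`, else 0.

    180 days is used here as a rough stand-in for "the past 6 months".
    """
    window_start = as_of - timedelta(days=days)
    return int(
        any(
            cust == customer and window_start <= due < as_of and not on_time
            for cust, due, on_time in payments
        )
    )


as_of = date(2024, 6, 1)
print(ever_overdrawn(balances["acct1"]), ever_overdrawn(balances["acct2"]))  # 1 0
print(missed_payment_recently(payments, "c1", as_of))  # 1
print(missed_payment_recently(payments, "c2", as_of))  # 0 (the missed payment is older than the window)
```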
Ratios:
Show the relationship between two values.
Examples:
Ratio of salary to loan amount for a loan application.
Ratios between data, SMS, and voice usage in a mobile plan.
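Ratios are simple divisions, but the zero-denominator case needs a decision. In this Python sketch, a hypothetical `safe_ratio` helper returns `None` in that case instead of raising an error. The salary, loan, and usage figures are also hypothetical.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


# Hypothetical loan application.
salary = 60_000
loan_amount = 240_000
salary_to_loan = safe_ratio(salary, loan_amount)

# Hypothetical monthly usage for one mobile customer (units differ, so
# these ratios describe the usage mix rather than a common quantity).
usage = {"data_mb": 3000, "sms": 100, "voice_min": 200}
data_to_voice = safe_ratio(usage["data_mb"], usage["voice_min"])
sms_to_voice = safe_ratio(usage["sms"], usage["voice_min"])

print(salary_to_loan, data_to_voice, sms_to_voice)  # 0.25 15.0 0.5
```

Consider a customer with no voice usage at all. Returning `None` leaves the handling of that missing ratio to a later data-quality step, rather than silently inventing a value.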
Mappings:
Convert continuous features into categorical ones, reducing the number of distinct values a model has to handle.
Examples:
Map salaries into levels like low, medium, or high.
Convert electricity bill amounts into categories like low and high usage.
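A mapping needs a set of cut points and one label per resulting range. The Python sketch below uses `bisect.bisect_right` from the standard library to find the range. The `map_to_level` helper and all thresholds are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right


def map_to_level(value, cut_points, labels):
    """Map a continuous value onto an ordered label.

    `cut_points` must be sorted ascending; a value equal to a cut point
    falls into the higher level.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return labels[bisect_right(cut_points, value)]


# Hypothetical thresholds, chosen only for illustration.
SALARY_CUTS = [30_000, 80_000]
SALARY_LABELS = ["low", "medium", "high"]
BILL_CUTS = [150.0]
BILL_LABELS = ["low usage", "high usage"]

print([map_to_level(s, SALARY_CUTS, SALARY_LABELS) for s in (25_000, 50_000, 120_000)])
# ['low', 'medium', 'high']
print(map_to_level(90.0, BILL_CUTS, BILL_LABELS), map_to_level(210.0, BILL_CUTS, BILL_LABELS))
# low usage high usage
```

In practice the cut points would come from domain knowledge or from the data's distribution (for example, quantiles), not from fixed numbers like these.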
Other Creative Features:
Sometimes, innovative approaches are needed to capture hidden patterns.
Example:
A retailer used satellite images to count cars in a competitor's parking lot as a measure of the competitor's store activity.
Thank you for joining us! If you enjoyed this edition, consider giving it a like. We’d love to hear your thoughts, so drop a comment below!
In the next episode, we will continue this topic by looking at handling time and legal issues.



