What is the feasibility of purchasing a 1 crore INR house with a take-home salary of 1,65,000 INR per month and savings of 10-15 lakhs INR in hand?

October 01, 2023

Sorry to disappoint you, but at this stage it is a rather tall task. You should wait and invest to build your wealth. Start investing as much as you can in MF SIPs in good 5-star-rated schemes. You can even wait till the age of 35–36. By then you will have a significant kitty in your portfolio and may need very little loan. You will also know by then whether to buy a property in your current city.

If you wish to know the long reasons why, please read on.


Remember that most banks/HFCs finance only up to 85% of the construction cost for a new apartment. For an older/second-sale apartment, this reduces to 75%. This is not a hard-and-fast rule, but it is safer not to assume that more finance will be possible.

The question is: 85% of what? What do “construction cost” and “base cost” mean?

The cost of the house is made up of the base cost, GST, a one-time fee for initial maintenance (till the society is formed), water and electricity connection charges, the electricity deposit, the registration fee, stamp duty, and so on. The base cost is (saleable area in sq ft × rate per sq ft). Assuming the base cost is ₹90 lakhs and the property you intend to purchase is new, 85% of the base cost is ₹76,50,000. This means that you must pay the remaining ₹23,50,000 of the ₹1 crore total from your own sources. Since you do not have this money, you cannot buy this property at the present time.

There is also the fact that banks will let you borrow only so much that your EMI stays under 40% of your net take-home pay per month (after taxes and after EMIs on any other loans are deducted). If ₹1,65,000 is your CTC per month, you would be taking home roughly 72% of it, or ₹1,18,800 (you know the actual figure). Assuming no other loans, you will therefore be allowed an EMI of 40% of net pay, or ₹47,520. With that EMI, your maximum borrowing at the maximum loan tenure works out as follows:
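
A quick Python sketch of this arithmetic, assuming an 8.5% annual rate (roughly today's rate, as noted further below) and the maximum 30-year tenure; actual bank terms will differ:

```python
def max_loan(emi: float, annual_rate: float, years: int) -> float:
    """Largest principal whose EMI at the given rate and tenure equals `emi`."""
    r = annual_rate / 12                     # monthly interest rate
    n = years * 12                           # number of monthly instalments
    return emi * ((1 + r) ** n - 1) / (r * (1 + r) ** n)

emi_cap = 0.40 * 118_800                     # 40% of the assumed net pay
print(f"Max loan: ₹{max_loan(emi_cap, 0.085, 30):,.0f}")
# -> roughly ₹62 lakhs, leaving about ₹38 lakhs of the ₹1 crore to self-fund
```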

This means even more bad news, as you will have to cough up about ₹38 lakhs from your own sources. At this stage, you do not have this money.


But just for fun, let me take the best-case, most wonderful scenario: the bank finances 100% of the base cost, i.e., ₹90 lakhs, with ₹10 lakhs from your own pocket.

At such a high borrowing, the rate of interest will be high: 9% or even more, since rates are higher when the borrowing is above ₹75 lakhs. There is also the fact that at your age you are unlikely to have much of a credit history; banks are wary of lending and will look at your job/employer, how likely you are to remain employed, and so on.


Let me assume a rate of 9%. If this is the case, your EMI options look as follows:
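
In Python, the standard reducing-balance EMI formula gives these options (a sketch, using the assumed ₹90 lakh principal at 9%):

```python
def emi(principal: float, annual_rate: float, years: int) -> float:
    """Monthly instalment on a reducing-balance loan."""
    r = annual_rate / 12
    n = years * 12
    return principal * r * (1 + r) ** n / ((1 + r) ** n - 1)

for years in (15, 20, 25, 30):
    print(f"{years} years: ₹{emi(9_000_000, 0.09, years):,.0f}/month")
# -> about ₹91,284, ₹80,975, ₹75,528 and ₹72,416 per month respectively
```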

Since you most likely cannot afford a high EMI, you will go for the 30-year loan. It is a bad situation:

  • You are taking on very high risk: jobs are not stable these days.
  • You will be repaying till the age of 56 assuming that you do not square off earlier
  • The bank will recover most of the interest in the earlier years; in fact, at the end of 15 years (out of 30), the bank will have recovered 65.46% of the total interest payable while you will have repaid only 20.67% of your principal (see the amortization sketch after this list).
  • What this amortization reveals is that only after the mid-term point are you mostly repaying the principal.
  • You will have paid nearly twice as much interest as your borrowing.
  • Yes, you can prepay and finish earlier, but can you? In my honest opinion, considering that you will start with a near-zero balance, then marry, start a family, and face growing expenses, you are unlikely to think of prepayment before year 10. And by that point, as the sketch below shows, about 45–46% of the interest is already paid out.
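
Here is a rough Python sketch of the amortization arithmetic behind those percentages, for the same assumed ₹90 lakh loan at 9% over 30 years:

```python
def milestones(principal: float, annual_rate: float, years: int) -> None:
    """Print cumulative interest and principal repaid at selected years."""
    r = annual_rate / 12
    n = years * 12
    emi = principal * r * (1 + r) ** n / ((1 + r) ** n - 1)
    total_interest = emi * n - principal
    balance, interest_paid = principal, 0.0
    for month in range(1, n + 1):
        interest = balance * r               # interest accrued this month
        balance -= emi - interest            # the rest of the EMI repays principal
        interest_paid += interest
        if month in (120, 180):              # milestones at years 10 and 15
            print(f"Year {month // 12:2d}: "
                  f"{interest_paid / total_interest:.2%} of total interest, "
                  f"{(principal - balance) / principal:.2%} of principal repaid")

milestones(9_000_000, 0.09, 30)
# Year 10: ~45% of total interest, ~10.6% of principal repaid
# Year 15: ~65.46% of total interest, ~20.67% of principal repaid
```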

Instead, if you invest your ₹10 lakhs in an MF as a lumpsum and start a ₹40,000-per-month MF SIP, both assumed to earn a reasonable annualized return of 12%, then after 10 years you will have (see the sketch after this list):

  • ₹ 31,05,850 from the lumpsum investment
  • ₹ 92,93,500 from the SIP investment
  • These are pre-tax amounts totalling close to ₹1,24,00,000. Even applying 10% LTCG tax to the whole corpus, you will have about ₹1.12 Cr with you.
  • Even if the home value appreciates in 10 years, you will require less financing at this point. (At 5% annual appreciation, the ₹1 Cr home will be ₹1.63 Cr at this stage).
  • You can then easily get cheaper finance of ₹50 lakhs and finish the loan in 10 years.

(The RoI is 8.5% today; by then the rate will most likely be around 7%, possibly less.)

  • After 10 years, you will be earning much more.
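
A minimal Python sketch of the investment arithmetic above, assuming 12% compounded monthly with SIP instalments at the start of each month (conventions differ slightly between calculators):

```python
def lumpsum_fv(amount: float, annual_return: float, years: int) -> float:
    """Future value of a one-time investment compounded yearly."""
    return amount * (1 + annual_return) ** years

def sip_fv(monthly: float, annual_return: float, years: int) -> float:
    """Future value of a monthly SIP, instalments at the start of each month."""
    i = annual_return / 12
    n = years * 12
    return monthly * ((1 + i) ** n - 1) / i * (1 + i)

print(f"Lumpsum: ₹{lumpsum_fv(1_000_000, 0.12, 10):,.0f}")   # ≈ ₹31.06 lakhs
print(f"SIP:     ₹{sip_fv(40_000, 0.12, 10):,.0f}")          # ≈ ₹92.94 lakhs

# Using the emi() helper from the earlier sketch, a ₹50 lakh loan at an
# assumed 7% for 10 years works out to roughly ₹58,000/month.
```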

(P.S. In many of my other answers on loans and repayment, I have said that the bank recovers the maximum interest in the first half of the tenure. Some commented that this is not true, or that it holds only because the principal outstanding is higher initially and interest is charged on a reducing balance. The latter is true to an extent, but it is not the only reason. If the EMI were really “equated”, close to 50% of the principal should have been repaid by mid-term. The bank is more interested in its earnings, which come from the interest, so it wants to recover this as early as possible to minimise its risk in case of default. I hope the numbers above clear up this misconception.)


Quora Link: https://qr.ae/pKTwnH


Statistics Tutorial for Interview

September 26, 2023

Summary

📊 This text covers topics related to statistics, including descriptive and inferential statistics, sampling techniques, and definitions of population and sample.

Facts

  • The text discusses the difference between descriptive and inferential statistics.
  • Descriptive statistics involve organizing and summarizing data, including measures of central tendency and measures of dispersion.
  • Inferential statistics are used to form conclusions based on data and include tests like z-test, t-test, chi-square test, and ANOVA.
  • The concept of population (capital N) and sample (small n) is introduced.
  • Simple random sampling involves randomly selecting members of the population.
  • Stratified sampling divides the population into non-overlapping groups (strata) for sampling.
  • Examples of stratified sampling include gender-based and age-based sampling.
  • 📊 Professions (e.g., doctor, engineer) can also define strata, provided the groups do not overlap.
  • 🎯 Stratified sampling involves dividing a population into different layers.
  • 🧑‍⚕️ Doctors and engineers may require different survey techniques.
  • 🧮 Systematic sampling involves selecting every nth individual from a population.
  • 🤔 Thanos may have used random sampling.
  • 🙋‍♂️ Convenience sampling involves surveying readily accessible respondents, such as domain experts.
  • 🗳️ Exit polls typically use random sampling.
  • 🏦 RBI household surveys may use stratified random sampling or convenience sampling.
  • 💉 Drug testing may involve stratified or other sampling techniques based on the use case.
  • 📊 Variables can be quantitative (measured numerically) or qualitative (categorical).
  • 🔢 Quantitative variables can be discrete or continuous.
  • ⚖️ Continuous variables can have decimal values, while discrete variables have whole numbers.
  • 🧮 Nominal variables are categorical data with no inherent order (e.g., colors, gender).
  • 🥇 Ordinal data have an order but no meaningful numerical difference (e.g., ranks).
  • 🌡️ Interval data have an order, values matter, but a natural zero point is absent (e.g., temperature in Fahrenheit).
  • 📏 Interval data can be used for applications like ride-sharing services.
  • 🚖 Example: booking a cab for six hours with variable pricing.
  • 📊 Frequency distributions for different flower types.
  • 📈 Frequency distributions are used for creating bar and pie charts.
  • 🔄 Cumulative frequency for calculating total occurrences.
  • 📊 Histograms for representing continuous data.
  • 🔍 Kernel density estimators for smoothing histograms.
  • 🧮 Central tendency includes the mean, median, and mode.
  • 📈 The mean is influenced by outliers, which can affect it significantly.
  • 🧮 The median is less affected by outliers.
  • 📈 The mode identifies the most frequent value and can handle multimodal distributions.
  • 📈 The median is the central element of the sorted data; its calculation differs for odd and even numbers of data points.
  • 😄 The text discusses the importance of using the median instead of the mean when outliers are present.
  • 📊 It explains the use of mode in handling missing values for categorical variables.
  • 📈 Variance is discussed as a measure of dispersion, with an example calculation.
  • 📏 Standard deviation is introduced as the square root of variance and its significance in understanding data spread.
  • 📊 Percentiles are explained as a way to represent data in terms of percentages.
  • 🧐 The concept of quartiles is mentioned as a step towards finding outliers.

📊 Distribution of Data

  • Percentiles: A value below which a certain percentage of observations lie (e.g., 80th percentile means 80% of data is below that value).
  • Calculation Example: Finding the percentile rank of 10 using the formula: (number of values below 10 / sample size) × 100 (see the sketch after this list).
  • Five Number Summary: Minimum, Q1 (1st Quartile), Median, Q3 (3rd Quartile), Maximum.
  • Box Plot: Visualization of the Five Number Summary, useful for identifying outliers.
  • Outlier Removal: Using Interquartile Range (IQR) and lower/upper fences to detect and remove outliers.
  • Variance: Formula for sample variance and its use in statistics.
  • Standard Deviation: Measure of data dispersion.
  • Histograms: Graphical representation of data distribution.
  • Probability Density Functions (PDFs): Describing how data is distributed.
  • Mean, Median, Mode: Measures of central tendency.
  • Python Programming: Practical implementation of statistics concepts.
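
As a taste of the Python implementation the tutorial mentions, here is a small sketch of the percentile-rank formula, the five-number summary, and the IQR-based outlier fences, on a made-up sample:

```python
import numpy as np

data = np.array([2, 4, 5, 7, 10, 12, 13, 15, 18, 21])    # hypothetical sample

# Percentile rank of 10: (number of values below 10 / sample size) * 100
rank_of_10 = (data < 10).sum() / data.size * 100          # -> 40.0

# Five-number summary: minimum, Q1, median, Q3, maximum
q1, median, q3 = np.percentile(data, [25, 50, 75])
five_num = (data.min(), q1, median, q3, data.max())

# IQR and the lower/upper fences used for outlier detection
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]
print(rank_of_10, five_num, (lower, upper), outliers)
```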

📈 Distributions Covered:

  • Normal (Gaussian) Distribution
  • Standard Normal Distribution
  • Z-Scores
  • Log-Normal Distribution
  • Bernoulli Distribution
  • Binomial Distribution

📊 Data Visualization Tools:

  • Bar Plot
  • Violin Plot

The text discusses various statistical concepts and their practical applications, including data distribution visualization, outlier detection, and statistical measures, with an emphasis on using Python for implementation.

  • 📊 The text discusses the concept of distributions, particularly Gaussian or normal distributions.

  • 🛎️ A Gaussian distribution is characterized by a bell curve, with symmetrical sides.
  • 🧮 Standard deviation is discussed, and the text mentions the empirical rule (68-95-99.7).
  • 📈 Z-scores are introduced as a way to determine how many standard deviations a value is from the mean.
  • 📉 Standardization is explained as converting data to have a mean of 0 and a standard deviation of 1.
  • 🔄 Normalization is mentioned as a process to scale data between a specified range, such as 0 to 1.
  • 🖥️ Practical applications of standardization and normalization in machine learning are mentioned.
  • 💡 Explanation: The text discusses the concept of pixels and their normalization using min-max scaling and Z-scores.
  • 💻 Pixel Value Range: Each pixel in a 4x4 image has a value ranging from 0 to 255.
  • 📊 Min-Max Scaling: a method to rescale pixel values to between 0 and 1, where the minimum (0) maps to 0 and the maximum (255) maps to 1.
  • 📈 Normalization: The text mentions that dividing each pixel value by 255 is another method of normalization, resulting in values between 0 and 1 (see the sketch after this list).
  • 🧮 Z-Score Calculation: The text introduces the Z-score formula (Z = (X - μ) / σ) and applies it to data from cricket matches in 2020 and 2021.
  • 🏏 Cricket Analysis: It discusses how Z-scores can be used to compare performance in cricket matches and how to interpret Z-score values.
  • 📊 Z-Table: The text briefly discusses how to use a Z-table to find the area under the normal distribution curve, indicating the percentage of scores falling above a certain threshold (4.25 in this example).

🔍 In this text, the following points are discussed:

  • The importance of understanding the right table for obtaining specific information.
  • The absence of information in the right table.
  • The need to use the left table for certain information.
  • An example related to z-score standardization.
  • Calculating the z-score for a given IQ value.
  • Explaining the concept of standard deviation.
  • Identifying outliers using z-scores.
  • Implementing a function to detect outliers (sketched after the IQR discussion below).
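
A minimal Python sketch of the scaling ideas above: min-max scaling of hypothetical 8-bit pixel values, plain division by 255, and z-score standardization of a made-up data set:

```python
import numpy as np

# Hypothetical 4x4 image with 8-bit pixel intensities in [0, 255]
pixels = np.random.randint(0, 256, size=(4, 4))

# Min-max scaling to [0, 1]
scaled = (pixels - pixels.min()) / (pixels.max() - pixels.min())

# For 8-bit images, dividing by 255 is the common shortcut
normalized = pixels / 255.0

# Z-score standardization: mean 0, standard deviation 1
scores = np.array([45.0, 52.0, 60.0, 61.0, 58.0, 49.0, 95.0])  # made-up data
z = (scores - scores.mean()) / scores.std()
print(z.round(2))
```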

Please note that the text contains both technical information and tutorial-like explanations.

📊 Data Analysis:

  • The speaker discusses data analysis, mentioning terms like "threshold," "standard deviation," "z score," and "outliers."

📈 Z Score Computation:

  • Z score computation is explained, including sorting data, calculating q1 and q3 percentiles, and finding outliers based on z scores.

📊 Interquartile Range (IQR):

  • The speaker discusses calculating the IQR, lower fence, and upper fence for outlier detection.
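
The outlier-detection functions described above might look like this in Python; the thresholds and sample data are my own illustrative choices (for small samples the z-score is mathematically bounded, so a cutoff below 3 is used here):

```python
import numpy as np

def outliers_zscore(data, threshold: float = 2.5) -> np.ndarray:
    """Points more than `threshold` standard deviations from the mean."""
    data = np.asarray(data, dtype=float)
    z = (data - data.mean()) / data.std()
    return data[np.abs(z) > threshold]

def outliers_iqr(data) -> np.ndarray:
    """Points outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    data = np.sort(np.asarray(data, dtype=float))
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    return data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]

sample = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]   # hypothetical data
print(outliers_zscore(sample))   # [102.]
print(outliers_iqr(sample))      # [102.]
```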

📉 Probability:

  • The concept of probability is introduced, emphasizing its importance in various fields like machine learning.

🔗 Probability Definition:

  • Probability is defined as the likelihood of an event occurring, with an example involving rolling dice and coin tossing.

📈 Addition Rule for Mutually Exclusive Events:

  • The addition rule for mutually exclusive events is explained, with examples of coin tossing and dice rolling.

📈 Addition Rule for Non-Mutually Exclusive Events:

  • The addition rule for non-mutually exclusive events is discussed, with an example involving drawing cards from a deck.

These topics cover discussions on data analysis, outlier detection, probability, and addition rules for both mutually exclusive and non-mutually exclusive events.

  • 🃏 There are 52 cards in a deck.

  • 🎴 Probability of getting a Queen: 4/52
  • ❤️ Probability of getting a Heart card: 13/52
  • 🃏❤️ Probability of getting a Queen and a Heart card: 1/52
  • 🧮 Addition Rule for Non-Mutually Exclusive Events:
    • Probability of Queen or Heart = Probability of Queen + Probability of Heart - Probability of Queen and Heart
    • (4/52) + (13/52) - (1/52) = 16/52
  • 🎲 Probability can be divided into Independent and Dependent Events:
    • Independent events are not influenced by previous events.
    • Dependent events are influenced by previous events.
  • 🎲 Independent events have equal probabilities for each outcome.
  • 🎲 Dependent events involve conditional probabilities.
  • 🎯 Permutation: Arranging objects with order matters.
    • Example: Arranging chocolates in a specific order.
    • Formula: nPr = n! / (n − r)!; for example, 6P3 = 120.
  • 🤝 Combination: Selecting objects where order doesn't matter.
    • Example: Selecting unique combinations of chocolates.
    • Formula: nCr = n! / (r! × (n − r)!); for example, 6C3 = 20.
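
A quick Python check of the card probabilities and the 6P3 / 6C3 values above, using exact fractions:

```python
from fractions import Fraction
from math import comb, perm

# Addition rule for non-mutually exclusive events: P(queen or heart)
p_queen, p_heart, p_both = Fraction(4, 52), Fraction(13, 52), Fraction(1, 52)
print(p_queen + p_heart - p_both)     # 4/13, i.e. 16/52

# Permutations (order matters) and combinations (order doesn't)
print(perm(6, 3), comb(6, 3))         # 120 20
```
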
  • 📊 P-Value: the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one obtained.
    • Higher p-values indicate the observed result is likely under the null hypothesis.
    • Lower p-values indicate the observed result would be unlikely under the null hypothesis.
    • A p-value of 0.8 means such a result occurs 80% of the time under the null; a p-value of 0.01, only 1% of the time.
    • P-values help assess the significance of results in statistical analysis.
  • 🧪 Hypothesis testing involves:
    • Combining topics such as confidence intervals and significance values.
    • Assessing if a coin is fair through experiments and probability.
    • Null and alternate hypotheses are defined in hypothesis testing.
    • Experiments are performed, and the null hypothesis is either accepted or rejected.
  • 📊 Confidence Intervals:
    • The confidence interval is defined using significance value (alpha).
    • It represents the range within which a result is considered acceptable.
    • A significance value of 0.05 corresponds to a 95% confidence interval.
  • 📉 Significance Value:
    • Significance value (alpha) determines the width of the confidence interval.
    • If the experiment falls within the interval, the null hypothesis is accepted.
    • If outside the interval, the null hypothesis is rejected.
  • 🧮 Type 1 and Type 2 Errors:
    • Type 1 error occurs when the null hypothesis is rejected when it is true.
    • Type 2 error occurs when the null hypothesis is accepted when it is false.
    • These errors are important in hypothesis testing and are part of a confusion matrix.
  • 📌 A type 2 error is also known as a false negative.
  • 📌 There are four possible outcomes when evaluating hypotheses.
  • 📌 Outcome four involves accepting the null hypothesis when it is true, which is a good scenario.
  • 📌 Confusion matrices in real-world scenarios help define true positives, true negatives, false positives, and false negatives.
  • 📌 Determining whether a false positive is a type 1 or type 2 error depends on context.
  • 📌 One-tailed and two-tailed tests are important concepts.
  • 📌 In a one-tailed test, you focus on one direction (e.g., greater than), while in a two-tailed test, you consider both directions (e.g., greater than or less than).
  • 📌 Confidence intervals help estimate population parameters.
  • 📌 A point estimate is a value of a statistic estimating a parameter.
  • 📌 Confidence intervals consist of a point estimate plus or minus a margin of error.
  • 📌 When the population standard deviation is known, a z-test is used to find the confidence interval (see the sketch below).
  • 📌 The formula for the confidence interval is Point Estimate ± Z(α/2) × (Standard Deviation / √Sample Size).
  • 📌 This formula is typically used when the sample size is at least 30 and the population standard deviation is known.
  • 📌 Sample size and population standard deviation influence the choice of formula for confidence intervals.
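
A small Python sketch of the z-based confidence-interval formula above; the sample mean, sigma, and n are made-up values for illustration:

```python
from math import sqrt
from scipy.stats import norm

x_bar, sigma, n, alpha = 50.0, 8.0, 36, 0.05   # hypothetical inputs

z_crit = norm.ppf(1 - alpha / 2)               # ≈ 1.96 for 95% confidence
margin = z_crit * sigma / sqrt(n)
print(f"CI: {x_bar - margin:.2f} .. {x_bar + margin:.2f}")   # ≈ 47.39 .. 52.61
```
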
📊 Summary of the Text:

  • The text discusses various statistical calculations and hypothesis testing procedures, particularly focusing on z-tests and confidence intervals.
  • It begins by explaining how to find the z-score using a z-table.
  • The text then provides an example of calculating a confidence interval with a given alpha level.
  • It delves into hypothesis testing, defining the null and alternate hypotheses and setting the alpha level.
  • The decision rule for a two-tailed test is explained, along with the calculation of test statistics using the z-test formula.
  • The text briefly mentions the importance of the standard error in larger sample sizes and hints at the central limit theorem.

Please note that this summary includes technical content from the text and may not be easily understandable without prior knowledge of statistics.

📊 Chi-Square Test

Population Information (2000 Census):

  • Less than 18 years: 20%
  • 18 to 35 years: 30%
  • Greater than 35 years: 50%

Observed Distribution (2010 Sample, n = 500):

  • Less than 18 years: 121
  • 18 to 35 years: 288
  • Greater than 35 years: 91

Expected Distribution (2000 Census percentages applied to the 2010 sample size, n = 500):

  • Less than 18 years: 100
  • 18 to 35 years: 150
  • Greater than 35 years: 250
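
Before the conclusion, a short Python sketch verifying the chi-square statistic from the observed and expected counts above (df = 2, alpha = 0.05):

```python
import numpy as np
from scipy.stats import chi2

observed = np.array([121, 288, 91])
expected = np.array([100, 150, 250])     # 20% / 30% / 50% of n = 500

chi_sq = ((observed - expected) ** 2 / expected).sum()   # ≈ 232.49
critical = chi2.ppf(0.95, df=2)                          # ≈ 5.99
print(chi_sq, critical, chi_sq > critical)               # True -> reject H0
```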

Conclusion:

  • There is a significant difference between the expected and observed distributions.
  • Using alpha = 0.05, we conclude that the population distribution of ages has changed in the last 10 years.

  • 📊 The text discusses data analysis and hypothesis testing.
  • 📈 It mentions the importance of defining null and alternate hypotheses.
  • 📏 It specifies an alpha value of 0.05 for a 95% confidence interval.
  • 📊 The text explains the calculation of degrees of freedom for a chi-square test.
  • 📈 It discusses chi-square tests and decision boundaries.
  • 📉 It calculates the chi-square test statistic using observed and expected values.
  • 📊 The text mentions the significance level (alpha) and p-values in hypothesis testing.
  • 📈 It introduces covariance as a measure of the relationship between two variables.
  • 📉 It explains positive, negative, and zero covariance values.
  • 📊 The text highlights the limitation of covariance in not providing a fixed magnitude for correlation.
  • 📈 It hints at the need for a correlation coefficient like the Pearson correlation to measure correlation strength.

  • 📊 The Pearson correlation coefficient restricts values between -1 and +1.
  • 🧮 It measures the degree of correlation between two variables.
  • ➡️ A positive correlation (towards +1) means variables move together.
  • ⬅️ A negative correlation (towards -1) means variables move opposite.
  • ✍️ Formula: Pearson correlation = Covariance(X, Y) / (Std Dev(X) * Std Dev(Y))
  • 📈 Correlation values range from -1 to +1.
  • 📉 Negative correlation: as X decreases, Y increases (and vice versa).
  • 📊 Positive correlation when X increases, Y increases.
  • ⚖️ Values on a straight line have a correlation of -1 or +1.
  • 🧐 Non-linear properties better captured by Spearman rank correlation.
  • 📝 Spearman formula: Covariance(rank(X), rank(Y)) / (Std Dev(rank(X)) * Std Dev(rank(Y)))
  • 📊 Spearman rank correlation captures non-linear relationships.
  • 🧑‍🎓 Understanding rank: Assign ranks to data points to compute Spearman correlation.
  • 📊 T-test used to compare sample mean with population mean.
  • 📊 If p-value < 0.05, reject the null hypothesis.
  • 📊 Visualization tools like pair plots and correlation matrices help analyze correlations.
  • 📊 Correlation can be positive (variables move together) or negative (variables move oppositely).
  • 📊 Spearman rank correlation is used when non-linear relationships are expected.
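
A small Python illustration of the Pearson-versus-Spearman distinction above, using a made-up monotonic but non-linear relationship:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.arange(1, 11, dtype=float)    # hypothetical data
y = x ** 3                           # monotonic but non-linear in x

print(pearsonr(x, y)[0])    # high but below 1: a straight line fits imperfectly
print(spearmanr(x, y)[0])   # exactly 1.0: the ranks move together perfectly
```
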
📊 Summary of the text:

  • 💡 Explains the significance of p-values in statistical testing.
  • 🧪 Discusses how p-values relate to null hypothesis testing.
  • 📝 Provides an example of a z-test problem with calculations.
  • 📈 Demonstrates the calculation of p-values based on z-scores.
  • 🤝 Emphasizes the importance of comparing p-values to significance levels (alpha) for hypothesis testing.
  • 📚 Mentions topics to be covered in future sessions, including distributions, central limit theorem, and F-tests.
  • 🔀 Describes the process of rejecting or failing to reject the null hypothesis based on p-values and significance levels.

  • 🔍 The problem involves hypothesis testing and statistical analysis.
  • 📊 Average age of a college is 24 years with a standard deviation of 1.5.
  • 🧪 A sample of 36 students is taken.
  • 📈 The sample mean age is 25 years.
  • 📊 Hypotheses:
    • H0 (null hypothesis): Mean age = 24 years.
    • H1 (alternative hypothesis): Mean age ≠ 24 years.
  • 🧮 Standard deviation (σ) is 1.5, sample size (n) is 36, and sample mean (x̄) is 25.
  • 📝 Significance level (alpha) is 0.05.
  • 🧮 It's a two-tailed test.
  • 📉 Calculate the z-score: (25 − 24) / (1.5 / √36) = 1 / 0.25 = 4.0
  • 📈 The decision boundary is ±1.96 for a 95% confidence interval.
  • 🚫 4.0 > 1.96, so reject the null hypothesis.
  • 📊 Calculate the p-value: p ≈ 0.00006 (two-tailed).
  • 🚫 The p-value is less than alpha (0.05), so reject the null hypothesis (see the sketch below).
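
A quick Python verification of this z-test:

```python
from math import sqrt
from scipy.stats import norm

mu, sigma, n, x_bar, alpha = 24.0, 1.5, 36, 25.0, 0.05

z = (x_bar - mu) / (sigma / sqrt(n))       # (25 - 24) / 0.25 = 4.0
p_value = 2 * (1 - norm.cdf(abs(z)))       # two-tailed, ≈ 6.3e-05
print(z, p_value, p_value < alpha)         # True -> reject H0
```
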
  • 📊 Discusses various probability distributions: Bernoulli, Binomial, and Pareto.
  • 📉 Explains the relationship between power law (Pareto) and log-normal distributions.
  • 📦 Mentions data transformations for normalizing distributions, including Box-Cox transformation.
  • 📄 Central Limit Theorem is briefly mentioned as applicable to various distributions.
  • 📊 When taking multiple samples (n ≥ 30) from data, they tend to follow a normal distribution due to the Central Limit Theorem.
  • 📈 The more samples (m), the better the Central Limit Theorem applies.
  • 🧮 Sample size (n ≥ 30) and the number of samples (m) are crucial for the Central Limit Theorem.
  • 📈 Plotting all the sample means produces an approximately normal distribution, regardless of the original data's distribution (see the sketch below).
  • 📊 Some data distributions mentioned include normal, Poisson, and Pareto distributions.
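
A minimal Python simulation of the Central Limit Theorem described above; the Pareto population and the sizes m and n are illustrative choices:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
m, n = 1000, 30                                  # m samples, each of size n

population = rng.pareto(3.0, size=100_000)       # heavily right-skewed data
sample_means = rng.pareto(3.0, size=(m, n)).mean(axis=1)

# The means of the samples are far less skewed than the population:
# their distribution is approximately normal, as the CLT predicts.
print(skew(population), skew(sample_means))
```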
