EDA FOR TITANIC DATASET:

Vidya sri.k
3 min read · Nov 5, 2020


  • What is EDA?

Exploratory data analysis (EDA) is a method used to analyze and summarize datasets. The majority of EDA techniques involve the use of graphs.

  • Titanic dataset:

It is one of the most popular datasets for understanding machine learning basics. It contains information about passengers aboard the RMS Titanic, which unfortunately was shipwrecked. This dataset can be used to predict whether a given passenger survived or not.

Some important things to know here:

  • The dataset has 12 features, but we do univariate analysis on only 8 of them, because the other 4 give us little useful information. For example, PassengerId, Name and Ticket are not useful for prediction, so it is better to drop them from the dataset. The Cabin feature has about 77% null values, so it is not trivial to handle. That is why I am dropping these 4 features from my dataset.
# You can drop those 4 features from the dataset with this code.
# (Assumes `data` is the Titanic DataFrame, already loaded with pandas.)
data = data.drop(columns=["PassengerId", "Name", "Ticket", "Cabin"])
data.head()

Output of the above code cell.
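To check a claim like "Cabin is mostly null" before dropping a column, you can compute the share of missing values per column. The snippet below is only a minimal sketch: `toy_missing` is a made-up four-row frame standing in for the real data, so the percentages it prints are not the Titanic numbers.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real Titanic data (assumption).
toy_missing = pd.DataFrame({
    "Age": [22.0, None, 26.0, 35.0],
    "Cabin": [None, "C85", None, None],
    "Embarked": ["S", "C", "S", "S"],
})

# isnull() marks missing cells; the column mean of those flags is the
# fraction missing. Multiply by 100 for a percentage, highest first.
null_pct = toy_missing.isnull().mean().mul(100).sort_values(ascending=False)
print(null_pct)
```

On the real dataset you would run the same line on `data` before calling `drop`.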

  • SibSp feature:
import seaborn as sns
sns.countplot(x="SibSp", hue="Survived", data=data)

Observations:

  1. Most passengers with 0 siblings died.
  2. Passengers with 1 sibling had a roughly equal chance of dying or surviving.
  3. But wait, this is interesting: passengers with a higher number of siblings, like 3, 4, 5 or 8, had a very low chance of surviving, almost 0%. You would expect it to be high, right? If I were on the Titanic with 4, 5 or 8 siblings aboard, my chances of survival should be high. But sadly, in hard times people think only of themselves, not of others.
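The observations above are read off the count plot by eye. A small table of survival rate per SibSp value makes the same comparison with numbers. This is a sketch on a made-up frame (`toy_sibsp` is hypothetical, not the real data), so only the pattern of the code matters, not the values.

```python
import pandas as pd

# Hypothetical toy rows (assumption); column names follow the article's code.
toy_sibsp = pd.DataFrame({
    "SibSp":    [0, 0, 0, 1, 1, 4, 4],
    "Survived": [0, 0, 1, 1, 0, 0, 0],
})

# For each SibSp value: how many passengers there are, and what share survived
# (the mean of a 0/1 column is the survival rate).
sibsp_summary = toy_sibsp.groupby("SibSp")["Survived"].agg(
    passengers="count",
    survival_rate="mean",
)
print(sibsp_summary)
```

Keeping the `passengers` count next to the rate is useful because a group with only a handful of passengers can show a rate near 0% just by chance.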
  • CONCLUSION:

From my exploratory analysis of the Titanic dataset, I conclude that women had higher chances of survival. We can do a t-test to check whether such differences in survival are statistically significant. I also see that the class of the passengers played a role in their survival.

There were some limitations in this dataset, such as missing values for some passenger attributes.
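Group-level conclusions like these can be checked by comparing survival rates across groups. The sketch below assumes the usual Kaggle column names `Sex` and `Pclass`, which do not appear in the code above, and runs on a made-up frame (`toy_groups` is hypothetical), so the rates it prints are illustrative only.

```python
import pandas as pd

# Hypothetical toy rows (assumption); Sex and Pclass are assumed column names.
toy_groups = pd.DataFrame({
    "Sex":      ["female", "female", "male", "male", "male", "female"],
    "Pclass":   [1, 3, 1, 3, 3, 3],
    "Survived": [1, 1, 1, 0, 0, 0],
})

# The mean of the 0/1 Survived column within each group is its survival rate.
rate_by_sex = toy_groups.groupby("Sex")["Survived"].mean()
rate_by_class = toy_groups.groupby("Pclass")["Survived"].mean()

print(rate_by_sex)
print(rate_by_class)
```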
