Clean Data

PreprocessResource.iClean(request, sub_analysis_id, **kwargs)

Mandatory prior steps: 1) Upload dataset 2) Create analysis 3) Create sub analysis

When a dataset is uploaded, Intuceo runs a set of checks on the data and then cleans it. Cleaning replaces each special character in the data with an underscore (_), followed by a short name for that character (e.g. usd), followed by another underscore. In the values of numeric attributes, currency symbols and comma separators are removed. A special character is any character that is not alphanumeric; the underscore is the one exception and is retained.
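The numeric part of this cleaning can be illustrated with a minimal Python sketch. It is not Intuceo's implementation: the helper name clean_numeric_value and the set of currency symbols are assumptions made for illustration, since the text does not list which symbols are recognised.

```python
# Assumption: the service's actual list of currency symbols is not documented
# here; this set is illustrative only.
CURRENCY_SYMBOLS = "$€£¥₹"


def clean_numeric_value(raw):
    """Strip currency symbols and comma separators, then parse as a float.

    Hypothetical helper. Returns None when the remaining text is not a number.
    """
    stripped = raw.strip()
    for symbol in CURRENCY_SYMBOLS:
        stripped = stripped.replace(symbol, "")
    # Remove thousands separators such as the commas in "1,234,567".
    stripped = stripped.replace(",", "").strip()
    try:
        return float(stripped)
    except ValueError:
        return None
```

Under these assumptions, a value such as "$1,234.50" would be parsed as 1234.5, while a value that is not numeric after stripping would come back as None.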

Column name cleansing:

  1. Names are converted to lower case.
  2. Non-alphanumeric characters are replaced with the special identifiers listed below.
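The two rules above could be sketched in Python as follows. This is only an illustration: the function clean_column_name is hypothetical, the mapping is invented apart from "usd" (the one short name mentioned in the text), and the underscore-shortname-underscore format is assumed to be the same one used for data values.

```python
import re

# Hypothetical mapping. Only "usd" appears in the text as an example short
# name; every other entry here is an assumption for illustration.
SPECIAL_IDENTIFIERS = {"$": "usd", "%": "pct", "#": "num", " ": "sp"}

# Assumed fallback short name for characters missing from the mapping.
FALLBACK_IDENTIFIER = "sc"


def clean_column_name(name):
    """Lower-case a column name and replace each special character.

    Every character other than a-z, 0-9 and underscore becomes
    "_<short name>_"; underscores are kept as they are.
    """
    lowered = name.lower()

    def replace(match):
        short = SPECIAL_IDENTIFIERS.get(match.group(0), FALLBACK_IDENTIFIER)
        return f"_{short}_"

    return re.sub(r"[^a-z0-9_]", replace, lowered)
```

With this sketch, a name that is already lower-case alphanumeric, such as "balance", passes through unchanged, which matches the identity mapping seen in orig_colNames_dict in the response example.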

Arguments

sub_analysis_id: the ID of the sub analysis

Possible errors

Error message
Invalid sub analysis id

GET Request Example

curl -u username:password {url_prefix}/iclean/{sub_analysis_id}/
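The same request can be prepared from Python using only the standard library. The sketch below assumes that `curl -u` corresponds to HTTP Basic authentication, which is curl's default for that option; the function name build_iclean_request is hypothetical, and the URL prefix and credentials are placeholders to be supplied by the caller.

```python
import base64
import urllib.request


def build_iclean_request(url_prefix, sub_analysis_id, username, password):
    """Build a GET request for {url_prefix}/iclean/{sub_analysis_id}/.

    Hypothetical helper: it only constructs the request and does not send it.
    """
    url = f"{url_prefix.rstrip('/')}/iclean/{sub_analysis_id}/"
    # HTTP Basic auth: base64 of "username:password", as curl -u does by default.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request
```

The returned object can be passed to urllib.request.urlopen, and the response body decoded with json.load, to obtain a dictionary shaped like the response example.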

Response Example

{
    "error": false,
    "error_msg": "",
    "result": {
        "missingdata_greater_than_ten_catg": [],
        "missingdata_greater_than_ten_num": [],
        "number_of_consi_categorical_attributes": 10,
        "number_of_consi_numeric_attributes": 7,
        "number_of_considered_attributes": 17,
        "number_of_records_removed": 0,
        "removed_attributes": [],
        "categorical_uni_attr": {},
        "date_text_id_attrs": [],
        "number_of_categorical_attrs": 0,
        "number_of_numeric_attrs": 0,
        "numeric_uni_attr": {},
        "orig_colNames_dict": {
            "age": "age",
            "balance": "balance",
            "campaign": "campaign",
            "contact": "contact",
            "day": "day",
            "default": "default",
            "duration": "duration",
            "education": "education",
            "housing": "housing",
            "job": "job",
            "loan": "loan",
            "marital": "marital",
            "month": "month",
            "pdays": "pdays",
            "poutcome": "poutcome",
            "previous": "previous",
            "y": "y"
        },
        "percent_missing_cell": 0.0,
        "target_type_vlaues": {
            "no": "4000",
            "yes": "521"
        },
        "total_attributes": 17,
        "total_categorical_attrs": 10,
        "total_missing_cell_count": 0,
        "total_numeric_attrs": 7,
        "total_records": 4521,
        "user_defined_missing_percentage_value": 10
    }
}
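A client would typically check the error flag before reading result. The following Python sketch shows one way to do that and to derive class shares from the target counts. The function summarize_clean_result is hypothetical, and it reads the key exactly as spelled in the example above ("target_type_vlaues"), whose counts arrive as strings.

```python
def summarize_clean_result(payload):
    """Return a small summary of an iClean response, or raise on error.

    Hypothetical helper based only on the fields in the response example.
    """
    if payload.get("error"):
        raise ValueError(payload.get("error_msg") or "iClean request failed")

    result = payload["result"]
    # Counts are strings in the example ("4000", "521"), so convert to int.
    counts = {
        label: int(count)
        for label, count in result.get("target_type_vlaues", {}).items()
    }
    total = sum(counts.values())
    shares = {label: n / total for label, n in counts.items()} if total else {}

    return {
        "total_records": result["total_records"],
        "total_attributes": result["total_attributes"],
        "percent_missing_cell": result["percent_missing_cell"],
        "removed_attributes": result["removed_attributes"],
        "target_shares": shares,
    }
```

For the example response, the counts 4000 and 521 sum to total_records (4521), so the "yes" class accounts for roughly 11.5% of the records.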