Assignment of different substring variants to defined table columns with python

General discussion for off-topic subjects.
mySR
Junior Member
Posts: 2
Joined: Wed Nov 18, 2020 8:55 am

Assignment of different substring variants to defined table columns with python

Postby mySR » Wed Nov 18, 2020 9:08 am

Hello,

I have a task that pushes me to my limit.

I hope that you can help me a little further :-)

Any help and feedback to the path is appreciated

Situation:

I get answers to questions that are half single choice questions. The other half are numerical questions.

In this case:

Questions 1, 3 and 5: single choice question with
4 possible answer options: e.g. 1a / 1b / 1c / 1d

Question 2 and 4:
numerical questions.
Answer options: e.g. (any value)


I have very limited influence on the form/strucure of the answer. As a result, there are different variants of the answer.

Here is an example:

I get the following data as a CSV

Code: Select all

      <bound method NDFrame.head of       Name      Value             Date_Time
    0   User 1         5c  2020-11-08T23:38:23Z
    1   User 1       1000  2020-11-08T23:34:23Z
    2   User 1         3B  2020-11-08T23:32:23Z
    3   User 1         90  2020-11-08T23:30:23Z
    4   User 1         1B  2020-11-08T23:28:23Z
    5   User 2      150kg  2020-11-08T22:34:23Z
    6   User 2         3B  2020-11-08T22:32:23Z
    7   User 2    $80 USD  2020-11-08T22:30:23Z
    8   User 3         5D  2020-11-08T21:34:23Z
    9   User 3        300  2020-11-08T21:32:23Z
    10  User 3       80 $  2020-11-08T21:30:23Z
    11  User 3         1B  2020-11-08T21:28:23Z
    12  User 4       1: A  2020-11-08T20:36:23Z
    13  User 4     2 - 17  2020-11-08T20:34:23Z
    14  User 4        3:A  2020-11-08T20:32:23Z
    15  User 4    4 - 326  2020-11-08T20:30:23Z
    16  User 4  5 - 25000  2020-11-08T20:28:23Z>


With pivot I can group the data to one string for each user.


Code: Select all

    df=comments[['Name', 'Value']]
    table = pd.pivot_table(df, index='Name', values='Value', aggfunc=lambda x: ' '.join(x))
    print(table)



Code: Select all

    <bound method NDFrame.head of     Name,Value
    0  User 1,5c 1000 3B 90 1B
    1  User 2,150kg 3B $80 USD
    2  User 3,5D 300 80 $ 1B
    3  User 4,1: A 2 - 17 3:A 4 - 326 5 - 25000>


The first goal is to create a target table in which each substring is correctly assigned to the name and the answer no. / (each column)

I can't get any further at this point. Considerations with dictionary or stringio did not lead to a solution.
I hope you have some ideas for me :-)

This is how the table should look like

Target Table 1 (Substring to answer no. in column)

Code: Select all

    Name       Answer 1 Answer 2  Answer 3 Answer 4  Answer 5
    User 1     1B       90        3B       1000      5c
    User 2     NaN      $80 USD   3A       150kg     NaN
    User 3     1B       80 $      NaN      300       5D
    User 4     1: A     2 - 17    3:A      4 - 326   5:B



I think the following criteria can be helpful:

    - the number in front of a letter indicates the answer no (answer 1 3 5)
    - the letters are always a/b/c/d
    - the numerical answers are between answer 1 & 3 / 3 & 5 (and its criteria)
    - attention: no value can also be in the table (User 2 - answer 1 & 5 / User 3 - answer 3)

I guess it's difficult to meet all requirements.
But If we find a 80% solution for this problem I will be very happy. :-)


That would help me to get to the next step:

The second goal is to modify the substrings in the target table in such a way that you only have unique standard values ​​without special characters or anything else.

    Letters only for single choice questions.
    Numeric values ​​(without $ or kg etc.) for numeric questions

Target Table 2 (modified to unique standard values)


Code: Select all

    Name       Answer 1 Answer 2  Answer 3 Answer 4  Answer 5
    User 1     B        90        B        1000      C
    User 2     0        80        A        150       0
    User 3     B        80        0        300       D
    User 4     A        17        A        326       B
mySR
Junior Member
Posts: 2
Joined: Wed Nov 18, 2020 8:55 am

Re: Assignment of different substring variants to defined table columns with python

Postby mySR » Mon Nov 23, 2020 7:15 am

Hello dear community,

Since I have unfortunately not yet received an answer to this question, I would like to ask what the reason could be.

Could the question be unclear or not well enough asked?

Or is the task so complicated that a good programmer needs a lot of time for it?

That would be nice to know. :smile:

Thank you in advance
User avatar
Ayuto
Project Leader
Posts: 2195
Joined: Sat Jul 07, 2012 8:17 am
Location: Germany

Re: Assignment of different substring variants to defined table columns with python

Postby Ayuto » Mon Nov 23, 2020 7:34 pm


Return to “Whatever”

Who is online

Users browsing this forum: No registered users and 15 guests